Predictability

Predictability in software engineering is a topic that comes up often at work. Program Increments, a tool used when applying agile in large teams, are arguably intended to help drive some predictability in an organization.

But if one purpose of Program Increments is to bring predictability to development cycles, they can only be effective if the underlying systems are themselves somewhat predictable. Without that, Program Increments can actually depress overall productivity, stability and morale.

I will start by saying that I am no expert on this subject. Everything in here is based on my own limited experience and the observations I have made along the way.

System Predictability

Program Increments are cycles of development that help a team bring some predictability to its deliverables, for the sake of both stakeholders and developers. For that to work well, three things are needed:

  1. A stable intake process
  2. Stable platforms
  3. A stable set of teams

Overlaying a stable process on an unstable foundation will only frustrate everyone concerned and generate internal conflict, further undermining the intended goal.

Intake Stabilization

Simply put, if the product work being asked of the team is not well defined or changes mid-stream, it will be difficult to establish any kind of predictability in output. A good practice of building clear work plans is instrumental to any successful process.

Platform Stabilization

To achieve predictability, a healthy start would be to focus on the instability of the platforms. What are the things that have made it extremely difficult to give predictable estimates to stakeholders? Where are we seeing frequent breakdowns in communication or frequent regressions? Spinning up small teams to address those issues specifically would begin to heal the wounds that have been inflicted by a legacy of undocumented and unstable quick fixes.

Team Stabilization

Team stabilization doesn’t necessarily mean keeping the same team members perpetually, but it does mean keeping the same team through the life of an Initiative. It’s also extremely important to understand the long-term viability of your product in relation to the company’s growth projections: where uncertainty exists, assume no growth. This last point is crucial.

IMPORTANT

A system built to be operated by a small team can later be transformed into something more complex if resources allow. But a complex system that was meant to be run by a large team, and no longer has that team, is extremely difficult to simplify and scale down to a smaller one. In other words, it’s easier to go from simple to complicated than from complicated to simple.

📒 NOTE

Google can safely assume that it will have a strong engineering presence in perpetuity. For established engineering organizations like that, losing the team behind a complex system is not a concern, so they can focus instead on the best possible solution regardless of cost. Most organizations cannot do that. Instead, they have to focus only on the true differentiators for their business and concentrate their most valuable resources (people) on those.

To double down on small team sizes: we can achieve more predictability by reducing the size of teams overall. Time spent on repeated justification and status reporting not only takes time away from productive work but also frustrates everyone involved, and it is sometimes used as an opportunity to posture, protect and conceal issues rather than to actually resolve and improve them. The more people involved, the higher the probability that this will happen.

A Considered Approach

The notion of Program Increments makes sense to me conceptually, but the way teams are organized for efficiency is crucial to their success. In every company I’ve worked at, top-down operational instructions have been disastrous for team morale and outcomes. By contrast, top-down guardrails and expectations of output have been key to solving the problem of predictability while giving teams the autonomy to adapt processes to their existing skill sets and personalities.

The following is an approach to organizing teams in a way that, I believe, will increase autonomy while providing stability and predictability to outcomes. It’s important to note that any plan needs a way to adjust for necessary change and improvements. This one is no different.

The plan introduces a few key terms that are pillars of the approach: Feature Groups, Platform Groups, Teams, Products and Initiatives.

A feature group is a stable collection of people and processes focused on furthering a “product”. A group has an engineering manager, a project/program/delivery manager and a product manager. Feature groups are focused on business initiatives and are linked to customers. Their KPIs are tied to revenue, customer usage, customer feedback, etc.

A platform group is a stable collection of people and processes focused on furthering a “technology”. A group has an engineering manager and a product manager. Platform groups are focused on the reliability, performance and developer experience of their associated technology. Their KPIs are tied to uptime, developer experience, MTTR, output quality, test coverage, etc.

A team is a small, ephemeral collection of people and processes focused on completing a specific task or set of related tasks. A team has a product owner, an engineering lead and 2-4 engineers, including at least one senior engineer. Teams are linked to specific initiatives. KPIs for teams relate to feature quality, planned vs. actual delivery, etc.
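To make the team rules concrete, here is a minimal Python sketch that checks a proposed team against them. The `Team` and `Engineer` classes and the `composition_problems` helper are hypothetical names used only for illustration, and the sketch assumes the required senior engineer is one of the 2-4 engineers rather than the engineering lead.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Engineer:
    name: str
    senior: bool = False


@dataclass
class Team:
    """Hypothetical model of a small, ephemeral team tied to one initiative."""
    initiative: str
    product_owner: Optional[str]
    engineering_lead: Optional[str]
    engineers: List[Engineer] = field(default_factory=list)


def composition_problems(team: Team) -> List[str]:
    """List the ways a team departs from the described composition.

    An empty list means the team conforms. Assumes the senior engineer is
    counted among the 2-4 engineers, not as the engineering lead.
    """
    problems = []
    if not team.product_owner:
        problems.append("missing product owner")
    if not team.engineering_lead:
        problems.append("missing engineering lead")
    if not 2 <= len(team.engineers) <= 4:
        problems.append(f"{len(team.engineers)} engineer(s), expected 2-4")
    if not any(e.senior for e in team.engineers):
        problems.append("no senior engineer")
    return problems


# Example: a team formed for a Snooze initiative (people are placeholders).
snooze = Team("Snooze", "PO-1", "Lead-1", [Engineer("Eng-1", senior=True), Engineer("Eng-2")])
print(composition_problems(snooze))  # []
```

Returning a list of problems rather than a single yes/no keeps the check useful as a conversation starter: a feature group can see exactly which guardrail a proposed team misses and decide how to adjust.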

A product is a related set of features that are available to customers. The breadth of a product is determined on a case-by-case basis. For example, if the number of engineers available at a company is small, it will likely make sense to have a product that is tied to an entire application. But if enough engineers, product managers, etc. are available, it may make sense to divide that application into smaller concerns.

An initiative is a well-defined, finite project related to the product of the feature group it belongs to.

```mermaid
graph TD
  FeatureGroup --> Team1
  FeatureGroup --> Team2
  FeatureGroup --> Team3
  FeatureGroup --> Team4
  FeatureGroup --> Team5
  Team1 --> Initiative1
  Team2 --> Initiative2
  Team3 --> Initiative3
  Team4 --> Initiative4
  Team5 --> Initiative5
  Initiative1 -.-> PlatformGroupA[Platform Group A]
  Initiative2 -.-> PlatformGroupA
  Initiative5 -.-> PlatformGroupA
  Initiative1 -.-> PlatformGroupB[Platform Group B]
  Initiative4 -.-> PlatformGroupB
```

To summarize, a feature group is made up of a pool of engineers and product owners. The pool’s engineers are generalists who focus on the product’s primary technology but have a good understanding of the associated technologies. When the feature group decides on an initiative, a team is built around that initiative from the pool’s engineers and product owners. If necessary, platform group engineers are also “borrowed” for the life of that initiative.

An Example

Consider an engineering org of ~75 people that is responsible for developing a suite of productivity tools such as Email, Calendar and Todo.

In this case, the Feature Groups are:

  • Email
    • Pool
      • 1 Product Manager
      • 1 Delivery Manager
      • 1 Engineering Manager
      • 2 Product Owners
      • 10 Engineers
  • Calendar
    • Pool
      • 1 Product Manager
      • 1 Delivery Manager
      • 1 Engineering Manager
      • 2 Product Owners
      • 10 Engineers
  • Todo
    • Pool
      • 1 Product Manager
      • 1 Delivery Manager (Project Manager)
      • 1 Engineering Manager
      • 2 Product Owners
      • 10 Engineers

The Platform Groups are:

  • Infrastructure Platform (separated out if team size allows)
    • Pool
      • 1 Product Manager
      • 1 Engineering Manager
      • 5 Engineers
  • Backend Platform (a set of services used across all applications)
    • Pool
      • 1 Product Manager
      • 1 Engineering Manager
      • 5 Engineers
  • Frontend Platform (a set of frontend components and services used across all applications)
    • Pool
      • 1 Product Manager
      • 1 Engineering Manager
      • 5 Engineers
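Tallying the pools listed above is a quick sanity check on the sizing. This small Python sketch just sums the roles as written; the dictionary layout and the `headcount` and `engineers` helpers are illustrative assumptions, not part of the plan itself.

```python
# Pool sizes exactly as listed in the example; the data layout is illustrative.
FEATURE_POOL = {
    "Product Manager": 1,
    "Delivery Manager": 1,
    "Engineering Manager": 1,
    "Product Owners": 2,
    "Engineers": 10,
}
PLATFORM_POOL = {"Product Manager": 1, "Engineering Manager": 1, "Engineers": 5}

feature_groups = {name: FEATURE_POOL for name in ("Email", "Calendar", "Todo")}
platform_groups = {name: PLATFORM_POOL for name in ("Infrastructure", "Backend", "Frontend")}


def headcount(groups):
    """Total number of people across every pool in a set of groups."""
    return sum(sum(pool.values()) for pool in groups.values())


def engineers(groups):
    """Number of engineers (excluding managers and product roles) in a set of groups."""
    return sum(pool["Engineers"] for pool in groups.values())


feature_total = headcount(feature_groups)    # 3 groups x 15 people = 45
platform_total = headcount(platform_groups)  # 3 groups x 7 people = 21
print(feature_total, platform_total, feature_total + platform_total)  # 45 21 66
print(engineers(feature_groups) + engineers(platform_groups))         # 45
```

The listed pools account for 66 people, 45 of them engineers, with 15 of those engineers (one third) in platform groups. The example does not itemize the remainder of the ~75.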

Each Feature Group product manager is responsible for the intake for that group. They work closely with other Feature Groups to ensure that any cross-cutting initiatives can be developed efficiently across groups. Ad-hoc teams can be made up of only the feature group’s members, or they can include people from other groups (including platform groups) for a finite period of time.

Key points here are that:

  • The Feature Group managers are responsible for the overall health and output of all teams in that group. This means that while they are required to produce a set of KPIs that is consistent across groups, their approach to producing those metrics and meeting their KPIs is their own.
  • Teams work on initiatives. Initiatives can be short-lived or long-lived. A short-lived initiative in this case could be the development of a new feature, like Snooze in the Email Feature Group. Long-lived initiatives are meant for areas that see a very consistent stream of work and where institutional knowledge pays off; an example might be an Email Composition team.
  • Platform Groups’ customers are the Feature Groups, and they work closely with the Feature Groups to build their backlogs. A good example is when a Platform Group sees that multiple Feature Groups are independently, and inconsistently, building features that are actually a common concern.

Wrapping It Up

Everyone wants predictability, but if you start by assuming that anything is predictable, you have already failed. You can, however, make things less unpredictable. With that mindset, you can start to relinquish a little control by removing dictates and installing guardrails instead. Building a system focused more on trust and autonomy is a good way to achieve better outcomes while building a healthier culture. The approach above is one way to do that, but it’s definitely not the only way, and every team has its own character. If you observe and adjust to circumstances, with trust and autonomy as your north star, you should be confident in the outcome.