WSJF

If you only quantify one thing, quantify the cost of delay.

—Don Reinertsen

Abstract

SAFe is intended for application in situations in which Agile Release Trains (ARTs) are engaged in ongoing, continuous development—a flow of work—that makes up the Enterprise’s incremental development effort. As such, it avoids the overhead and delays of the start-stop-start nature of traditional projects and programs, whereby various project authorizations and phase gates are used to control the program and its economics.

While this continuous flow model helps eliminate delays and keeps the system lean, we do have to ensure that the system’s priorities are constantly updated so that the work delivers the best economic outcomes for the business. In flow, it is item sequencing (rather than theoretical individual item ROI) that drives the best economic result. To that end, this article describes how the ART, Solution, and Portfolio Backlogs are reprioritized using Weighted Shortest Job First, which is calculated from the cost of delay and job size (a proxy for duration). Applying this algorithm at PI boundaries keeps job priorities up to date with the current business context, value, time, development facts, risk, and effort considerations. It also conveniently and automatically ignores sunk costs, a key principle of Lean economics.

Details

Reinertsen [2] describes a comprehensive model called “Weighted Shortest Job First” (WSJF) for prioritizing jobs based on the economics of product development flow. WSJF is calculated as the cost of delay divided by job duration. Jobs with the highest ratio, that is, those that deliver the most value (the highest cost of delay) in the shortest duration, are selected first for implementation. When applied in SAFe, the model supports a number of additional key principles of product development flow, including:

  • Take an economic view
  • Ignore sunk costs
  • If you only quantify one thing, quantify the cost of delay
  • Economic choices must be made continuously
  • Use decision rules to decentralize economic control

The impact of properly applying WSJF can be seen in Figure 1. (See [2] for full discussion.) The areas shaded in blue illustrate the total cost of delay in each case. Doing the weighted shortest job first delivers the best economics.

Figure 1. The economic effect of doing the Weighted Shortest Job First (WSJF); cost of delay for work
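The sequencing effect can be checked with a little arithmetic. The sketch below is illustrative only: the three jobs, their cost-of-delay rates, and their durations are hypothetical numbers (not taken from Figure 1), and it assumes each job accrues its cost of delay from time zero until it is delivered.

```python
from itertools import permutations

# Hypothetical jobs: (name, cost of delay per unit of time, duration).
JOBS = [("A", 10, 1), ("B", 3, 3), ("C", 1, 10)]


def total_delay_cost(sequence):
    """Sum each job's cost-of-delay rate times the time until it is delivered."""
    elapsed = 0
    total = 0
    for _name, cod_rate, duration in sequence:
        elapsed += duration          # this job is delivered at time `elapsed`
        total += cod_rate * elapsed  # assumption: it accrued delay cost until then
    return total


def wsjf_order(jobs):
    """Order jobs by cost of delay divided by duration, highest ratio first."""
    return sorted(jobs, key=lambda job: job[1] / job[2], reverse=True)


wsjf_first = wsjf_order(JOBS)
wsjf_last = list(reversed(wsjf_first))
cheapest = min(permutations(JOBS), key=total_delay_cost)

print("WSJF order:   ", [j[0] for j in wsjf_first], total_delay_cost(wsjf_first))
print("Reverse order:", [j[0] for j in wsjf_last], total_delay_cost(wsjf_last))
print("Cheapest of all orderings:", [j[0] for j in cheapest])
```

With these assumed numbers, the WSJF ordering is also the cheapest of the six possible orderings, and doing the jobs in the reverse order costs several times as much.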

Calculating the Cost of Delay

In SAFe, our “jobs” are the Epics and the Features and Capabilities we develop, so we need to establish both the cost of delay and the duration for each job. There are three primary elements that contribute to the cost of delay:

User-Business Value: Do our users prefer this over that? What is the revenue impact on our business? Is there a potential penalty or other negative impact if we delay?

Time Criticality: How does the user/business value decay over time? Is there a fixed deadline? Will they wait for us or move to another solution? Are there Milestones in the critical path impacted by this?

Risk Reduction-Opportunity Enablement Value: What else does this do for our business? Does it reduce the risk of this or a future delivery? Is there value in the information we will receive? Will this feature open up new business opportunities?

Moreover, since we are in continuous flow and should have a large enough backlog to choose from, we needn’t worry about the absolute numbers. We can just compare backlog items relative to each other using the modified Fibonacci numbers we use in estimating poker. Then the relative cost of delay for a job is:

Cost of Delay = User-Business Value + Time Criticality + Risk Reduction-Opportunity Enablement Value
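As a minimal sketch, the relative cost of delay can be computed as a plain sum of the three ratings. The function name, the parameter names, and the exact scale (1, 2, 3, 5, 8, 13, 20) are assumptions made for illustration; teams may use a different range of the modified Fibonacci sequence.

```python
# Assumed rating scale; adjust to whatever scale the team uses in estimating poker.
MODIFIED_FIBONACCI = (1, 2, 3, 5, 8, 13, 20)


def cost_of_delay(user_business_value, time_criticality, rr_oe_value):
    """Return the relative cost of delay as the sum of the three ratings.

    rr_oe_value is the risk reduction-opportunity enablement rating.
    """
    for rating in (user_business_value, time_criticality, rr_oe_value):
        if rating not in MODIFIED_FIBONACCI:
            raise ValueError(f"{rating} is not on the assumed scale {MODIFIED_FIBONACCI}")
    return user_business_value + time_criticality + rr_oe_value


print(cost_of_delay(8, 5, 2))
```

Rejecting off-scale ratings is a design choice of this sketch, not a requirement of the model; it simply keeps the ratings relative rather than drifting toward false precision.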

Duration

Next we need to understand job duration. That can be difficult to determine, especially early on, when we may not yet know who is going to do the work or what capacity they can allocate to it. Fortunately, we have a ready proxy: job size. In systems with fixed resources, job size is a good proxy for duration. (If I’m the only one mowing my lawn, and the front yard is three times bigger than the back yard, the front lawn is going to take three times longer to mow.) And we know how to estimate item size in story points already (see Features). Taking job size, we have a reasonably straightforward calculation for comparing jobs via WSJF, as Figure 2 illustrates:

Figure 2. A formula for calculating WSJF

Then, for example, we can create a simple table to compare jobs (three jobs in this case), as shown in Figure 3:

Figure 3. A sample spreadsheet for calculating WSJF

To use the table, the team rates each job relative to other jobs on each of the three parameters. (Note: With relative estimating, you look at one column at a time, set the smallest item to a one, and then set the others relative to that item.) Then add the three ratings to get the job’s cost of delay, divide by the size of the job (which can be either a relative estimate or an absolute number based on the estimates contained in the backlog), and the result is a number that ranks the job’s priority.

The job with the highest WSJF is the next most important item to do.
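The table calculation can be sketched in a few lines. Everything here is hypothetical: the `Job` class, its field names, and the three sample jobs with their ratings are made up for illustration and are not the values from Figure 3.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    user_business_value: int
    time_criticality: int
    rr_oe_value: int  # risk reduction-opportunity enablement rating
    job_size: int     # proxy for duration

    @property
    def cost_of_delay(self) -> int:
        return self.user_business_value + self.time_criticality + self.rr_oe_value

    @property
    def wsjf(self) -> float:
        return self.cost_of_delay / self.job_size


def rank_by_wsjf(jobs):
    """Return the jobs ordered from highest WSJF to lowest."""
    return sorted(jobs, key=lambda job: job.wsjf, reverse=True)


# Hypothetical backlog; in each column the smallest item is rated 1.
backlog = [
    Job("Job A", user_business_value=5, time_criticality=1, rr_oe_value=1, job_size=3),
    Job("Job B", user_business_value=1, time_criticality=13, rr_oe_value=2, job_size=1),
    Job("Job C", user_business_value=13, time_criticality=3, rr_oe_value=8, job_size=8),
]

for job in rank_by_wsjf(backlog):
    print(f"{job.name}: CoD={job.cost_of_delay}, size={job.job_size}, WSJF={job.wsjf:.2f}")
```

In this made-up backlog, Job C has the highest cost of delay, but the small, time-critical Job B ranks first because its cost of delay is high relative to its size.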

One outcome of this model is that really big, important jobs have to be divided into smaller, pretty important jobs in order to make the cut against easier ways of making money (i.e., small, low-risk jobs that your Customers are willing to pay for now). But that’s just Agile at work. Since the implementation is incremental, whenever a continuing job no longer ranks well against its peers, you have likely satisfied that particular requirement sufficiently and can move on to the next job.

As we have described, another advantage of the model is that it is not necessary to determine the absolute value of any of these numbers. Rather, you only need to rate the parameters of each job against the other jobs from the same backlog.

Finally, because the backlog estimates should include only the job size remaining, frequent reprioritization means that the system will automatically ignore sunk costs.
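A small worked example shows why using the remaining size matters. The numbers below are hypothetical: an in-progress job originally sized at 13 points with 8 points already completed, competing against a new 3-point job.

```python
def wsjf(cost_of_delay, job_size):
    """Cost of delay divided by job size (the proxy for duration)."""
    return cost_of_delay / job_size


# Hypothetical in-progress job.
ORIGINAL_SIZE = 13
COMPLETED = 8
remaining_size = ORIGINAL_SIZE - COMPLETED  # only the work still to be done
in_progress_cod = 20

# Hypothetical competing job that has not been started.
new_job_cod, new_job_size = 9, 3

wsjf_with_remaining = wsjf(in_progress_cod, remaining_size)
wsjf_with_original = wsjf(in_progress_cod, ORIGINAL_SIZE)
wsjf_new_job = wsjf(new_job_cod, new_job_size)

print(f"In-progress job, remaining size: {wsjf_with_remaining:.2f}")
print(f"In-progress job, original size:  {wsjf_with_original:.2f}")
print(f"New job:                         {wsjf_new_job:.2f}")
```

Sized by the work remaining, the in-progress job outranks the new job; sized by its original estimate, it would not. The effort already spent plays no part in the decision, only what it still takes to finish.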

A Note on Job Size as a Proxy for Duration

However, we do have to be careful about the proxy we choose for duration. If availability of resources means that a larger job may be delivered more quickly than some other item with about equal value, then we probably know enough about the job to use its duration directly and get a more accurate result. (If I can get three people to mow my front lawn while I mow the back, then these items have about the same duration, but not the same cost.) But this is rarely necessary in the flow of value, in part because if there is some small error in selection, that next important job will make its way up soon enough.
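The lawn example can be put into numbers. This is a rough sketch that assumes duration is simply job size divided by the number of people working on it; the function names and the sizes are illustrative.

```python
def duration(job_size, workers):
    """Assumed model: elapsed time is job size divided by the number of workers."""
    return job_size / workers


def effort(job_size):
    """Total work (and therefore cost) does not depend on how many people share it."""
    return job_size


FRONT_LAWN_SIZE = 3  # three times bigger than the back lawn
BACK_LAWN_SIZE = 1

front_duration = duration(FRONT_LAWN_SIZE, workers=3)
back_duration = duration(BACK_LAWN_SIZE, workers=1)

print(f"Front lawn: duration={front_duration}, effort={effort(FRONT_LAWN_SIZE)}")
print(f"Back lawn:  duration={back_duration}, effort={effort(BACK_LAWN_SIZE)}")
```

When staffing differs this much between jobs, dividing the cost of delay by duration rather than by size gives the more accurate ranking.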


Learn More

[1] Leffingwell, Dean. Agile Software Requirements: Lean Requirements Practices for Teams, Programs, and the Enterprise. Addison-Wesley, 2011.

[2] Reinertsen, Don. Principles of Product Development Flow: Second Generation Lean Product Development. Celeritas Publishing, 2009.

Last update: 19 May 2016