Requirements – Jim’s Blog

Summary

The Scaled Agile Framework® (SAFe)[i] contains a method for initializing and normalizing an Agile team’s effort and/or complexity estimates, the use of which can result in poor behavior by Agile teams. In defending this claim of danger, this paper first discusses Planning Poker and Story Pointing in Scrum as background information, and highlights the importance of relative and unanchored estimating. A brief discussion of SAFe®’s normalized story point estimation method follows. Poor behaviors observed on teams at a recent client of the author’s are then discussed, and SAFe®’s normalized story point estimation initialization technique is hypothesized as part of the cause. Finally, a brief discussion of a proposed solution is offered. If necessary, a follow-up paper would discuss solutions that have been tried and their level of success.

Planning Poker and Story Points – background

In Agile/Scrum[ii], story pointing is a method for estimating the amount of work to be done by a development team over a period of time in a predictable manner. It is usually done by a team via an estimating procedure called Planning Poker[iii], which yields a relative estimate for the work to complete a requirement based on a small reference requirement whose baseline point value is arbitrarily assigned, usually 1, 2 or 3. It is called story pointing because story point values have no units (this means they do not refer to hours or any other duration or cost), and because the requirements in Agile/Scrum take the form of use case-like statements called user stories[iv] which contain a user role, a statement of functional need, and a statement of value.

Planning Poker is a variant of an estimation method developed in the 1950s-60s at the Rand Corporation called Delphi. The Delphi Method[v] is a systematic, structured communication method that includes participant anonymity and simultaneity (avoiding the influence of other participants), a consensus basis, and regular feedback (each of which contributes to gaining agreement and commitment). Barry Boehm and John Farquhar originated the Wideband[vi] variant of the Delphi method in the 1970s, calling it wideband because the new method involved greater collaboration among those participating. Finally, Planning Poker is a “gamified” form of Wideband Delphi.

Estimates in Planning Poker take the form of a number in a (modified) Fibonacci sequence[vii]. That is, suppose our reference story is assigned 2 story points; then a relative estimate of the work for some other user story might be 2 (roughly the same effort and/or complexity[viii]), or 3 (a bit more), or 5 (more), or 8 (a lot more[ix]), etc. The commonly cited reasons for using relative estimating and Fibonacci are to reflect the inherent uncertainty in estimating larger items[x] and to avoid equating the relative estimates with specific time units like hours. The industry has also found empirically that relative estimates make a team more predictable[xi].

The Fibonacci sequence has the interesting property that the ratio F_n+1/F_n converges, as n approaches infinity, to an irrational number called the Golden Ratio[xii]: phi = (1+5^0.5)/2 = 1.6180339887…
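The convergence is easy to check numerically. The following is a minimal Python sketch, not part of the original paper; the function names are hypothetical and chosen only for illustration.

```python
from math import sqrt

PHI = (1 + sqrt(5)) / 2  # the Golden Ratio, about 1.6180339887


def fibonacci(count):
    """Return the first `count` Fibonacci numbers, starting 1, 1, 2, 3, ..."""
    values = []
    a, b = 1, 1
    for _ in range(count):
        values.append(a)
        a, b = b, a + b
    return values


def successive_ratios(values):
    """Return F_n+1/F_n for each adjacent pair in `values`."""
    return [later / earlier for earlier, later in zip(values, values[1:])]


if __name__ == "__main__":
    fib = fibonacci(12)
    for n, ratio in enumerate(successive_ratios(fib), start=1):
        # The gap to phi shrinks quickly as n grows.
        print(f"F_{n + 1}/F_{n} = {ratio:.6f}  (differs from phi by {abs(ratio - PHI):.6f})")
```

The ratios alternate above and below phi while closing in on it, so adjacent cards in a Fibonacci-based deck differ by a factor of roughly phi once past the first few values.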

Phi appears surprisingly often in nature, such as in the arrangement of leaves and branches in plants, the proportions of chemical compounds and the geometry of crystals. One reason given for its use in Planning Poker (via the Fibonacci sequence), perhaps connected to its frequent appearance in nature, is that the human mind perceives ratios larger than phi as significant in some sense, and ratios smaller than phi as insignificant[xiii]. A second reason is that it forces participants to avoid simple ratios like “twice”, “four times as big”, or “half as big”[xiv]. Using hours or days in lieu of Fibonacci-based points leaves a team free to use such simple ratios and to quibble wastefully over relatively insignificant differences.

SAFe®’s Story Point Initialization

Tucked into the intellectual capital on SAFe® team-level iteration planning[xv] is the concept of Normalized Story Point Estimating. First it is acknowledged that in Scrum, each team’s velocity[xvi] is associated only with that team. However, it is asserted, in SAFe®, story point estimation shall be normalized. The reason given is that estimates for requirements such as features whose development comes from multiple teams must be based on the same story point definition. This, in turn, is said to provide a way to perform ART[xvii] and Solution-level economic decision-making on a common basis.

The following algorithm for normalizing story point estimating across multiple teams is offered by SAFe® on its team-level iteration planning page:

1. Normalize story points:

Find a story that will consume about ½ day in development and ½ day for test and validation; assign this story 1 story point; estimate your stories relative to this baseline story

2. Establish the team velocity V_team prior to the existence of historical data:

Let the effective team size be N_team, i.e. the total number of developers and testers on the team

Let D_L be the total number of effective team-member vacation, holiday, sick and other leave days anticipated for the iteration or sprint (for all the team members)

Then:

V_team = 8 × (sum of A_t over all N_team team members) − D_L

where A_t is the fraction allocation – A_t is in (0,1] – for each team member t, e.g. each FTE[xviii] on the team who is allocated full-time to that team has an A_t of 1.0.

In 1. above, it is readily seen that 1 story point is equated to 1 day’s effort. The justification for the constant 8 in 2. above is similar, at least in the SAFe® SPC training class attended by the author: a two-week sprint has 10 working days, from which 2 days are subtracted for meetings and other miscellaneous inefficiencies. In other words, in order to normalize story pointing for collaboration during cross-team story point estimating, such as in ARTs, SAFe® asks that a time-based method for estimation initialization be used.
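As a rough illustration of step 2, the sketch below assumes the initial velocity has the form V_team = 8 × (sum of A_t) − D_L, which is consistent with the constant 8 and the symbols defined above. The function name, parameter names and the sample team are hypothetical.

```python
def initial_velocity(allocations, leave_days, points_per_member=8):
    """Estimate a starting velocity for a team with no historical data.

    allocations: the fraction allocation A_t of each developer and tester,
                 each in (0, 1]; a full-time member has 1.0.
    leave_days:  D_L, total leave days expected across the team for the sprint.
    points_per_member: the constant 8 (10 sprint days minus 2 days of overhead).
    """
    for a_t in allocations:
        if not 0 < a_t <= 1:
            raise ValueError(f"allocation {a_t} is outside (0, 1]")
    return points_per_member * sum(allocations) - leave_days


if __name__ == "__main__":
    # Hypothetical team: four full-time members and one half-time member,
    # with three leave days expected in the sprint.
    team = [1.0, 1.0, 1.0, 1.0, 0.5]
    print(initial_velocity(team, leave_days=3))  # 8 * 4.5 - 3 = 33.0
```

Every term in the calculation is a count of person-days, which makes the time-based nature of the resulting "points" explicit.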

Story Points should not be about hours or days

The first issue with this advice is that story points, while they are about effort and complexity, are not about hours or days. While it is clear that a story that has more effort and complexity takes more time, how much more varies from team to team and with the situation. Let’s hear it from one of the acknowledged experts, Mike Cohn[xix] (underlining is my emphasis):

I’ve been quite adamant lately that story points are about time, specifically effort. But that does not mean you should say something like, “One story point = eight hours.”

Doing this obviates the main reason to use story points in the first place. Story points are helpful because they allow team members who perform at different speeds to communicate and estimate collaboratively.

Two developers can start by estimating a given user story as one point even if their individual estimates of the actual time on task differ. Starting with that estimate, they can then agree to estimate something as two points if each agree it will take twice as long as the first story.

When story points [are] equated to hours, team members can no longer do this. If someone instructs team members that one point equals eight (or any number of) hours, the benefits of estimating in an abstract but relatively meaningful unit like story points are lost.

When told to estimate this way, the team member will mentally estimate first in number of hours and then convert that estimate to points. Something the developer estimates to be 16 hours will be converted to 2 points.

Contrast this with a team member’s thought process when estimating in story points as they are truly intended. In this case, team members will consider how long each new story will take in comparison to other stories. For example, you and I might agree that a new story will take twice as long as a one-point story, and so we agree it’s a two.

Knowledge and use of the SAFe® normalization approach is leading to poor behaviors

The second issue with SAFe®’s advice stems from my own consulting team’s experience with clients using the SAFe® story point normalization and initialization process. In our experience it demonstrably leads to:

- an excuse for anchored behavior, i.e. non-anonymous and non-simultaneous effort and/or complexity estimating by teams
- non-relative estimating, i.e. use of hours as a means to derive story points, which means of course that one might as well just use hours (at least it is more honest)
- management imposition of target velocities for teams as a misguided productivity motivator[xx].

With regard to the last bullet, let’s remind ourselves that in order to double a team’s velocity so that they can meet a target velocity imposed on them, all the team needs to do is halve the size of the reference requirement or user story, or double the number of story points assigned to that reference story.
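The arithmetic behind this point can be sketched in a few lines of Python. This is a deliberate simplification: it treats each story’s points as its size relative to the reference story multiplied by the reference story’s point value, and ignores rounding to Fibonacci values. All names and numbers are hypothetical.

```python
def sprint_velocity(relative_sizes, reference_points):
    """Sum the story points completed in a sprint.

    relative_sizes:   each story's size as a multiple of the reference story.
    reference_points: the arbitrary point value assigned to the reference story.
    """
    return sum(size * reference_points for size in relative_sizes)


if __name__ == "__main__":
    # The same five stories, i.e. exactly the same work, in both cases.
    sprint = [1, 1, 2, 3, 5]

    before = sprint_velocity(sprint, reference_points=2)
    after = sprint_velocity(sprint, reference_points=4)

    print(before)  # 24
    print(after)   # 48: velocity "doubled", delivered work unchanged
```

A target velocity therefore measures the team’s choice of baseline at least as much as it measures the team’s output.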

Solution

“Help Teams excel, don’t punish them.”[xxi]

SAFe® claims that story point normalization is needed “so that estimates for Features or Epics that require the support of multiple teams are based on the same story point definition, allowing a shared basis for economic decision making.”[xxii] The author does not buy this argument. Each team has a run-rate (cost per unit time), and each team commits to developing a certain set of requirements, and therefore value, in each two-week iteration and/or in each ten-week program increment. That value is sufficient to determine the economics of the situation where tradeoffs are necessary; such tradeoffs take place no lower than at the team level anyway. Moreover, a team that has a history of using Scrum and is subsequently assigned to an Agile Release Train arrives at the Train’s first PI Planning[xxiii] meeting with an unnormalized velocity already in place. One should be reluctant to disturb the team’s existing velocity.

Suppose a team is assigned to an ART, and is also just starting to use Scrum. How should such a team in an ART initialize their velocity? Several expert Scrum sites warn against anchoring using time, only to propose a time-based initialization method just as SAFe® does (e.g. [xxiv])! VersionOne, however, suggests what may be a better procedure: “Initially, teams new to Agile software development [with Scrum] should just dive in and select an initial velocity using available guidelines and information.”[xxv] That is, you know your team, just give it your best shot! Remember, this exercise starts with a reference user story, the story to which an arbitrary story point value was assigned – be that value 1, 2, or 3 (since different Agile sites suggest each of these three arbitrary values early in the Fibonacci sequence). Will your initial velocity be right? Quite unlikely! The goal is not the impossible one of being predictable in your very first sprint. The goal is the continuous improvement of the team’s predictability over time. Predictability is valuable[xxvi] because it generates trust. This is a good goal.

Epilogue … not everything transcribed well from the original Word document. Please let me know if you see any errors, thank you.

[i] Dean Leffingwell’s framework for scaling Agile development, – see http://www.scaledagile.com (corporate/administrative) and http://www.scaledagileframework.com (technical, and by the way, highly “clickable”)

[ii] What is Scrum? : https://www.scrum.org/resources/what-is-scrum?gclid=Cj0KCQiAyZLSBRDpARIsAH66VQItwbMIu3mxrGvzBy2P-ZWhn9AhkWLTbN7yY7q3fYr_Z8-9vnBRrogaAnl0EALw_wcB

[iii] https://en.wikipedia.org/wiki/Planning_poker

[iv] https://www.mountaingoatsoftware.com/agile/user-stories

[v] https://en.wikipedia.org/wiki/Delphi_method

[vi] https://en.wikipedia.org/wiki/Wideband_delphi

[vii] The Fibonacci sequence, defined by F_n+2=F_n+1+F_n where F₁=1 & F₂=1 (or optionally F₀=0 & F₁=1), starts with (optionally) 0, then 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, etc. Regarding “modified”: one always modifies the sequence for use in estimating by including only a single 1. Additionally, perhaps because it’s easier to think about these numbers, larger numbers can be rounded, e.g. 20, 40, 100 instead of 21, 34, 55, 89, and sometimes more esoteric values are included such as 0 (meaning trivial), ½, infinity, “?” and the flippant “I’ll go make some coffee”. One such scheme is codified in a commercial card deck product: https://store.mountaingoatsoftware.com .

[viii] This intentionally avoids the current discussion in the literature about whether story pointing should be based on effort (per Cohn and others, e.g. https://www.mountaingoatsoftware.com/blog/dont-equate-story-points-to-hours) or complexity (per Giddings and others, e.g. https://www.clearvision-cm.com/blog/why-story-points-are-a-measure-of-complexity-not-effort/)

[ix] Why the phrase “a lot more” instead of “four times more”? After all, 8/2 is 4. The answer is that some experts/authors don’t believe it is correct to make that assumption, in particular because of the presence of uncertainty in the estimate. As with the complexity vs. effort argument referenced earlier, discussion of that topic is being intentionally avoided.

[x] It has been difficult to find where this was originally stated. Wikipedia’s Planning_poker page says “citation needed”. Several other references were consulted, and they either make this statement without citation, or they cite Wikipedia. A reasonable guess is that it’s in one of Mike Cohn’s books. Stack Overflow, at https://stackoverflow.com/questions/9362286/why-is-the-fibonacci-series-used-in-agile-planning-poker, contains the amusing statement that this description on Wikipedia holds “the mysterious sentence” and then echoes the phrase, “reflect the inherent uncertainty in estimating larger items”. Regardless, the author believes the statement to be reasonably accurate.

[xi] http://blogs.collab.net/agile/perfectly-predictable-why-story-points-are-better-than-detailed-estimates and http://gettingpredictable.com/the-attitude-of-estimation/

[xii] https://en.wikipedia.org/wiki/Golden_ratio

[xiii] I swear I have read this before! and it was in a decent reference; I am searching desperately for the citation, yes indeed … but I have not yet found it

[xiv] https://www.scrum.org/forum/scrum-forum/7897/why-do-we-use-fibonacci-series-estimation

[xv] http://www.scaledagileframework.com/iteration-planning/

[xvi] Velocity: as used here, velocity is a key to improving the predictability of an Agile development team. Velocity is an assessment of how many story points a single team can commit to achieving, or performing, in a single iteration or sprint. When a team has a history of prior sprints’ story points achievement, velocity is some reasonable function of that history – the function is determined by the team but an average is a good start. When the team has no such history, this is when SAFe®’s normalization/initialization process might be applied. Scrum Inc. has a good page on velocity: https://www.scruminc.com/velocity/

Another good page on velocity is: https://www.scrumalliance.org/community/articles/2014/february/velocity .

[xvii] ART: a SAFe® Agile Release Train, SAFe®’s organizational structure for multiple, persistent Agile development teams; see http://www.scaledagileframework.com/agile-release-train

[xviii] FTE: full-time employee

[xix] https://www.mountaingoatsoftware.com/blog/dont-equate-story-points-to-hours

[xx] https://vimeo.com/49263000 is a superb video by Dan Pink which speaks to how real motivation of knowledge workers arises.

[xxi] https://www.scrumalliance.org/community/articles/2014/february/velocity

[xxii] http://www.scaledagileframework.com/iteration-planning/

[xxiii] http://www.scaledagileframework.com/pi-planning/

[xxiv] https://stackoverflow.com/questions/1232281/how-to-measure-estimate-and-story-points-in-scrum “… start out by assuming a story point is a single ‘ideal day’ …”

[xxv] https://www.versionone.com/agile-101/agile-management-practices/agile-scrum-velocity/

[xxvi] https://dzone.com/articles/predictability-really-what-we , https://uxmag.com/articles/being-predictable ; also information on predictability metrics: https://www.leadingagile.com/2013/07/agile-health-metrics-for-predictability/ , http://www.scaledagileframework.com/metrics/#P2

I’ve been writing for a while now about #rampantComplexity, posting the occasional article about software and systems that are needlessly hard to use or that interact chaotically. Recently I have experienced several examples right in my own home. Generally I think these issues point to complexity, to way too little automated testing and automated regression testing, and finally to too few developers and product managers who understand how to elicit good requirements and usage scenarios (today called user stories, in the past often called use cases).

Our new dishwasher. It’s a simple thing: like every other dishwasher I’ve ever had, I’d like to be able to warm dishes in it. I’ve even had some with a “warm dishes” setting. However, my current model just has a few cycle options. All imply getting the dishes wet first. There’s no dry-only or heat option. Why are they taking functionality away from us?! #DontTheyThinkAboutUseCases?
Our microwave and oven combination is hilarious; there are so many little things wrong with its user interface. First, there is a button lock. If you push these three or four buttons in this order, the interface locks, and an unlock button appears on the face. It turns out that the easiest way to lock the interface is to wash the face of the unit with a cleaning cloth. My wife has done this several times unintentionally; she then comes to me wondering why the appliance no longer functions. My explanation makes sense, but she has forgotten about it weeks later when the problem recurs. #DontTheyThinkAboutUseCases?
Moving to the next microwave/oven interface issue, if you open the microwave door to check on the progress of the object you’re “nuking”, you can then close the door and elect to turn it back on and continue heating the item. That continue button works about 90% of the time. Sometimes though it just ignores you, and you are forced to completely start over. #DontTheyTestThisStuff?
There is a delayed start option for the oven, but I am not patient enough to figure out how it works. There is a store program option as well, but it is not obvious what a program even is. The documentation is unfathomable. #RTFM
If the recipe says set the oven temperature to 365 degrees, one cannot follow the directive. The interface is a cool-looking slider, and it’s limited to 25 degree increments. Oh wait. Two years after writing that, I’ve discovered there’s an additional interface that allows 5 degree increments.
The refrigerator has a beep which repeats, informing you that the door has been left open. An almost-closed door is open in the eyes of the software, but the speaker is inside the refrigerator so it’s muffled by the mostly-closed door. Linda has some hearing loss and cannot hear the tones at all unless she is right next to the fridge.
The same sound is emitted by the refrigerator if the temperature inside is too high. Typically this occurs because the door was left open for some period. Of course when we discover this we close the door. The sound continues until the temperature is restored to 38 degrees by the fridge. There is no way to defeat this annoying sound. Even though the door is closed, the sound continues ad nauseam.
I have an antique in my garage, a 1995 Explorer. To be fair, our community’s knowledge of systems complexity was much less mature in those days, but it’s still in my garage, and I love this wild story. The car has an outside air temperature (OAT) sensor in it, and a display for that temperature in the car, overhead between the front seats. I found out at my Ford dealer the location of the OAT sensor: it’s in the engine compartment. Really? There’s more noise in there than signal!! As it transpires, in the instrument cluster there is an ECU (CPU for computer guys, ECU means electronic control unit), and it has an algorithm running which understands the current state of the system (how long the engine has been running, etc.) which compensates for the noise. Wow. Over many years I’ve found that the algorithm is right to within about 4 degrees most of the time.
But wait, there’s more. In the next model year, 1996, a different climate control system vendor was used, and its system had an OAT display. The vendor wanted a feed from the OAT sensor. Whoever they asked was aware of the sensor and the algorithm, but instead of passing along, say, an API to get at the algorithm’s output, they handed over a requirements document explaining the algorithm. The vendor functionally duplicated this algorithm in their climate control system. All was well until, in the field, customers reported that the two OAT readouts were Not Always The Same!! Implement a complex algorithm two different ways and you can often see this happen for yourself; here the hardware running the algorithm was different as well. They apparently performed a recall over this. Sadly, most recalls cost an extraordinary amount of money.
Though our 2011 Escape does not suffer from this problem nearly as badly as some of the newer cars out there … I tweeted (when this was originally written, in 2017; I’m no longer on Twitter) about new cars and an article about them in the WSJ. “Touchy touch screens, buggy software, mystery sounds, all baffling to drivers, forcing some to enroll in two-hour seminars. And then the beeping started …” I’ve seen this with a variety of people in my life, but most especially my mother, who wouldn’t even dream of buying a laptop computer, much less a new car. She hung onto her 1993 Thunderbird until she gave up driving last year. (The WSJ article is here; I hope you can see it despite the paywall.)
Recently, when my wife Linda tried to visit her work website, nothing happened; she’d get just a blank screen and a spinner. She could visit any website we could think of … except her work website. We decided to investigate. Her work website worked fine on her iPhone and several other devices, but still not on her work PC. We cleared all the stupid caches, cookies and crackers – no help. We tried two other browsers, one freshly downloaded – no help. Rebooted the laptop – no help. Shut it down, waited 5 minutes and rebooted the laptop – no help. Linda called the tech support folks, and they read from their scripts – no help. Then I power-cycled the cable internet modem in the house to reboot it. That worked. *sigh*

What would YOU do to try and keep these issues from arising in your next product effort?
