Systems, Complexity, Crashes, Oh My

This blog entry is written together with my old IBM buddy Mike Mott, Distinguished Engineer (Retired).^[1]

As my colleagues and friends are aware, I am fond of discovering unbridled and poorly managed complexities that result in bad stuff. I don’t go looking for cases that have killed people, though that has happened. But pretty much everything else is fair game, including the YF-22 accident where the test pilot left the prototype airplane via the ejection seat because he was suddenly and utterly unable to control the airplane. As it transpired, a section of computer code tasked with translating pilot commands into control surface actions had eight discrete modes, and the pilot had unwittingly flown into the one of those eight that had never been tested until then. There was a sign error. Up was down, down was up. Close to the ground, there was no time to figure it out. Scratch one really expensive fighter jet.
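To make the point concrete, here is a minimal sketch of the kind of check that walks every mode instead of only the ones a test flight happens to visit. Everything in it is an assumption for illustration: the mode names, the gain values, and the function names are invented and have nothing to do with the actual flight software.

```python
# Hypothetical gains for eight control-law modes. The names and numbers are
# invented for illustration; they are not taken from any real flight software.
MODE_GAINS = {
    "mode_1": 1.0,
    "mode_2": 0.9,
    "mode_3": 1.2,
    "mode_4": 0.7,
    "mode_5": 1.1,
    "mode_6": -0.8,  # deliberate sign error: up becomes down
    "mode_7": 1.0,
    "mode_8": 0.6,
}


def surface_command(mode: str, stick_input: float) -> float:
    """Translate a pilot stick input into a control surface command."""
    return MODE_GAINS[mode] * stick_input


def modes_with_reversed_response(probe: float = 1.0) -> list[str]:
    """Return every mode where a positive stick input gives a non-positive command."""
    return [m for m in MODE_GAINS if surface_command(m, probe) <= 0.0]


if __name__ == "__main__":
    # Exercise all eight modes, not just the ones a flight happens to reach.
    print(modes_with_reversed_response())
```

With only eight modes, exhaustive coverage is cheap. The check costs a few lines, and the one bad mode shows up on the ground instead of in the air.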

A recent, and very public, complexity issue presents itself as a target-rich case study in how not to do business with software: the results reporting from the Iowa Caucuses. There are much better approaches to the management of high-tech programs. There is a huge difference between hacking out an app for a smartphone and building a system to support a caucus. The app is used in support of a process of vote counting, reporting and summary. All of the people involved in this process must be properly trained to perform their roles. The app must be realized in hardware, which requires loading and testing the app in the production environment. The networks and servers that deliver the results must provide enough capacity to perform the computing tasks within the timelines. Use Case modeling of the end-to-end system operation is an excellent way to flesh out the preparation work for the caucus and the performance of the precincts on caucus night.

From time to time I survey websites such as sysml.org, and the trend is good; modern system management technologies are evolving well. However, a review of pmi.org shows project management still stuck in the past, with the same old work breakdown structuring and risk management approaches that we know lead to long feedback loops, poor quality and performance issues. There is a gap between the technologies available for the system itself and for system development on the one hand, and the approach used by program managers to pull it off on the other.

Now, as my buddy Mike writes …

Looking at Iowa from afar, it seems obvious there was a total failure to plan the scenarios, the intended results, the usage models … or maybe anything at all, as though it was just going to work. I would bet the house, farm, and alley cat that no clear-cut statement of objectives was ever written or socialized. They needed measurable outcomes to shoot for, something like “we shall have the conference results complete within 2 hours of when the doors close to start the caucuses”. Another bet: not a single Use Case (or Epic, or group of scenarios, or whatever you want to call it) will be found in their statements of need that lays out the entire process from taking votes to reporting results. And stunningly, it appears that no thought whatever was given to the hardware realization. They left it at “hey, we have an app”. According to Chad Wolf, Acting Secretary of DHS, during an interview Tuesday on Fox and Friends, there was no viable test plan, and it wasn’t for lack of resources: the DNC actually declined an offer to test the system by DHS’s Cybersecurity and Infrastructure Security Agency. It also appears that the folks involved even now don’t understand what happened. All these issues are being spoken of by the DNC as a simple app failure. Indeed, that’s a piece of it, but astute observers know that the larger failure was one of project management and/or product development management. Success on a crucial, scaled system requires considering the people, process, tools, and technology. But no … they believed an app purchased from Mrs. Clinton’s former campaign manager would ensure the caucus ran well.
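A measurable outcome like the two-hour target above is easy to turn into something a rehearsal can pass or fail. The following is only a sketch under stated assumptions: the precinct names, the timestamps, and the function name `late_or_missing` are hypothetical, and only the two-hour window comes from the objective quoted above.

```python
from datetime import datetime, timedelta

# The two-hour window comes from the example objective in the text.
REPORTING_WINDOW = timedelta(hours=2)


def late_or_missing(doors_close, reports, precincts):
    """Return the precincts that missed the deadline or never reported.

    doors_close: datetime when the caucus doors closed
    reports:     dict mapping precinct name -> datetime its results arrived
    precincts:   every precinct expected to report
    """
    deadline = doors_close + REPORTING_WINDOW
    return sorted(
        p for p in precincts
        if p not in reports or reports[p] > deadline
    )


if __name__ == "__main__":
    # Hypothetical rehearsal data, invented for illustration.
    doors_close = datetime(2020, 1, 1, 19, 0)
    reports = {
        "precinct_a": datetime(2020, 1, 1, 20, 30),  # on time
        "precinct_b": datetime(2020, 1, 1, 21, 15),  # fifteen minutes late
    }
    expected = ["precinct_a", "precinct_b", "precinct_c"]
    print(late_or_missing(doors_close, reports, expected))
```

Run against a dress rehearsal, an empty list means the objective was met. Anything else is a named list of precincts to chase before caucus night.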

Alas, this seems very typical of how government spends our money, even after so many lessons, in the form of failed systems, have been rolled out for the world to see. The government lays out huge projects that fail time after time, with billions wasted. Is there no self-reflection or retrospective at all? There doesn’t seem to be. Would it not make some sense to seek assistance from those who have actually delivered on big efforts? On this point, it is notable that Vivek Kundra, young Ph.D. IT wunderkind, President Obama’s information czar and the first CIO for the federal government, invited major players in the industry to come and explain to him how the government could improve its IT performance. I developed material on the topic for my colleague Dave McQueeney, still at IBM, who pitched the slides to Kundra. To his credit, Kundra eventually published a 25-point plan to improve the federal government’s IT performance. He leaned more heavily on Agile principles and Agile methodologies than on the architecture quality-driven Model Driven Systems Development (MDSD^[2]) process framework we developed at IBM Rational to manage such high-complexity programs. Alas, I understand that IBM just didn’t quite connect with Dr. Kundra. Today, he is out of government, and the initiatives he started spent around $1 billion (they never die!) but are effectively complete.

As a number of outlets reported, the testing of the app was limited (ref Slate^[3]). Indeed. Pulling on this notion of a system: the app was only one part of the larger process of the caucus. The caucus leadership had a process to follow. Was it documented? Was its use rehearsed? Obviously not. According to Slate, “The problem with the caucuses is that we don’t run them except in a major national election, so there’s no way to ramp up to it. Imagine going to war with only war games under your belt, without facing an actual battle.” Although they knew this, the Iowa DNC failed to properly train their people in the use of a new app, to ensure that the new app was installed and ready to go, and to verify that the system hardware was scaled properly to support the load.

Ok, Jim here. To summarize, it seems certain that eventually … eventually … the ideas of incremental development yielding short feedback loops, ongoing risk mitigation, more objective and measurable criteria, end-to-end testing of the technology with the processes, and explicit management of system architectures as represented in multiple stakeholder viewpoints will get through to the Project Management field. It appears to Mike and me that the expected path for that will be via Agile. Indeed, as Agile evolves, it is adopting many of the tenets of MDSD.

^[1] https://www.linkedin.com/in/michael-mott-3590997/

^[2] https://www.academia.edu/9790859/Model-driven_systems_development

^[3] https://slate.com/technology/2020/02/iowa-caucus-app-fail-shadow.html
