“Technological disasters could no longer be ascribed to isolated equipment malfunction, operator error or random acts of God.” Nick Pidgeon
October is National Cyber Security Awareness Month (NCSAM), a time to focus on how cybersecurity is a shared responsibility that affects all Americans. The 2019 NCSAM brings us the theme: Own IT. Secure IT. Protect IT.
I recently attended and presented at the Fall DHS CISA Industrial Control System Joint Working Group (ICSJWG) meeting in Springfield, MA. During that meeting I learned quite a lot about the state of industrial control security as it relates to critical infrastructure and, more generally, all information security. Own IT. Secure IT. Protect IT. was chosen to convey a message of “personal accountability and stress the importance of taking proactive steps to enhance cybersecurity at home and in the workplace.” I’ve been thinking about this and, while I do think we need to continue this emphasis on educating, I’m not sure it’s going to be that simple.
Years ago, I read Charles Perrow’s fantastic book “Normal Accidents: Living with High-Risk Technologies”, which analyses the social side of technological risk. In Normal Accidents, Perrow argues that the engineering-driven approach to system safety is doomed to fail because of the vast complexity of the systems he studied. When the book was first written in 1984, Perrow was looking at complex systems such as (among others) nuclear power (the Three Mile Island accident; the disaster at Chernobyl hadn’t occurred yet), aviation (the Airbus A310 entered service in 1984 as the most advanced commercial jet), and space (the Apollo programme; the Challenger disaster was still several years away). In other words, he was making these observations at a time when complex technology was quaint by today’s standards. In the interim, we’ve seen an unbelievable increase in technological capacity, driven primarily by the Internet and its rapid penetration into humanity.
Perrow challenged the conventional approach to accidents (let’s use his definition here: “an unintended and untoward event”), which looked primarily for a single cause, assumed human error, and attempted to show that the outcome was foreseeable if only someone had paid closer attention. Instead, Perrow argued that these accidents occurred because:
- The systems are complex
- The systems are tightly coupled
- The systems can pose catastrophic potential
Perrow called accidents that occurred due to these circumstances normal accidents because they are inevitable and will occur when multiple failures interact despite efforts to avoid them. In other words, big accidents have small beginnings.
An example that helps illustrate this point is Air France Flight 4590. The elapsed time for the entire accident was around two minutes (seven if you include the DC-10’s departure). That sequence doesn’t even include every event that occurred, only the most important of the myriad small failures that interacted to cause the crash of Flight 4590 and the loss of 113 lives. I chose this example because the Concorde SST was one of the safest airliners ever built and was, by necessity, crewed by extremely competent pilots and flight engineers. Granted, this accident occurred in 2000, but we still have commercial aircraft crashes and, if one of the safest systems we ever built could suffer a normal accident, then I’d argue they all can.
So, how does a plane crash relate to cybersecurity and NCSAM? There are two other concepts we need from Perrow:
- Interactive complexity
- Tight coupling
A corporation has interactive complexity due to the many relationships between its people, but it is only loosely coupled, since changes tend to flow slowly through the business. Modern information technology environments, by contrast, are both interactively complex and tightly coupled – events may propagate via many (often unknown) pathways and can quickly aggregate to bring down other system functions. This, if we agree with Perrow, guarantees that we will have information technology accidents.
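To make that distinction concrete, here is a minimal sketch – a toy model, not anything Perrow himself formalised – of how coupling changes outcomes. The dependency graph and the names in it are entirely hypothetical; coupling is modelled simply as the probability that a failure spreads downstream before anyone can intervene.

```python
import random

def cascade(dependencies, start, spread_prob, rng):
    """Propagate a failure through a dependency graph.

    dependencies maps a component to the components that depend on it;
    spread_prob models coupling: the chance a failure reaches a
    downstream component before an operator can intervene.
    Returns the set of failed components.
    """
    failed = {start}
    frontier = [start]
    while frontier:
        component = frontier.pop()
        for downstream in dependencies.get(component, []):
            if downstream not in failed and rng.random() < spread_prob:
                failed.add(downstream)
                frontier.append(downstream)
    return failed

# A hypothetical IT environment: authentication feeds almost
# everything else (interactive complexity).
deps = {
    "auth": ["email", "vpn", "billing"],
    "vpn": ["file-share"],
    "billing": ["reporting"],
}

rng = random.Random(42)
trials = 10_000
for label, prob in [("loose coupling", 0.2), ("tight coupling", 0.9)]:
    avg = sum(len(cascade(deps, "auth", prob, rng))
              for _ in range(trials)) / trials
    print(f"{label}: average components lost = {avg:.2f}")
```

The same graph, the same starting failure – only the coupling changes, and with it how far a small failure travels. That is the sense in which tightly coupled systems turn minor events into accidents.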
What I find interesting about this is that we can begin to look at cybersecurity events not through the lens of uniqueness, but as inevitable accidents that will occur regardless of what controls we put in place. This does not mean we need to give up, however. The very example of air travel can inform us here, too.
Fatalities per trillion revenue passenger kilometres dropped massively between 1970 and 2018. If normal accidents are so normal, then why has this drop happened? Here are a few reasons I can think of:
- The creation of regulatory bodies and enforcement (FAA, ICAO).
- The adoption of rigorous accident investigation (NTSB, BEA).
- The open and public sharing of accident retrospectives.
- Improved survivability during an accident (evacuation procedures, airport design, better materials).
- Improved training, certifications, and crew resource management.
Which brings me back to NCSAM and the goal of “personal accountability and [stressing] the importance of taking proactive steps to enhance cybersecurity at home and in the workplace.” That isn’t going to be effective on its own. The systems involved are too complex and too tightly coupled for personal accountability and proactive steps to gain much in the way of results. What we need to be looking at is what we did with aviation:
- We need strong, consistent, and enforced regulatory frameworks.
- We need frameworks that require performing post-incident analyses.
- We need those analyses to be openly and publicly shared, for all to see.
- We need to emphasise resilience during an incident, not prevention.
- We need to build effective training that puts the toolkit of the responder in everyone’s hands.
We need to build accountability and proactive steps into the system, not the people. Remember, the crew of Air France Flight 4590 was highly accountable and took proactive steps, but that wasn’t the problem that needed solving.
Jack Hamm, Chief Information Security Officer, Gigamon