The recent cyber event, in which the security firm CrowdStrike pushed a faulty update of its security software to Microsoft Windows systems on 19 July 2024, has been touted as causing the most impactful IT outage in history, with potential worldwide costs exceeding US$1 billion. Secondary effects, such as canceled flights, unresponsive hospital systems, inaccessible financial systems, and disrupted small businesses, could result in less tangible and indirect costs many times greater than that.
For example, Delta Air Lines claims that the outage cost it US$500 million, whereas CrowdStrike and Microsoft maintain that the losses were due to poor management at Delta. Traditionally, software companies have absolved themselves contractually from liability for consequential damages. This becomes more of an issue as software takes over many logical and physical functions formerly performed by human intelligence and hardware.
The total cost of the outage is expected to have a relatively minor financial effect on the global economy. It does, however, illustrate what could happen on a much larger scale if nothing is done to mitigate this type of risk.
How does one describe the CrowdStrike cyber event? Was it a malicious cyberattack or an unintended error? Reports suggest that it was the latter, although (at the time of writing) we do not know precisely how the defect was introduced or why it was not caught in testing.
In an ISACA Journal article, I differentiated between a cyberpandemic and a major cyberattack, with the former being more likely and the latter resulting in greater impact. While experience over the past several years has clearly changed some of the probabilities and impacts presented in the article, it still seems reasonable to differentiate between the two.
Reporters were quick to point to a similarity between the CrowdStrike event and the Y2K century-rollover “bug.” In my opinion, however, the two are quite different. We knew of the Y2K “bug” years ahead of time and had the opportunity to prepare by correcting the problem. While many presume Y2K to have been a “non-event,” it might have had a much greater impact had the corrections not been made.
The CrowdStrike event, on the other hand, was not prepared for, despite warnings two decades ago from Dan Geer, Bruce Schneier, and other cybersecurity experts about the potential danger of software monocultures. Ironically, their monograph pointed specifically to Microsoft’s monopoly. Also, in my 2004 book, I addressed the risks of outsourcing information security and suggested ways in which to deal with them. That is to say, this is by no means a new issue, although some reporters appear to be treating it as such. Furthermore, those with older systems, such as Southwest Airlines, escaped the problem because they were using software from the 1990s. This, again, is the opposite of Y2K, in which it was the older systems that mishandled the century date rollover.
Several cybersecurity experts have suggested using backups, among other methods, to avoid this type of issue in the future. Such an approach is good for mitigating the impact of ransomware and similar attacks, but hardly helps with a CrowdStrike-type event. The key here is to diversify software from third parties (including open source). Another suggestion is to phase the rollout of updates to fewer customers at a time rather than all at once. In this way, problems may be detected early on a limited number of systems and corrected for the majority not yet updated.
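The phased-rollout idea can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the function name `staged_rollout`, the `apply_update` and `is_healthy` callbacks, the 1%/10%/100% stages, and the 2% failure threshold are hypothetical, and none of it describes how CrowdStrike or any other vendor actually deploys updates.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Sequence


@dataclass
class RolloutResult:
    updated: List[str] = field(default_factory=list)  # hosts that received the update
    halted: bool = False                              # True if the rollout stopped early


def staged_rollout(
    hosts: Sequence[str],
    apply_update: Callable[[str], None],   # hypothetical: installs the update on one host
    is_healthy: Callable[[str], bool],     # hypothetical: post-update health check
    stage_percents: Sequence[int] = (1, 10, 100),  # cumulative share of hosts per stage
    max_failure_rate: float = 0.02,        # halt if more than 2% of a stage fails
) -> RolloutResult:
    """Update hosts in growing stages and stop as soon as a stage looks unhealthy."""
    result = RolloutResult()
    total = len(hosts)
    start = 0
    for percent in stage_percents:
        # Cumulative end index for this stage, rounded up with integer arithmetic.
        end = min(total, max(start + 1, (total * percent + 99) // 100))
        batch = hosts[start:end]
        if not batch:
            continue
        failures = 0
        for host in batch:
            apply_update(host)
            result.updated.append(host)
            if not is_healthy(host):
                failures += 1
        if failures / len(batch) > max_failure_rate:
            # Stop here: the remaining hosts never receive the faulty update.
            result.halted = True
            return result
        start = end
    return result
```

Under these assumptions, a defective update pushed to 1,000 hosts would reach only the first 10 before the rollout halts, leaving the other 990 untouched. The stage sizes and threshold are the design choices that matter: smaller early stages limit the damage, at the cost of a slower rollout.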
As systems-of-systems become more complex and interoperable, and as we depend on fewer sources and suppliers, we can expect this type of problem to increase. Now is the time to reduce complexity and interdependence, and to introduce procedures to mitigate the risk of recurrence.