When airplane engines malfunction, when utility grids black out, when railways break down, the consequences are, at the very least, disruptive and costly. When human lives are involved, failures are devastating.

As systems grow more complex, the number of ways they can fail grows too. More variables and more interdependencies create more potential failure points. Complex systems, however, are usually designed with failure in mind. Redundant controls, backup systems, manual overrides, warning systems, and automatic actions are some of the ways that complex systems account for failure. The Industrial Internet, with its combination of self-aware assets, rapid communication, and predictive analytics, helps tremendously with this kind of failure protection.

But the real problem with complex systems isn't the occurrence of single-point failures. It's the occurrence of system failures, when multiple things go wrong at the same time. When catastrophes occur, they're usually the result of multiple single-point failures coinciding, defeating all of the safety measures, redundancies, and failsafe systems put in place to handle single-point failures.

Can the Industrial Internet evolve to help systems fight catastrophes better?


Catastrophes make headlines. But they're high profile for their rarity as much as their devastation.

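Because system accidents require multiple single-point failures to coincide, their probability is low and they occur infrequently. If, for illustration, three independent safeguards each fail 1% of the time, all three fail together only about once in a million opportunities.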

The way complex system failures develop is described by the "Swiss Cheese Model" of accidents. First detailed by James T. Reason and Dante Orlandella of the University of Manchester, the Swiss cheese model has become a reference for accident analysis in several sectors, most notably in aviation.

The model has the following features:

  1. Systems have multiple layers of defense against failure, which are represented as slices of Swiss cheese, stacked up against one another.
  2. Each slice has its own holes. These holes vary both in their size and position across a slice's surface. Holes represent point-failures of varying severity within a system.
  3. The stacks add up. Imagine shooting a pea through a hole in the first layer of Swiss cheese. It might go through that first layer, but if there's no hole at that same position in the second layer, the pea stops and doesn't get through. The system remains safe. The pea in this case represents catastrophic system failure.
  4. The holes add up. The layers in the system are constantly changing. Eventually, a hole in one layer will overlap with a hole in another layer, allowing a pea (catastrophic system failure) to get through the system. This is when catastrophe occurs. System failures, in this way, become an inherent property of any complex system.
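
The four features above can be turned into a small simulation. What follows is only a sketch under simplifying assumptions that are not part of the original model description: each layer is a row of discrete positions, holes are placed uniformly at random, layers are independent of one another, and every trial redraws all the holes to mimic layers that are constantly changing. The function name and all parameter values are hypothetical.

```python
import random


def simulate_alignment(num_layers, num_positions, holes_per_layer, trials, seed=0):
    """Estimate how often at least one position is a hole in every layer.

    Assumptions (illustrative only): holes are uniformly random,
    layers are independent, and holes are redrawn on every trial.
    """
    rng = random.Random(seed)
    breaches = 0
    for _ in range(trials):
        # Positions the pea could still pass through; all open before layer 1.
        open_positions = set(range(num_positions))
        for _ in range(num_layers):
            holes = set(rng.sample(range(num_positions), holes_per_layer))
            # The pea only continues where this layer also has a hole.
            open_positions &= holes
            if not open_positions:
                break  # some layer stopped the pea everywhere
        if open_positions:
            breaches += 1  # holes lined up through every layer
    return breaches / trials


if __name__ == "__main__":
    # Hypothetical numbers: 100 positions, 5 holes per layer.
    for layers in (1, 2, 3, 4):
        rate = simulate_alignment(layers, 100, 5, trials=20_000)
        print(f"{layers} layer(s): estimated breach rate {rate:.4f}")
```

Under these assumptions, adding layers or shrinking holes lowers the breach rate, but as long as every layer has at least one hole the chance of alignment never reaches exactly zero. That is the sense in which system failures are inherent to complex systems.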

So how can they be averted?


The Industrial Internet offers great potential for failure anticipation and preventative action. With sensor-equipped assets constantly in communication with each other and linked to data centers generating powerful predictive analytics, there's a much better infrastructure for anticipating, detecting, and preventing single-point failures.

That's significant. If single-point failures can be anticipated and prevented, then there's no opportunity for them to compound and lead to catastrophes. In other words, if catastrophes get through because holes in slices line up, the Industrial Internet could help to close those holes as they appear so they never become a problem.

But the Industrial Internet can help in another major way. It can offer, in an unprecedented manner, a complete end-to-end view of a given complex system, giving visibility to elements that would previously have gone undetected. In this way, it would be possible to anticipate potential catastrophes by seeing, through data and simulations, how multipoint failures could happen. It might not be possible to predict every scenario, but even determining the most likely catastrophes would provide huge improvements to the stability and safety of complex systems.
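
As a rough illustration of what "determining the most likely catastrophes" could look like, the sketch below ranks a handful of multipoint failure scenarios by their joint probability. Everything in it is hypothetical: the component names, the failure probabilities, the scenarios, and the function `rank_scenarios` are invented for the example, and it assumes the single-point failures are independent, which real systems often violate.

```python
from math import prod


def rank_scenarios(failure_probs, scenarios):
    """Rank scenarios by the chance that all their failures coincide.

    failure_probs: component name -> probability of a single-point failure.
    scenarios: scenario name -> components that must all fail together.
    Assumes independent failures (an illustrative simplification).
    """
    ranked = [
        (name, prod(failure_probs[c] for c in components))
        for name, components in scenarios.items()
    ]
    # Most likely scenario first.
    return sorted(ranked, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical per-period failure probabilities, e.g. from sensor data.
    failure_probs = {
        "primary_pump": 0.02,
        "backup_pump": 0.05,
        "pressure_alarm": 0.01,
        "manual_override": 0.10,
    }
    # Hypothetical combinations that would defeat every safeguard.
    scenarios = {
        "both pumps fail": ["primary_pump", "backup_pump"],
        "pump fails, alarm silent": ["primary_pump", "pressure_alarm"],
        "all safeguards fail": list(failure_probs),
    }
    for name, p in rank_scenarios(failure_probs, scenarios):
        print(f"{name}: {p:.2e}")
```

A ranking like this would only be as good as the probabilities fed into it, but it shows how an end-to-end view could direct attention to the combinations most worth guarding against.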

Although rare, complex system failures are a serious problem because of their high impact. But with the Industrial Internet come better tools to understand, anticipate, and prevent the roots of catastrophe from taking hold, making systems more adaptable, more visible, and stronger than ever before.

About the author

Suhas Sreedhar

Strategic Writer at GT Nexus