“Those who cannot remember the past are condemned to repeat it” – George Santayana (The Life of Reason, 1905)
“I suppose it is tempting, if the only tool you have is a hammer, to treat everything as if it were a nail.” – Abraham Maslow (The Psychology of Science)
Maslow’s remark captures how solutions often resemble previous solutions because of an innate reliance on the familiar. In many cases, “critical thinking” or “thinking outside the box” occurs only after all of the familiar solutions have been disproven or found not to fit. In this vein, when considering root cause analysis (RCA) as a methodology to analyze and thus prevent catastrophic failures, we must learn to combine industry-specific knowledge and technology with the expertise of our team members.
RCA as a tool or methodology gained widespread use from the 1960s through the 1980s. The methodology of diagrammatically linking hypothetical causes to actual failures was first popularized in Japan by Kaoru Ishikawa, who pioneered the use of fishbone (also known as herringbone) diagrams such as the one shown below.
This problem-solving method was later enhanced with the Kaizen concept of “change for the better,” or “continuous improvement,” which eventually led to what is known today as Total Productive Maintenance (TPM). The main objective of TPM is to increase the productivity of a production facility with a modest investment in maintenance. As such, RCA is seen as a fundamental element of any asset performance management (APM) program or initiative.
Undesirable outcomes are the result of multiple cause-and-effect relationships that line up over time. Regardless of the RCA tool used to represent these relationships (e.g., logic tree, fault tree, causal factors tree, fishbone), we can agree that these relationships must exist for the undesirable outcome to materialize.
The concept that flawed systems adversely impact human decision making is a critical factor in both the RCA cause-and-effect relationship and the way we employ information systems. The systems in question are the ones we rely on to help us make better decisions; they include, but are not limited to, our training systems, purchasing practices, procedures, and policies.
In the following example, an operator in a particular area used too much lubricant, which resulted in a premature bearing failure on a pump. The operator tasked with pump lubrication incorrectly decided to over-lubricate because he was not trained in proper lubrication practices. The over-lubrication decision was further influenced by recent maintenance reductions that shifted lubrication responsibility from the maintenance department to the operations department. The diagram for this example examines the over-lubrication failure and displays the resulting cause-and-effect decision-making relationship.
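To make the chain in this example concrete, here is a minimal Python sketch that records each effect together with the causes that sit beneath it. The `Cause` class and `trace` function are hypothetical names chosen for illustration, not part of any RCA product, and the node descriptions are paraphrased from the example above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Cause:
    """One node in a cause-and-effect chain (hypothetical structure)."""
    description: str
    caused_by: List["Cause"] = field(default_factory=list)


def trace(node: Cause, depth: int = 0) -> List[str]:
    """Return the chain as indented lines: the effect first, its causes beneath."""
    lines = ["  " * depth + node.description]
    for cause in node.caused_by:
        lines.extend(trace(cause, depth + 1))
    return lines


# Nodes paraphrased from the over-lubrication example in the text.
reductions = Cause("Recent maintenance reductions")
responsibility_shift = Cause(
    "Lubrication responsibility shifted from maintenance to operations",
    [reductions],
)
no_training = Cause("Operator not trained in proper lubrication practices")
decision = Cause(
    "Operator decided to over-lubricate",
    [no_training, responsibility_shift],
)
over_lubrication = Cause("Too much lubricant applied", [decision])
failure = Cause("Premature pump bearing failure", [over_lubrication])

for line in trace(failure):
    print(line)
```

Reading the printed output from top to bottom moves from the undesirable outcome toward the system-level conditions (training and staffing decisions) that shaped the operator's choice.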
Learning through root cause analysis
Pre-existing failure cause information in RCA serves as a teaching tool. The greatest learning that comes from any successful RCA effort takes place during team meetings. By continually asking how something happened, the team explores how a failure could have occurred based on a direct cause-and-effect relationship. Using the previous example of bearing failure due to over-lubrication, the possible causes of bearing failure could include:
- Improper installation
- Wrong bearing
- Defective bearing
- Over-lubricated
- Under-lubricated
- Wrong lubricant
When dealing with RCA, people are trained to view the vertical cause-and-effect tree as a timeline. If we know that the bearing failed, we can visualize the cause and effect by moving along the timeline. Hypothesizing the possible failure options leads the team to conclude that there are only four plausible ways in which the bearing could have failed:
Any of the possible causes listed above would eventually produce one or more of the failure modes displayed in the diagram. If it is proven that fatigue is the failure mode and there is no evidence of additional failure modes, the team would set aside the other possible failure modes and continue following the fatigue path down the cause-and-effect tree. Now that the team knows how the bearing failed, the next question it needs to answer is why the bearing failed.
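The elimination step described here can be sketched in Python as a small hypothesis tree in which each node is marked proven, disproven, or untested. This is an illustration under stated assumptions, not a definitive implementation: the `Hypothesis` class and `frontier` function are hypothetical names, the three failure modes other than fatigue are placeholders because the text names only fatigue, and hanging the six candidate causes beneath the fatigue node is an assumption made for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Hypothesis:
    """A candidate answer to 'how could this have happened?' (hypothetical)."""
    description: str
    verified: Optional[bool] = None  # True = proven, False = disproven, None = untested
    children: List["Hypothesis"] = field(default_factory=list)


def frontier(node: Hypothesis) -> List[Hypothesis]:
    """Follow only proven hypotheses downward and return the deepest proven nodes."""
    if node.verified is not True:
        return []
    proven = [child for child in node.children if child.verified is True]
    if not proven:
        return [node]
    found: List[Hypothesis] = []
    for child in proven:
        found.extend(frontier(child))
    return found


# Candidate causes from the bulleted list in the text, all still untested.
candidate_causes = [
    Hypothesis(name)
    for name in (
        "Improper installation", "Wrong bearing", "Defective bearing",
        "Over-lubricated", "Under-lubricated", "Wrong lubricant",
    )
]

# Fatigue is the only failure mode named in the text; the rest are placeholders.
fatigue = Hypothesis("Fatigue", verified=True, children=candidate_causes)
event = Hypothesis("Bearing failed", verified=True, children=[
    fatigue,
    Hypothesis("Failure mode B (placeholder)", verified=False),
    Hypothesis("Failure mode C (placeholder)", verified=False),
    Hypothesis("Failure mode D (placeholder)", verified=False),
])

for node in frontier(event):
    untested = [c.description for c in node.children if c.verified is None]
    print(f"Proven how: {node.description}")
    print("Next hypotheses for why:", ", ".join(untested))
```

Disproven branches are never descended into, which mirrors the team setting aside failure modes that the evidence does not support and concentrating on the proven path.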
As possible failure mode ideas are drawn out of the team members, the team is actively constructing a knowledge or experience tree, which is often referred to as "Corporate Memory." Because the team members are working together to formulate hypotheses, they are also learning how to employ the cause-and-effect methodology. This process is not always simple, but deducing possible failures is a vital learning experience for the team, and "Corporate Memory" is invaluable for mitigating future failures.