The Industrial Internet of Things (IIoT) produces massive amounts of data at record speed. The sheer volume, variety, and velocity of this data can be overwhelming, especially for industrial companies, many of which struggle to translate big data into meaningful and effective process improvements.
That’s where data science comes in.
Data science is the art of analyzing data and applying scientific principles to uncover key patterns that drive significant business value. Leveraging data science and its techniques, such as machine learning (an algorithm’s ability to gain insight from patterns in data), can drive further value by providing clearer insight into massive, confusing, and siloed industrial datasets.
More and more companies are embracing IIoT to drive better business outcomes, but they still aren’t making the most of the data they collect. The following examples showcase how data science can help solve some of the biggest and most common problems industrial organizations face with big data.
In the first part of this blog post, I will discuss the application of machine learning to system monitoring, prediction and forecasting, and system survival analysis. Several other use cases will be covered in the second part, so stay tuned for more.
You need to make sure your systems operate properly and take appropriate action (raise alarms, create work orders, etc.) as soon as there is abnormal behavior or failure in the system or its components.
Many companies are accustomed to manually monitoring system components and reporting failures. One way around the manual approach is to implement Business Rules: logic defined by domain experts to detect abnormal behavior in a system and take proper action when needed. A major challenge with the Business Rules approach is that people do not have the analytic capacity to take hundreds or thousands of parameters into account simultaneously while also considering every edge case. These rules may also contradict one another and, as a result, generate numerous false alarms and missed detections.
This is where data science, and more specifically machine learning, can help. Instead of trying to define Business Rules, your data science team or a partner can use a machine learning algorithm that processes the data coming from your system and automatically separates failure operating conditions from normal ones. This process is known as anomaly or outlier detection. Unsupervised machine learning algorithms such as LOF (Local Outlier Factor), k-NN (k-Nearest Neighbors), PCA (Principal Component Analysis), and one-class (unsupervised) SVM (Support Vector Machine) are among those that can be used to address the anomaly detection problem. Detection accuracy varies by algorithm depending on your specific domain and data, so data scientists will explore and compare the algorithms to recommend the best approach for your needs. A few data science teams, including GE Digital’s Data Science Services team, also incorporate physical modelling into anomaly detection to further improve the quality of outcomes required for industrial applications.
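To make the idea concrete, here is a minimal Python sketch of one of the techniques named above: distance-based k-NN anomaly scoring, in which each reading is scored by its average distance to its k nearest neighbors. It is not LOF (which additionally normalizes by local density) and not the approach GE Digital’s team uses; the sensor readings, `k`, and the alarm threshold are hypothetical, and in practice features would be rescaled first so that no single unit dominates the distance.

```python
import math

def knn_anomaly_scores(points, k=3):
    """Score each point by its mean Euclidean distance to its k nearest neighbors.

    A larger score means the point sits farther from the rest of the data.
    Assumes len(points) > k.
    """
    scores = []
    for i, p in enumerate(points):
        # Distances from point i to every other point, nearest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(sum(dists[:k]) / k)
    return scores

def flag_anomalies(points, k=3, threshold=2.0):
    """Return the indices of points whose k-NN score exceeds the threshold."""
    scores = knn_anomaly_scores(points, k)
    return [i for i, s in enumerate(scores) if s > threshold]

# Hypothetical sensor readings: (bearing temperature in °C, vibration in mm/s).
# Real pipelines would rescale features first; here temperature dominates the distance.
readings = [(70.1, 2.0), (70.4, 2.1), (69.8, 1.9), (70.2, 2.2), (70.0, 2.0), (85.0, 6.5)]
print(flag_anomalies(readings, k=3, threshold=2.0))  # → [5]
```

Swapping in LOF, PCA reconstruction error, or a one-class SVM would change only how the score is computed; the thresholding step stays the same.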
By using machine learning to automate process monitoring, you can not only avoid the time-consuming creation of business rules but also “catch” failure situations that no one had anticipated and that the rules therefore never captured.
Once your team has the results of the algorithm’s automatic monitoring, your subject matter experts can review the outcomes and confirm or reject the detected failures. The machine learning algorithm can then learn from this operator input, so the accuracy of failure detection improves over time.
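One simple way such feedback can be folded back in, sketched below under the assumption that the detector outputs a scalar anomaly score (as in a k-NN or LOF setup), is to re-tune the alert threshold so that it agrees with as many expert verdicts as possible. The function name and the feedback data are hypothetical; production systems may instead retrain the model on the newly labeled examples.

```python
def tune_threshold(feedback):
    """Pick the alert threshold that best agrees with operator feedback.

    feedback: list of (anomaly_score, confirmed) pairs, where confirmed is True
    if a subject matter expert confirmed the alert as a real failure.
    Scores strictly above the returned threshold raise an alert.
    """
    candidates = sorted({score for score, _ in feedback})
    best_threshold, best_errors = None, None
    for t in candidates:
        # Count cases where "alert raised" disagrees with the expert verdict.
        errors = sum((score > t) != confirmed for score, confirmed in feedback)
        if best_errors is None or errors < best_errors:
            best_threshold, best_errors = t, errors
    return best_threshold

# Hypothetical expert verdicts on past alerts.
feedback = [(2.5, False), (3.1, False), (4.0, True), (6.2, True), (3.4, False), (5.1, True)]
print(tune_threshold(feedback))  # → 3.4
```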
Even if you can identify and respond quickly to system failures, downtime costs remain a major factor. Is there a way to predict how the system will perform under future operating conditions and forecast potential failures?
Instead of relying on the reactive strategies of traditional monitoring and visualization methods, consider a predictive approach. By feeding data into a predictive model, you can capture the relationships between multiple variables and forecast how the system will perform under a variety of conditions. This approach also allows your staff to act proactively and prepare for upcoming events.
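As a minimal illustration of the predictive approach, the sketch below fits an ordinary least-squares line relating one operating variable to one condition variable, then asks a “what if” question about a future operating point. The variables, data, and alarm limit are hypothetical, and real industrial models are typically multivariate and nonlinear.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical history: line throughput (units/hour) vs. motor temperature (°C).
throughput = [50, 60, 70, 80, 90]
motor_temp = [60.0, 64.0, 68.0, 72.0, 76.0]
slope, intercept = fit_line(throughput, motor_temp)

# "What if" question: how hot will the motor run at a planned 110 units/hour?
predicted = slope * 110 + intercept
TEMP_LIMIT = 80.0  # hypothetical alarm limit in °C
print(round(predicted, 1), predicted > TEMP_LIMIT)  # → 84.0 True
```

Extrapolating beyond the observed range, as this example does, is exactly where a simple linear model is least trustworthy, which is one reason data scientists validate such forecasts against held-out data.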
In most such cases, data scientists create a Health Index for an asset that business users can read as the probability that the asset or system will remain healthy over the prediction interval. The result is an easy-to-understand metric, even though producing it requires a sophisticated mathematical approach. For example, Linxia Liao from GE Digital’s Data Science Services team recently co-authored a paper for the International Journal of Prognostics and Health Management in which he proposed a method that integrates feature extraction and prediction into a single optimization task by stacking a three-layer model as a deep learning structure.
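The stacked deep learning method from that paper is beyond the scope of a short example. As a much simpler stand-in, the sketch below fits a logistic regression that maps two (already rescaled) condition features to the probability of failure within the prediction horizon, and reports a Health Index as one minus that probability. The features, labels, learning rate, and epoch count are all hypothetical.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_failure(w, x):
    """Probability of failure within the horizon; w[-1] is the bias term."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + w[-1])

def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit logistic-regression weights with batch gradient descent on log-loss."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = predict_failure(w, xi) - yi  # derivative of log-loss w.r.t. the logit
            for j, xj in enumerate(xi):
                grad[j] += err * xj
            grad[-1] += err
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w

def health_index(w, x):
    """Health Index: 1 minus the predicted probability of failure."""
    return 1.0 - predict_failure(w, x)

# Hypothetical rescaled features per truck: (vibration, oil temperature).
# Label 1 = failed within the prediction horizon, 0 = stayed in service.
X = [[-1.0, -0.8], [-0.5, -1.0], [0.0, -0.2], [0.8, 1.0], [1.2, 0.7], [1.5, 1.4]]
y = [0, 0, 0, 1, 1, 1]
w = fit_logistic(X, y)
print(round(health_index(w, [-1.0, -1.0]), 3), round(health_index(w, [1.5, 1.5]), 3))
```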
The proposed method was tested on a small dataset collected from a fleet of mining haul trucks. The model produced an “individualized” failure probability representation for assessing the health condition of each asset, which clearly separated in-service trucks from failed ones. The method was also tested on a large open-source hard drive dataset, where it showed promising results.
Intel is a good example of the benefits of a predictive approach: the company saved $3 million in a single year by using predictive analytics to prioritize silicon chip inspections.
Real-world equipment reliability can differ significantly from the statistics provided by manufacturers. Is there a way to predict reliability using actual field data?
By analyzing actual field data, machine learning and artificial intelligence can estimate the likelihood of equipment failure and warn you ahead of time, independent of the manufacturer’s predicted failure rate. This allows your team to replace equipment and key system components before they fail, reducing downtime and the revenue lost to system failures.
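Estimating reliability from field data is the province of survival analysis. A classic starting point is the Kaplan-Meier estimator, sketched below in plain Python; it accounts for right-censored units (those still in service when the data was collected), which a naive failure count would mishandle. The operating hours and failure flags are hypothetical, and the post does not say which survival model GE Digital uses.

```python
def kaplan_meier(durations, failed):
    """Kaplan-Meier estimate of the survival function S(t).

    durations: operating hours per unit (time to failure, or time observed so far)
    failed: True if the unit failed at that time, False if still running (censored)
    Returns (time, survival_probability) pairs at each observed failure time.
    """
    data = sorted(zip(durations, failed))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        failures = removed = 0
        # Group all units that share the same duration.
        while i < len(data) and data[i][0] == t:
            failures += int(data[i][1])
            removed += 1
            i += 1
        if failures:
            survival *= 1.0 - failures / at_risk
            curve.append((t, survival))
        # Failed and censored units both leave the risk set after time t.
        at_risk -= removed
    return curve

# Hypothetical field data for eight pumps: hours in service and whether each failed.
durations = [1200, 1500, 1500, 2000, 2500, 3000, 3200, 4000]
failed = [True, True, False, True, False, True, False, False]
curve = kaplan_meier(durations, failed)
print([(t, round(s, 3)) for t, s in curve])
# → [(1200, 0.875), (1500, 0.75), (2000, 0.6), (3000, 0.4)]
```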
Here’s how you can get your company started in data science.
First, choose the problems whose resolution will benefit you most. Building a data science solution requires investments of time and capital, so make sure the potential benefits outweigh the costs. Our Data Science Services team has developed a methodology, delivered in a two-day workout, that helps customers systematically identify and prioritize the problems they want to solve and the key performance indicators (KPIs) they want to improve.
Second, before fully investing in data science, I highly recommend assessing the quality of the data available to you and developing an understanding of the relationships between data sources. The goal of this step is to ensure that the data quality is sufficient for analysis and that the data has entitlement with respect to the problem you want to solve, meaning it actually carries information about that problem. If the data holds no value for the KPIs you want to predict, you will need to discuss the data gaps and instrument your processes for data collection.
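A first-pass data quality check can be scripted in a few lines. The sketch below reports the share of missing readings per source and, for sources that are complete enough, the Pearson correlation with the KPI over rows where both values are present. Correlation only captures linear relationships, so it is a rough proxy for entitlement rather than proof; the source names, KPI values, and 20% missing-data cutoff are hypothetical, and the code assumes each series varies (a constant series would divide by zero).

```python
import math

def missing_rate(values):
    """Fraction of readings that are missing (None)."""
    return sum(v is None for v in values) / len(values)

def pearson(xs, ys):
    """Pearson correlation over rows where both values are present."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    return sxy / math.sqrt(sxx * syy)

def quality_report(sources, kpi, max_missing=0.2):
    """Missing rate per source, plus KPI correlation for sufficiently complete sources."""
    report = {}
    for name, values in sources.items():
        rate = missing_rate(values)
        corr = round(pearson(values, kpi), 2) if rate <= max_missing else None
        report[name] = {"missing": round(rate, 2), "corr_with_kpi": corr}
    return report

# Hypothetical weekly data: KPI = downtime hours; three candidate data sources.
kpi = [10, 12, 14, 16, 18, 20]
sources = {
    "vibration": [1.0, 1.2, 1.4, 1.6, 1.8, 2.0],
    "ambient_temp": [21, None, None, 22, None, 20],
    "shift_id": [1, 2, 1, 2, 1, 2],
}
print(quality_report(sources, kpi))
```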
Once you’ve proven that the data is good enough for modelling, you can move on to testing different machine learning algorithms or other modelling techniques to develop a solution. Your data science team should be set up so that “citizen data scientists” (users with some domain knowledge and a surface-level understanding of machine learning algorithms) can apply different techniques and algorithms and evaluate the results. This will help your organization determine whether the system is truly working more efficiently and whether the value justifies your investment in data science.
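When comparing candidate algorithms, a common yardstick is precision, recall, and F1 measured against failures that experts have confirmed. The sketch below scores two hypothetical detectors on hypothetical labels and picks the one with the higher F1; which metric to favor in practice depends on the relative cost of false alarms and missed failures.

```python
def precision_recall_f1(predicted, actual):
    """Compare predicted alarm flags with confirmed failures (both lists of bools)."""
    tp = sum(p and a for p, a in zip(predicted, actual))
    fp = sum(p and not a for p, a in zip(predicted, actual))
    fn = sum(a and not p for p, a in zip(predicted, actual))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Hypothetical confirmed failures and the alarms raised by two candidate detectors.
actual = [False, False, True, False, True, True, False, False]
candidates = {
    "detector_a": [False, True, True, False, True, False, False, False],
    "detector_b": [False, False, True, False, True, True, True, True],
}
scores = {name: precision_recall_f1(pred, actual) for name, pred in candidates.items()}
best = max(scores, key=lambda name: scores[name][2])  # highest F1 wins
print(best, scores[best])
```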
As is frequently the case in large industrial companies, different departments have access to different data resources, with little visibility across them. This must be addressed early, because data convergence is a critical component of using data to drive better business decisions. To that end, you need a platform that gives the analytical framework easy access to representative data samples from a wide variety of data sources and supports random sampling, data aggregation, data cleansing, missing value protocols, data normalization, rescaling, and more.
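The preparation steps listed above are routine but easy to get wrong. The sketch below shows three of them in plain Python: mean imputation as one simple missing value protocol, min-max rescaling to the [0, 1] range, and aggregation of consecutive readings into fixed windows. The pressure readings and window size are hypothetical, and mean imputation is only one of several reasonable protocols.

```python
from statistics import mean

def impute_mean(values):
    """Replace missing readings (None) with the mean of the observed ones."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]

def min_max_rescale(values):
    """Rescale values linearly to the [0, 1] range."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant series: no spread to rescale
    return [(v - lo) / (hi - lo) for v in values]

def aggregate(values, window):
    """Average consecutive, non-overlapping windows (e.g. minutes into hours)."""
    return [mean(values[i:i + window]) for i in range(0, len(values), window)]

raw_pressure = [4.0, None, 5.0, 6.0, None, 8.0]  # hypothetical readings (bar) with gaps
clean = impute_mean(raw_pressure)   # [4.0, 5.75, 5.0, 6.0, 5.75, 8.0]
scaled = min_max_rescale(clean)     # [0.0, 0.4375, 0.25, 0.5, 0.4375, 1.0]
windowed = aggregate(clean, window=3)
print(clean, scaled, windowed)
```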
GE has developed both an industrial applications platform, Predix, with extensive data ingestion and analytic capabilities, and services, such as GE Digital’s Data Science Services, that you can leverage to speed up value extraction from the Industrial Internet.
With data science, industrial organizations can derive more value from data and use that intelligence and insight to drive better business outcomes and intelligent business decisions.
Editor’s note: The original idea and inspiration for this blog post was drawn from our fellow data scientist, Massoud Seifi.