Applying the principles of data science
Here’s how to get started in data science.
First, choose problems that have a high payoff if resolved. Building a data science solution requires an investment of time and capital, so make sure that the potential benefits outweigh the costs. GE Digital’s Data Science Services team has developed a two-day workout to help systematize customers’ problems or key performance indicators (KPIs) and prioritize them accordingly.
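As a rough illustration of weighing benefits against costs, the sketch below ranks a few candidate problems by net benefit. The problem names, the figures, and the `Candidate` and `prioritize` helpers are all hypothetical and invented for this example; this is not the method used in the workout.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """A candidate problem with hypothetical, illustrative estimates."""
    name: str
    expected_benefit: float  # estimated payoff if resolved (same currency as cost)
    estimated_cost: float    # estimated time and capital needed, as money

    @property
    def net_benefit(self) -> float:
        return self.expected_benefit - self.estimated_cost


def prioritize(candidates):
    """Keep candidates whose benefit exceeds cost, highest net benefit first."""
    worthwhile = [c for c in candidates if c.net_benefit > 0]
    return sorted(worthwhile, key=lambda c: c.net_benefit, reverse=True)


# Made-up numbers, purely for illustration.
candidates = [
    Candidate("reduce unplanned downtime", 500_000, 200_000),
    Candidate("optimize spare-parts inventory", 120_000, 150_000),
    Candidate("predict filter clogging", 90_000, 30_000),
]

ranked = prioritize(candidates)
for c in ranked:
    print(f"{c.name}: net benefit {c.net_benefit:,.0f}")
```

In practice the estimates themselves are the hard part; the ranking is only as good as the numbers fed into it.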
Second, before fully investing in data science, we highly recommend assessing the quality of the data available to you and developing an understanding of the relationships between data sources. The goal of this step is to ensure that the data quality is adequate for analysis and that the data has entitlement, meaning it carries information relevant to the problem you’re looking to solve. If the data has no value in relation to the KPIs you’d like to predict, you’ll have to address the data gaps and instrument processes with additional data collection.
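A first pass at this kind of assessment might look like the sketch below. It checks how much of a signal is missing and whether the signal moves with the KPI at all. The column names and values are invented, and Pearson correlation is only one crude, linear indicator of relevance, chosen here as an assumption for the sake of the example.

```python
import math


def missing_fraction(values):
    """Share of entries that are missing (represented here as None)."""
    return sum(v is None for v in values) / len(values)


def pearson(xs, ys):
    """Pearson correlation over rows where both values are present.

    Returns 0.0 when there are too few rows or either series is constant.
    """
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    if n < 2:
        return 0.0
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


# Hypothetical sensor readings and a hypothetical KPI, for illustration only.
temperature = [70, 72, None, 75, 78, 80]
constant_flag = [1, 1, 1, 1, 1, 1]
kpi = [1.0, 1.2, 1.1, 1.5, 1.8, 2.0]

for name, column in [("temperature", temperature), ("constant_flag", constant_flag)]:
    print(name, "missing:", round(missing_fraction(column), 2),
          "correlation with KPI:", round(pearson(column, kpi), 2))
```

A column that is mostly missing, or that never varies, is a candidate data gap that additional instrumentation would need to close.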
Once you’ve shown that the data is good enough to use for modeling, you can move on to testing different machine learning algorithms or other modeling techniques to develop a solution. Your data science team should include “citizen data scientists” (users with some domain knowledge and a surface-level understanding of machine learning algorithms) who can apply different techniques and algorithms and evaluate the results. This will help your organization determine whether the system is truly working more efficiently and whether the value justifies your investment in data science.
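Comparing techniques usually means fitting each one on part of the data and scoring it on data it has not seen. The sketch below does this for two deliberately simple models, a mean baseline and a least-squares line, using synthetic data and mean absolute error. The models, the data, and the split are assumptions made for illustration; a real project would use richer algorithms and more careful validation.

```python
def fit_mean(xs, ys):
    """Baseline model: always predict the average of the training targets."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y


def fit_line(xs, ys):
    """Ordinary least-squares fit of a straight line y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x


def mean_absolute_error(model, xs, ys):
    return sum(abs(model(x) - y) for x, y in zip(xs, ys)) / len(xs)


# Synthetic data: a linear trend plus a small alternating disturbance.
xs = list(range(20))
ys = [2.0 * x + 1.0 + (0.5 if x % 2 else -0.5) for x in xs]

# Hold out the last five points to evaluate on unseen data.
train_x, test_x = xs[:15], xs[15:]
train_y, test_y = ys[:15], ys[15:]

fitters = {"mean baseline": fit_mean, "line": fit_line}
scores = {
    name: mean_absolute_error(fit(train_x, train_y), test_x, test_y)
    for name, fit in fitters.items()
}
best = min(scores, key=scores.get)
print(scores, "best:", best)
```

Keeping a trivial baseline in the comparison makes it easier to judge whether a more complex model is adding enough value to justify the investment.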
As is frequently the case in large industrial companies, different departments have access to different data resources with little cross-visibility. This must be addressed early on, because data convergence is a critical component of using data to drive better business decisions. To that end, you need a platform that lets the analytical framework easily access representative data samples from a wide variety of data sources and that supports random sampling, data aggregation, data cleansing, missing value protocols, data normalization, rescaling, and more.
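To make a few of these preparation steps concrete, here is a minimal sketch of random sampling, one possible missing value protocol (mean imputation), and min-max rescaling. The function names and readings are hypothetical, and this is plain Python for illustration, not how any particular platform implements these steps.

```python
import random


def sample_rows(rows, k, seed=0):
    """Draw up to k rows at random without replacement (seeded for repeatability)."""
    rng = random.Random(seed)
    return rng.sample(rows, min(k, len(rows)))


def impute_mean(values):
    """Replace missing entries (None) with the mean of the observed entries."""
    present = [v for v in values if v is not None]
    if not present:
        raise ValueError("cannot impute a column with no observed values")
    mean = sum(present) / len(present)
    return [mean if v is None else v for v in values]


def min_max_scale(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant column: no spread to rescale
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical readings with gaps, for illustration only.
readings = [10.0, None, 30.0, 20.0, None, 40.0]
filled = impute_mean(readings)
scaled = min_max_scale(filled)
subset = sample_rows(list(range(100)), 10)

print(filled)
print(scaled)
print(subset)
```

Mean imputation is only one option; dropping rows, carrying the last value forward, or flagging missingness explicitly may suit a given data source better.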
GE has developed both an industrial applications platform, Predix, which has extensive data ingestion and analytic capabilities, and data science services to help you extract value from the Industrial Internet faster.
With data science, industrial organizations can derive more value from existing data and use those insights to drive better business outcomes and more intelligent business decisions.
Editor’s note: The original idea and inspiration for this blog post was drawn from our fellow data scientist, Massoud Seifi.