It Starts with Covering Your Bases with the Industrial Data Lake

It’s the top of the ninth, the bases are loaded, the batter steps up to the plate, and the pitcher winds up to throw a fastball. To hit the ball, the batter must take in sensory input through his or her eyes, calculate the ball’s trajectory in the brain, and trigger the swing of the bat at a precise time and angle.

Hitting a home run with the Industrial Internet is not too different. Machines take input from sensors and send the data to a platform that uses models to analyze it and provide actionable insights that drive real business outcomes. In this way, industrial data platforms act as “digital brains,” playing a role similar to that of the human brain. The Industrial Data Lake is bringing this concept to life and revolutionizing how it can happen securely by:

  • enabling the management of any data in one place with appropriate governance
  • providing high-performance computing capability for real-time, interactive, and batch analytics
  • simplifying the visualization layer so insights can be analyzed rapidly, and
  • making it easier to monetize outcomes.

At this month’s IoT/Industrial Internet Bay Area Meetup, Kaushik K. Das, Pivotal’s Head of Data Sciences for the Americas, spoke about GE’s vision. He laid out the four phases, or “bases,” that industrial companies need to cover to use the processing power of the Industrial Data Lake to increase productivity and reduce unplanned maintenance.

First Base: Problem Formulation

To reach first base, you need to know where you’re going. Start by understanding the goals and pain points of key stakeholders. These could relate to anomalies in machine behavior, resource management, theft prevention, or demand prediction. For example, distributed power companies can increase revenue by taking advantage of capacity markets and real-time price spikes, so they need to know when those spikes occur and be able to shift energy sources to maximize efficiency. Work with domain experts in your business to figure out what matters most, such as when your customers make purchasing decisions, and use that information to formulate a problem to solve with data.

Second Base: Data Assessment

Now you need to gather the relevant data available to you. Historically, this has been a challenge because data is siloed across the organization, and often across the globe. Identify the information you need from both inside and outside the organization, and build the right feature set to make full use of the volume, variety, and velocity of all available data. The Industrial Data Lake moves the computation to the data, which lets it process the massive volumes of data that industrial assets generate. A single flight, for example, generates a terabyte of data, and there are thousands of flights each day. This computational power makes it possible to work with much larger data sets and to use data in ways that were not feasible before.

Third Base: Model Construction

After addressing the what, where, and when, consider the why and the what-if to shape a model that makes sense of the data. The Industrial Data Lake is built on massively scalable, low-cost infrastructure using commercial off-the-shelf (COTS) hardware and the Hadoop Distributed File System (HDFS). Its massively parallel processing architecture delivers high-performance analysis in real-time, interactive, and batch modes. For example, an oil and gas company could model the risk of individual segments across thousands of miles of pipe to determine which segments should be replaced.

Home Plate: Application

Put points on the board by turning data into insights the business can use to change outcomes. Create a framework for integrating the model with decision-making processes so you can take action using the Industrial Internet. This could involve building dashboards to navigate clusters, applying models to different use cases to detect opportunities, and identifying anomalies. For a rail company, applying advanced predictive analytics across a whole network could reduce fuel consumption, likely yielding big cost savings for the business.

The potential outcomes are real. GE recently helped an oil and gas customer identify the need to replace a seal on an oil rig off the coast of Scotland. That saved an estimated $7.5 million in losses that unplanned downtime would otherwise have caused. As GE Chairman and CEO Jeff Immelt has said, “Zero unplanned downtime is a key goal for GE’s use of the Industrial Internet.” As storage and computing costs continue to fall, the stage is set not just to hit a home run, but to deliver a grand slam.

About the author

Kathryn Kilner

GE Digital
