
Trawling for Big Insight in the ‘Industrial Data Lake’

You’d need to hide yourself under a pretty large rock to avoid hearing about Big Data these days. From NASA to Netflix, organizations of all sorts and sizes are taking advantage of larger-than-life data sets to power everything from lunar modeling to color-pattern analysis. But many companies remain behind the curve, not for lack of data but for lack of an efficient system to gather and analyze it all.


That could be changing soon. A new system developed by GE and Pivotal aims to empower more companies to leverage the analytical tools of Big Data and the Industrial Internet.

“Big and fast data is a critical piece of how modern industry is reinventing itself in order to innovate and compete,” says Pivotal CEO Paul Maritz.

The new approach is called an “industrial data lake.” It leverages GE’s Predix™ industrial software platform and Apache Hadoop, an open-source software framework, to process information 2,000 times faster and at one-tenth the cost of previous methods.

“Big Data is growing so fast that it is outpacing the ability of current tools to take full advantage of it,” says Bill Ruh, vice president of GE Software, explaining how the industrial data lake is able to merge IT with operational technology (OT) to enable companies to “get the most value out of their mission-critical information.”

The marriage of IT and OT will be needed as companies seek greater online control of their machines and other operational devices, with analyst firm IDC estimating that at least three-quarters of intelligent industry solutions will be data intensive. “As a result, effective industrial data management is moving to the forefront of business looking to ‘digitalize’ their operations,” says Lothar Schubert, platform product marketing leader at GE Software.


The Lake Effect

So how does a data lake work? The essential mechanics boil down to this: Predix connects a broad range of industrial equipment in a network, the Industrial Internet, enabling operators to funnel sensor data from various networked machines onto a single platform. From there, Hadoop’s massively parallel processing architecture analyzes the data as a unified whole, rather than as a billion separate bits of information, each with its own individual file path.

“Instead of slicing, dicing, and classifying the data, we capture the metadata, which is data about the data,” says Dave Bartlett, computer scientist and chief technology officer for GE Aviation. “Metadata provides a more robust and varied context at the time of analysis that’s been missing from conventional data storage.”
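
The mechanics described above can be illustrated with a toy Python sketch. It is not Predix or Hadoop code, and every name in it (`ingest`, `map_partition`, `mean_temperature_by_engine`, the `engine`, `kind`, and `flight` metadata keys, and the sample readings) is a hypothetical chosen for illustration. The sketch assumes raw readings are stored unclassified with their metadata attached, and that analysis is split across partitions and then combined, in the spirit of a map-and-reduce job.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def ingest(lake, payload, **metadata):
    """Store a raw reading as-is, with its metadata kept alongside it."""
    lake.append({"payload": payload, "meta": metadata})


def map_partition(records):
    """Map step: per-engine (sum, count) of temperature readings in one partition."""
    partial = defaultdict(lambda: [0.0, 0])
    for rec in records:
        # The metadata, not a predefined schema, decides what is relevant at analysis time.
        if rec["meta"].get("kind") != "temperature":
            continue
        key = rec["meta"]["engine"]
        partial[key][0] += rec["payload"]
        partial[key][1] += 1
    return dict(partial)


def reduce_partials(partials):
    """Reduce step: merge the per-partition sums and counts into per-engine means."""
    totals = defaultdict(lambda: [0.0, 0])
    for partial in partials:
        for key, (total, count) in partial.items():
            totals[key][0] += total
            totals[key][1] += count
    return {key: total / count for key, (total, count) in totals.items()}


def mean_temperature_by_engine(lake, partitions=2):
    """Split the lake into partitions, process them concurrently, then combine."""
    chunks = [lake[i::partitions] for i in range(partitions)]
    with ThreadPoolExecutor(max_workers=partitions) as pool:
        partials = list(pool.map(map_partition, chunks))
    return reduce_partials(partials)


# Made-up sample readings, for illustration only.
lake = []
ingest(lake, 610.0, engine="E1", kind="temperature", flight="F100")
ingest(lake, 630.0, engine="E1", kind="temperature", flight="F101")
ingest(lake, 590.0, engine="E2", kind="temperature", flight="F200")
ingest(lake, 0.82, engine="E2", kind="vibration", flight="F200")

result = mean_temperature_by_engine(lake)
print(result)
```

Because the readings are never sorted into fixed tables up front, a different question (say, vibration by flight) would only need a different map step over the same stored records.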

So far, the industrial data lake approach has been used to great effect by GE Aviation, in a trial that tracked some 15,000 flights across 25 airlines. Service crews were better able to analyze jet-engine temperatures and other performance factors, cutting costs tenfold.

Such data-crunching capabilities can be applied to industries ranging from rail to healthcare, where small but powerful insights can equate to huge cost savings. “What most of our customers will tell you is, if you can help them unlock one more mile-per-hour for a locomotive running its daily routes, it’s worth up to $200 million a year,” says Vince Campisi, chief information officer for GE Software.

This in many ways reflects the promise of Big Data — not simply a massive database of disparate information, but a system that correlates and cross-references at the speed of commerce, drawing new insight from the aggregate.

Armed with an effective analytics framework, industrial users can move away from wholesale data warehousing to a more dynamic model where information storage is so efficient that it begins to change the business model.

For companies that haven’t yet embraced Big Data, it may be time to dip their toes in the lake.
