While the Industrial Internet is still growing, the big data technologies it relies on are already dominating many industries. Batch-processing analytics platforms like Hadoop are behemoths, having been built out to handle over a hundred petabytes of data, deal with failures, and scale globally.

But for the Industrial Internet to get the most out of data, analytics need to be more than just big; they need to be fast.

Picture a smart hospital where monitoring systems analyze anonymous triage data and detect a potential disease outbreak as it's occurring. Early quarantine procedures could go into effect to contain the disease and prevent many more patients from becoming infected. Or take an electric grid filled with smart assets that could respond immediately to a power anomaly and change their behavior to prevent a failure. Self-correcting smart machines will require fast and accurate information to act in time.

To achieve these kinds of rapid responses, data analysis needs to occur almost in tandem with data collection. Real-time analytics platforms like Storm work by capturing streaming data live in non-relational NoSQL databases and processing it in fault-tolerant distributed systems. The resulting outputs are then stored in structured formats like JSON and XML, and data visualization systems interpret them and present them to the end user.
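To make the idea of analyzing data as it arrives more concrete, here is a minimal Python sketch of a sliding-window check that flags a reading as soon as it departs sharply from recent values and emits the result as JSON. It is an illustration only, not Storm's API: the function name `detect_anomalies`, the window size, the threshold, and the sample voltage readings are all assumptions made for this example.

```python
import json
from collections import deque
from statistics import mean, stdev


def detect_anomalies(readings, window_size=5, threshold=3.0):
    """Yield a JSON alert for each reading that deviates sharply from the recent window.

    `readings` is any iterable of (timestamp, value) pairs, so it can be a
    live stream; each reading is checked the moment it arrives.
    """
    window = deque(maxlen=window_size)  # holds only the most recent values
    for timestamp, value in readings:
        if len(window) == window_size:
            baseline = mean(window)
            spread = stdev(window)
            # Flag the reading if it is more than `threshold` standard
            # deviations away from the recent average.
            if spread > 0 and abs(value - baseline) > threshold * spread:
                yield json.dumps(
                    {"timestamp": timestamp, "value": value, "baseline": round(baseline, 2)}
                )
        window.append(value)


# Hypothetical grid voltage readings as (seconds, volts) pairs.
stream = [(0, 230.1), (1, 229.8), (2, 230.3), (3, 230.0),
          (4, 229.9), (5, 251.4), (6, 230.2)]
alerts = list(detect_anomalies(stream))
print(alerts)
```

In this sample, only the spike at the fifth second is flagged. A production system would spread this work across many machines and handle failures, which is the part that platforms like Storm are built to provide.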

For Industrial Internet applications, the recipe for success will involve combining big data batch processing with real-time processing in a way that is seamless for the end user. Yahoo! and other companies are already finding ways to integrate Hadoop and Storm to provide robust data analysis for any given situation. Industrial Internet solutions can benefit greatly from these efforts because the domain problems the Industrial Internet hopes to solve are wide and varied, covering all scales from the largest to the smallest and from the instantaneous to the long-term.
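One common way to make such a combination look seamless is to answer each query from two sources at once: a result precomputed in batch over historical data, plus a running result over whatever has arrived since. The Python sketch below illustrates that idea under stated assumptions. It does not use Hadoop or Storm and is not a description of how Yahoo! integrates them; the asset IDs, event lists, and function names are hypothetical.

```python
from collections import Counter


def count_faults(events):
    """Count fault events per asset ID from (asset_id, timestamp) pairs."""
    return Counter(asset_id for asset_id, _timestamp in events)


def query(asset_id, batch_counts, realtime_counts):
    """Answer with the batch total plus whatever has arrived since the last batch run."""
    return batch_counts[asset_id] + realtime_counts[asset_id]


# Hypothetical fault events: historical data already processed in batch,
# and recent events seen only by the real-time path.
historical = [("turbine-1", 100), ("turbine-2", 140), ("turbine-1", 220)]
recent = [("turbine-1", 305), ("turbine-3", 310)]

batch_counts = count_faults(historical)   # stands in for a slow, complete batch job
realtime_counts = count_faults(recent)    # stands in for a fast, incremental stream job

total_turbine_1 = query("turbine-1", batch_counts, realtime_counts)
print(total_turbine_1)
```

The caller sees a single number and never needs to know which part came from the batch job and which from the stream.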

About the author

Jan Helbing

Marketing Communications Lead at GE Software