Big data and cloud computing are creating both challenges and opportunities for machine learning (ML) techniques.
In Chapter 12 of the Handbook in Computational Intelligence (Springer, 2013, to appear), I presented case studies, among them anomaly detection and optimizing the balance of an investment portfolio, whose applications were developed before the advent of cloud computing and big data.
Four challenges for CI and ML
Now, though, we must analyze the very large data sets generated by the Internet of Things (IoT), machine-to-machine (M2M) connectivity and social media, and address the underlying “three v’s” of big data: volume, velocity and variability. The following diagram lays out four research challenges for CI/ML analysis going forward:
- Data-driven model automation and scalability
- ML/Human interactions
- Decision making and uncertainty
- Model ensemble/fusion
Each of these challenges is important. Each is multi-faceted and can occupy significant brain space. Today, I’d like to present the high points of model ensemble/fusion. Here, several elements come together:
- Integrating structured and unstructured data – In the simplest case, you must integrate time-dependent text, such as news, reports and logs, with time-series data, treating the text as one more sensor. Instead of learning a single model across multiple data formats, we can fuse the outputs of an ensemble of modality-specific learners for greater efficiency and speed.
- Integrating physics-based and data-driven models – Integration can be loose or tight. Loose integration, at its simplest, applies data-driven models to the delta between expected (physics-driven) values and actual (measured by sensors) values. In tight integration, data-driven models generate estimates of parameters and initial conditions of physics-based models.
- Model-agnostic fusion – This is typically deployed when predictive models are created by a variety of sources such as crowdsourcing via competition or cloud-based genetic programming. It’s a meta-model that draws on each predictive model’s meta-data to define applicability and relative level of performance.
- Model diversity by design – One of the most promising techniques for designing diverse model ensembles involves evolving a large population of symbolic regression models with a genetic algorithm distributed across islands, so that subpopulations evolve largely independently.
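To make the loose/tight distinction concrete, here is a minimal sketch built on a toy Newtonian-cooling model. Everything in it is an assumption for illustration: the physics model, the names `physics_model`, `fit_residual_model`, `hybrid_predict` and `estimate_parameters`, and the choice of polynomial regression as the "data-driven model". Loose integration fits a model to the delta between sensor values and physics-based expectations; the simplified tight-integration step uses a regression on the data to estimate the physics model's own parameters.

```python
import numpy as np

AMBIENT = 20.0  # hypothetical ambient temperature for the toy physics model

def physics_model(t, k=0.1, dT0=80.0):
    """Hypothetical first-principles model: Newtonian cooling toward AMBIENT."""
    return AMBIENT + dT0 * np.exp(-k * t)

def fit_residual_model(t, measured, degree=1):
    """Loose integration: fit a data-driven model (here, a polynomial) to the
    delta between sensor measurements and the physics-based expectation."""
    residual = measured - physics_model(t)
    return np.polynomial.Polynomial.fit(t, residual, degree)

def hybrid_predict(t, residual_model):
    """Physics-based expectation corrected by the learned residual."""
    return physics_model(t) + residual_model(t)

def estimate_parameters(t, measured):
    """Tight integration (simplified): a regression on log(T - AMBIENT)
    estimates the physics model's decay rate k and initial excess dT0,
    which would then parameterize the physics model."""
    slope, intercept = np.polyfit(t, np.log(measured - AMBIENT), 1)
    return -slope, np.exp(intercept)

t = np.linspace(0.0, 30.0, 31)
# Synthetic "sensor" data: physics plus a slow drift the physics model ignores.
measured = physics_model(t) + 0.05 * t + 1.0
residual_model = fit_residual_model(t, measured)
error = np.max(np.abs(hybrid_predict(t, residual_model) - measured))

# Recover parameters from data generated with a different (unknown) k and dT0.
k_hat, dT0_hat = estimate_parameters(t, physics_model(t, k=0.12, dT0=75.0))
```

In a real system the residual model would be far richer than a straight line, but the structure is the same: the physics model carries what is understood, and the data-driven part captures only what it misses.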
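Model-agnostic fusion treats each predictive model as a black box and relies only on its meta-data. The sketch below assumes two kinds of meta-data per model: an input region where the model is applicable and a validation error indicating its relative performance. The `ModelCard` structure, the range check and the inverse-error weighting are hypothetical choices for illustration, not a prescribed scheme.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

@dataclass
class ModelCard:
    """Hypothetical meta-data attached to each black-box predictive model."""
    predict: Callable[[float], float]
    x_min: float            # lower bound of the region the model was validated on
    x_max: float            # upper bound of that region
    validation_rmse: float  # relative level of performance on held-out data

def fuse(models: Sequence[ModelCard], x: float) -> float:
    """Meta-model: keep only models applicable at x, then average their
    predictions weighted by inverse validation error (an assumed rule)."""
    applicable = [m for m in models if m.x_min <= x <= m.x_max]
    if not applicable:
        raise ValueError(f"no model is applicable at x={x}")
    # Guard against a zero error producing an infinite weight.
    weights = [1.0 / max(m.validation_rmse, 1e-12) for m in applicable]
    total = sum(weights)
    return sum(w * m.predict(x) for w, m in zip(weights, applicable)) / total
```

For example, with one model valid on [0, 10] (RMSE 0.5) and another valid on [0, 5] (RMSE 1.0), a query at x = 8 uses only the first model, while a query at x = 2 blends both, giving the first twice the weight. The same weighted-output pattern could fuse modality-specific learners, such as a text model and a time-series model, instead of learning one model across formats.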
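The island approach can be sketched as several subpopulations that evolve independently and exchange their best individuals only occasionally, which keeps the final champions diverse. To stay short, this sketch evolves the two coefficients of a toy linear model rather than full symbolic-regression expression trees. The fitness function, operators, island count and ring-migration schedule are all assumptions, and the islands run sequentially here where a cloud deployment would place each on its own worker.

```python
import random

def fitness(genome):
    """Toy objective (assumption): squared error of a*x + b against y = 3x + 1."""
    a, b = genome
    return sum((a * x + b - (3 * x + 1)) ** 2 for x in range(-5, 6))

def evolve_island(pop, rng, mutation=0.3):
    """One generation on one island: tournament selection, blend crossover,
    Gaussian mutation, and elitism (the best individual always survives)."""
    def tournament():
        return min(rng.sample(pop, 3), key=fitness)
    children = [min(pop, key=fitness)]
    while len(children) < len(pop):
        p1, p2 = tournament(), tournament()
        w = rng.random()
        children.append([w * g1 + (1 - w) * g2 + rng.gauss(0, mutation)
                         for g1, g2 in zip(p1, p2)])
    return children

def island_ga(n_islands=4, pop_size=20, generations=60, migrate_every=10, seed=0):
    rng = random.Random(seed)
    islands = [[[rng.uniform(-10, 10), rng.uniform(-10, 10)] for _ in range(pop_size)]
               for _ in range(n_islands)]
    for gen in range(1, generations + 1):
        # Islands evolve independently between migrations.
        islands = [evolve_island(pop, rng) for pop in islands]
        if gen % migrate_every == 0:
            # Ring migration: island i-1's best replaces island i's worst individual.
            bests = [min(pop, key=fitness) for pop in islands]
            for i, pop in enumerate(islands):
                worst = max(range(len(pop)), key=lambda j: fitness(pop[j]))
                pop[worst] = list(bests[i - 1])
    # One champion per island: an ensemble of independently evolved models.
    return [min(pop, key=fitness) for pop in islands]
```

Infrequent migration is the design lever here: the less often islands exchange individuals, the more their champions tend to differ, which is what an ensemble needs.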
Our mission, as I see it, is to shape computational intelligence (CI) research to take advantage of crowdsourcing and the cloud’s almost infinite computational capacity to meet the challenges of working with big data.
Piero Bonissone is a Coolidge Fellow and a Chief Scientist at GE Global Research. A Fellow of AAAI, IEEE, and IFSA, he has published over 150 articles and holds 65 U.S. patents, with 16 or more pending. He has won numerous awards in areas such as Soft Computing and Fuzzy Systems.