In my last post, I discussed the role of the data science practice in an enterprise and introduced the idea of building data products. Now, I want to define an industrial data product, provide reasons for building industrial data products, and share key insights for developing industrial data products.
Data scientists use data to generate insights. So, in its simplest definition, a data product incorporates the process of generating insights into a product. In other words, data products put data science processes in production to transform data and continuously create value. The key is to understand how typical enterprise software products are made.
Enterprise software products are typically developed based on business process alone. Data products, on the other hand, are driven at their core by data generated by a process, users, or machines. And industrial data products are data products developed for the industrial domain. This distinction is important because consumer companies are drastically different from industrial companies due to many factors, such as data quality, data access rights, adoption audience, cost of false positives, type of domain knowledge, and the product team composition. I’ll cover some of these differences further in the post. In general, industrial data product development requires a change in organizational thinking, a change in design thinking, and the ability to engineer products focusing on features that are data- and user-centric. Therefore, building industrial data products is not just hard, but can become frustrating if your industrial organization does not have the right people and culture in place to understand how to build these products.
Key lessons to keep in mind when you aim to build a data product include:
Since data is at the core, you have to know how to collect the right data and process it to make it useful. To get this right, you will need to perform some (often many) steps to understand, clean, and transform data, which is typically the case with any data science process. The difference now is that you are incorporating this process in production to feed the most useful data to the product features by using the right “plumbing.” Before you start this, you have to acknowledge that data is messy. The data from both the industrial assets and the processes, such as manufacturing processes or services operations, is going to be messy to begin with. This is the first and most important problem that you need to tackle. It will take a significant share of the time (more than 70%) to get the data into a usable form and then build the pipelines that feed the data to any dashboard or analytics you want to develop.
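To make the "clean" step concrete, here is a minimal sketch of the kind of filter that might sit at the front of such a pipeline. Everything in it is an assumption for illustration: the `Reading` type, the `clean_readings` function, and the rule set (drop missing samples, drop out-of-range samples, keep one reading per timestamp) are hypothetical and not taken from any specific product.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Reading:
    """One sensor sample (hypothetical shape, for illustration only)."""
    timestamp: int           # assumed: seconds since some reference time
    value: Optional[float]   # None models a missing sample


def clean_readings(raw: Iterable[Reading], low: float, high: float) -> List[Reading]:
    """Return readings sorted by time, without missing, out-of-range,
    or duplicate-timestamp samples."""
    seen_timestamps = set()
    cleaned: List[Reading] = []
    # sorted() is stable, so among equal timestamps the input order is kept.
    for reading in sorted(raw, key=lambda r: r.timestamp):
        if reading.value is None:
            continue  # missing sample
        if not (low <= reading.value <= high):
            continue  # physically implausible value (assumed valid range)
        if reading.timestamp in seen_timestamps:
            continue  # keep only the first valid reading per timestamp
        seen_timestamps.add(reading.timestamp)
        cleaned.append(reading)
    return cleaned


raw = [Reading(2, 5.0), Reading(1, None), Reading(2, 6.0),
       Reading(3, 999.0), Reading(0, 1.0)]
cleaned = clean_readings(raw, low=0.0, high=100.0)
```

In a real pipeline the valid range and the deduplication rule would come from domain experts, and dropped samples would usually be logged so that data quality itself can be monitored.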
In any data science activity, a typical outcome is a collection of insights that is useful to the business or to a set of stakeholders. When you build data products, you are essentially putting the process of generating those insights into production. For any offline data science activity, a team of data scientists typically explains how they generated those insights or predictions through a combination of various statistical and/or machine learning techniques. But in a data product, when the insight generation and visualization process is automated, much of the onus is on the consumer of that insight to interpret and understand the reasoning behind it. This is a big challenge we face when building industrial data products. For most industrial customers, the ability to use data and analytics to make decisions hinges significantly on their comfort with, and ability to believe, the insight or prediction the system is generating. For example, if we developed machine-learning models to predict machine failure or forecast performance degradation, we’d often be asked to explain how the algorithms generated those predictions. Many of the machine learning and deep learning techniques data scientists often use are black-box techniques, which makes it challenging to explain how and why the models are producing their outputs. So, while developing industrial data products, adoption and usage efficiency rely greatly on how easily explainable and user-centric the insights are. This, in turn, creates a requirement to balance accuracy (e.g., advanced black-box models) against interpretability (e.g., straightforward linear models).
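As a rough illustration of the interpretable end of that trade-off, the sketch below fits a one-feature linear model with ordinary least squares and turns a prediction into a sentence a plant engineer could check by hand. The function names (`fit_line`, `explain`), the feature ("operating hours"), and the numbers are all made up for the example; it is not the modeling approach of any particular product.

```python
from typing import Sequence, Tuple


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = intercept + slope * x (single feature)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx  # assumes x is not constant
    intercept = mean_y - slope * mean_x
    return slope, intercept


def explain(slope: float, intercept: float, x_new: float, feature: str) -> str:
    """Describe a prediction as baseline plus the feature's contribution."""
    contribution = slope * x_new
    prediction = intercept + contribution
    return (f"predicted {prediction:.2f} = baseline {intercept:.2f} "
            f"+ {contribution:.2f} from {feature} ({slope:.3f} per unit)")


# Hypothetical data: wear measurement vs. operating hours (in thousands).
hours = [0.0, 1.0, 2.0, 3.0]
wear = [1.0, 3.0, 5.0, 7.0]
slope, intercept = fit_line(hours, wear)
message = explain(slope, intercept, 4.0, "operating hours")
```

A black-box model might fit real data better, but it cannot produce an explanation this direct without extra tooling, which is the balance described above.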
Even though, conventionally, industrial products are expected to be complex, simplification should be your best friend if you want to build successful industrial data products. There are two main aspects of simplification. The first is about starting with simple industrial data products. Don’t aim to start by building a super complex predictive maintenance solution with the most advanced machine learning algorithms. For example, while building an IIoT solution for a customer, a requirement was to develop predictive algorithms to detect equipment failures from sensor data. Before we could do that, we developed simple analytics to clean sensor data, align it, and then count unique events. Building a simple counter of events from a handful of sensor streams took a good deal of iteration and collaboration across teams. The customer may easily accept the iterative process, but truly adopting it, trying as many things as possible early, and still making progress is hard. The key is to realize that it takes time to make a data product mature. You go from using “version A”, getting feedback from “version A”, and generating more data from “version A” to producing a better “version B.” As long as you are doing this in production, you will find the gaps in your pipeline, you will plug those gaps, and you will make a better “version C.” It’s a cycle: rinse and repeat.
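For a sense of what "align it, and then count unique events" can look like in its simplest form, here is a small sketch. It is not the analytics from the project described above. The fixed-width time grid, the threshold-crossing definition of an event, and the names `align_to_grid` and `count_events` are all assumptions chosen to keep the example short.

```python
from typing import Dict, List, Tuple


def align_to_grid(samples: List[Tuple[float, float]], step: float) -> Dict[int, float]:
    """Map (timestamp, value) samples onto fixed-width time buckets,
    keeping the latest value that falls into each bucket."""
    grid: Dict[int, float] = {}
    for timestamp, value in sorted(samples, key=lambda s: s[0]):
        grid[int(timestamp // step)] = value
    return grid


def count_events(aligned: Dict[int, float], threshold: float) -> int:
    """Count rising edges: the signal going from at-or-below the threshold
    to above it. A stream that starts above the threshold counts once."""
    events = 0
    above = False
    for bucket in sorted(aligned):
        now_above = aligned[bucket] > threshold
        if now_above and not above:
            events += 1
        above = now_above
    return events


# Hypothetical sensor stream: (seconds, reading)
stream = [(0.2, 1.0), (1.1, 5.0), (1.9, 6.0), (2.5, 1.0), (3.3, 7.0)]
aligned = align_to_grid(stream, step=1.0)
n_events = count_events(aligned, threshold=4.0)
```

Even a counter like this hides decisions that need domain input: how wide the buckets should be, what counts as one event versus two, and how gaps in the stream are treated. That is part of why the simple version takes several iterations to get right.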
It is not a surprise that the success of any product development process relies heavily on the right product team. The right mix of people, skills, and, above all, the right mindset makes all the difference. Data scientists are often assumed to work separately from product or engineering teams, performing all tasks from exploratory data analysis (EDA) to model development to validation. They are often expected to simply ‘hand over’ the models and relevant codebase to the engineering teams. There has been growing recognition that data scientists should be a more integral part of product and engineering teams. Typically, data scientists are interacting with the customers while also rapidly iterating on experiments with data, which makes them immensely useful in guiding product development. If you are providing data science services to your customers, it is far more beneficial to keep iterating on the integrated end solution that will be deployed, and to drive analytics alongside solution development with a cross-functional team in which data scientists and engineers work together. There is another key difference when developing industrial data products: domain knowledge. The need for domain knowledge in the data scientist, the product manager, or both is much higher. It is absolutely critical to have the right people with the required domain knowledge for the success of an industrial data product.
I’ll leave readers with the two most important qualities required in a team responsible for building data products. The first is the ability to deal with ambiguity. If you go in trying to perfect your understanding of the product and of what the customer wants before you start, you will fail. This is not to be confused with setting the initial vision of the product. Second, the team should be comfortable experimenting and iterating as much as required. With every experiment, measure critically and adapt not just your data product but also how your team is organized.