Overview

About Predix Insights

Predix Insights is a big data processing and analytics service. It provides native Apache Spark support and orchestration features using Apache Airflow. Use it to build pipelines and run orchestrations to process analytic data in your runtime environment.

Predix Insights provides a managed infrastructure so you can concentrate on building your application. Use it to build pipelines that collect, store, process, and analyze large volumes of data without having to manage a distributed computing infrastructure (such as Hadoop) yourself.

The service embeds an Apache Spark-based framework for writing Spark applications in a declarative way and configuring the output source with minimal coding. The service also supports creating a multi-step orchestration, where you run more than one analytic as a single workflow while resolving interdependencies.
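For example, the core of a Spark analytic that could be packaged into a flow template might look like the following minimal PySpark sketch. The input path, column names, and output location are placeholders for illustration, not Predix Insights APIs.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Build (or reuse) a Spark session for this analytic.
spark = SparkSession.builder.appName("turbine-temperature-rollup").getOrCreate()

# Read raw sensor readings; in practice the input location is supplied by
# the flow configuration at runtime rather than hard-coded in the analytic.
readings = spark.read.json("s3a://example-bucket/raw/turbine-readings/")

# Compute the average temperature per asset.
avg_by_asset = readings.groupBy("asset_id").agg(
    F.avg("temperature").alias("avg_temperature")
)

# Write the result to the configured output source.
avg_by_asset.write.mode("overwrite").parquet("s3a://example-bucket/processed/")
spark.stop()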

Architecture

The following diagram shows the functional architecture of Predix Insights in a Spark runtime environment.

Predix Insights User Roles

The following user roles are supported. Access to Predix Insights functionality varies according to the user type you are provisioned for.

Role: Admin
Description: The administrator has access to all Predix Insights functionality. The admin can create, edit, start, stop, restart, or kill a flow.
Access to:
  Develop →
    • Flow Templates
    • Flows
    • Orchestration
    • Dependencies
  Monitor →
    • Flows
    • Orchestration

Role: Operator
Description: The operator has access to Predix Insights monitoring functionality. The operator can monitor a flow or orchestration.
Access to:
  Monitor →
    • Flows
    • Orchestration

Concepts

The following is a list of common terms used in this document and their definitions.

Dependencies
Common dependencies, such as libraries, are uploaded once and stored in Predix Insights for reuse. Once uploaded, a dependency is available on demand from this central location whenever a job calls it.
Directed Acyclic Graph (DAG)
A DAG file contains the collection of tasks intended to run as a single unit and defines their order of execution at runtime, describing how you want the flow to run. A DAG is defined in a standard Python file and can describe more than one task.
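As an illustration, a minimal DAG file might look like the following sketch. It uses standard Apache Airflow constructs; the DAG ID, task IDs, callables, and schedule are placeholders, not part of Predix Insights itself.

from datetime import datetime

from airflow import DAG
from airflow.operators.python_operator import PythonOperator

def extract():
    print("extract step")

def transform():
    print("transform step")

# The DAG groups the tasks into a single unit and carries the schedule.
dag = DAG(
    dag_id="example_flow",
    start_date=datetime(2018, 1, 1),
    schedule_interval="@daily",  # run once per day
)

extract_task = PythonOperator(task_id="extract", python_callable=extract, dag=dag)
transform_task = PythonOperator(task_id="transform", python_callable=transform, dag=dag)

# Define the execution order: extract runs before transform.
extract_task >> transform_task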
Flow Template
A flow template contains the analytic code, including the required configuration and library files, in ZIP format. Once configured, a single analytic (flow template) can run against multiple assets. This allows you to upload the analytic code once and store it for repeated use. The runtime configuration is defined separately in a flow. Separating the analytic code from the runtime configuration lets you create multiple customized flows that run against a single analytic. For example, you might create an analytic for a gas turbine and then create a separate flow for each turbine customization. If the analytic code changes, you need to change only the flow template.
Flow
A flow file contains the configuration details that define how data is processed at runtime. More than one flow file can be associated with the same analytic (flow template). An individual flow can be configured with runtime parameters, such as a dataset location, asset ID, and sensor ID, so that the flow is specific to a given asset. A flow can be launched multiple times, either on demand or on a schedule (hourly, daily, and so on).
Instance
An instance is an individual execution of a flow.
Operator
An operator describes a single task in a workflow. The execution order of operators (tasks) is controlled by the DAG file. Supported Predix Insights operators are the PredixInsightsOperator macro, the PythonOperator function, and the BranchPythonOperator function.
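As a sketch of the branching case, a BranchPythonOperator chooses which downstream task runs by returning its task ID. This is standard Apache Airflow behavior; the task IDs and the condition below are placeholders.

from datetime import datetime

from airflow import DAG
from airflow.operators.python_operator import BranchPythonOperator, PythonOperator

def choose_path():
    # Return the task_id of the branch to follow; the condition here is arbitrary.
    return "full_run" if datetime.now().hour < 12 else "light_run"

dag = DAG(dag_id="branch_example", start_date=datetime(2018, 1, 1),
          schedule_interval=None)

branch = BranchPythonOperator(task_id="choose_path", python_callable=choose_path, dag=dag)
full_run = PythonOperator(task_id="full_run", python_callable=lambda: print("full run"), dag=dag)
light_run = PythonOperator(task_id="light_run", python_callable=lambda: print("light run"), dag=dag)

# Only the branch returned by choose_path executes; the other task is skipped.
branch >> full_run
branch >> light_run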
Orchestration
An orchestration is a group of analytic flows run together as a single unit; the task execution order is defined in the corresponding DAG files. You can configure, execute, validate, and monitor analytic execution.
Scheduler
The scheduler provides the ability to execute analytics, or orchestrations of analytics, at time-based intervals; such a scheduled execution is called a job. You can create jobs, retrieve job definitions and history, update job definitions, and delete jobs.
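Although the scheduler itself is a managed service, a time-based interval can be pictured as a cron expression on a DAG. The following minimal sketch uses standard Airflow with placeholder IDs and runs every six hours.

from datetime import datetime

from airflow import DAG

# "0 */6 * * *" fires at minute 0 of every sixth hour.
dag = DAG(
    dag_id="scheduled_example",
    start_date=datetime(2018, 1, 1),
    schedule_interval="0 */6 * * *",
)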
Task
A DAG file contains the collection of tasks intended to run as a single unit. After an operator is instantiated, it is called a task. Each task within a DAG represents a node in the graph; a task can be either a macro or a function, and all tasks share a common set of parameters.
A task instance is a specific run of a task. A task instance has a state, such as RUNNING, SUCCESS, FAILED, and so on.