Analytic Development

Analytic Development Process

The process to develop an analytic for the Predix platform is as follows.

Note: The maximum file size for upload is 250MB. The maximum expanded analytic file size (including directories and files) is 500MB.
  1. Develop the analytic in Java, Matlab, or Python. See Java Analytic Development, Matlab Analytic Development, or Python Analytic Development.
  2. Add the analytic to the Analytics Catalog for testing. See Adding and Validating an Analytic Using REST APIs.
  3. Once the analytic has been added to the catalog, test the analytic with additional input data sets. See About Running a Single Analytic.
  4. Deploy the analytic to production. See Deploying a Production Analytic to Cloud Foundry.

About Machine Learning/Trained Analytics

Analytics that use a trained model (commonly known as trained analytics, machine trained analytics, or machine learned analytics) are supported. To use a machine trained analytic in an orchestration, you will follow the orchestration task roadmap corresponding to your business needs.

However, depending on how the trained model was deployed, you may need to perform additional configuration steps.

Deployment Options for Trained Analytics

Analytics that use trained models are supported provided the following conditions are met.

  1. The training is performed independent of the Analytics Framework.
  2. The training is with respect to an asset context.
  3. The output of the training exercise is one of the following:
    a. An executable analytic that is only applicable to the training context.
    b. An executable kernel that is applicable to all trained models, and a set of large trained models (as individual files of byte streams) for the training context.
    c. An executable kernel that is applicable to all trained models, and a set of small trained models (as individual files of byte streams) for the training context.

The options for deploying a trained model are further described in the following table.

Deployment Method: Embedded Trained Models
  Supported Training Output: 3.a, 3.b
  Pros:
    • Can handle very large model files.
    • Better performance with many uses.
  Cons:
    • Reusable kernels are duplicated, so storage costs are higher.
    • Many models trained for different contexts will be shown as many different analytics.
      Tip: You should manage the instances by creating a taxonomy node for the analytic.
    • If the model changes after the analytic is deployed to production, the analytic version must be updated to update the model.

Deployment Method: Trained Models as Runtime Configuration Items
  Supported Training Output: 3.c
  Pros:
    • Reduced cost, since the kernel can be reused in the Analytics Catalog.
    • The model can be updated after it is in production use.
  Cons:
    • Performance. The model is loaded with each use of the analytic.

Embedded Trained Models

The analytic can be written such that it expects the trained model to be bundled in the executable. At runtime, the analytic can then retrieve the model from the local file system for the deployed analytic. For more information, see Adding and Validating an Analytic Using REST APIs.
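As a sketch of this pattern in Python (the file name, serialization format, and helper name are assumptions, not framework requirements), the analytic can read the bundled model from its own deploy directory:

```python
import os
import pickle  # assumes the model was serialized with pickle; adjust for your format

# Hypothetical file name: the model file is packaged alongside the analytic's
# code, so at runtime it is present on the deployed analytic's local file system.
MODEL_FILE = os.path.join(os.path.dirname(os.path.abspath(__file__)), "trained_model.pkl")

def load_embedded_model(path=MODEL_FILE):
    """Deserialize the trained model bundled into the analytic artifact."""
    with open(path, "rb") as f:
        return pickle.load(f)
```

The entry point can then call load_embedded_model() once and apply the model to each request. Because the model travels with the analytic, updating the model means deploying a new analytic version.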

Trained Models as Runtime Configuration Items

The analytic developer creates an entry point to the analytic that expects a map of trained model values, as well as the rest of the inputs to the analytic. For more information, see the analytic development topics in this section.
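A minimal Python sketch of such an entry point follows (the class name, model name, and model format are hypothetical; only the two-argument signature comes from the framework's pattern):

```python
import json
import pickle  # assumes the trained models were serialized with pickle

class AnalyticEntryPoint:
    """Hypothetical entry class; its module, class, and method names are
    registered with the framework in the config.json file."""

    def entry_method(self, input_json, input_models):
        # input_models maps model names (as defined in the Analytic Template's
        # inputModel port definition) to the raw bytes of each trained model.
        model = pickle.loads(input_models["scaling_model"])  # hypothetical name
        data = json.loads(input_json)
        # Apply the trained model; here the "model" is just a scale factor.
        outputs = [x * model["scale"] for x in data["inputs"]]
        return json.dumps({"outputs": outputs})
```

Because the model arrives as runtime configuration rather than being bundled in the artifact, it can be replaced after the analytic is in production without redeploying the kernel.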

Java Analytic Development

Follow these guidelines when developing a Java analytic for the Analytics Framework.

  1. Create a Java project using JDK 1.7+.
  2. Implement an entry point that the Analytics Framework will use to run the analytic. The pattern required for this entry point depends on whether the analytic uses trained models, and if so, how the trained model is deployed (see About Machine Learning/Trained Analytics).
    • Analytics that do not use trained models, or that have the trained models embedded in the analytic artifact.

      Implement as follows.

      public String entry_method(String inputJson)

      Parameter | Description
      entry_method | Can be any method name. Register this class and method name with the framework in the config.json file.
      inputJson | The Unicode string containing the JSON structure defined in the corresponding Analytic Template.
      string_output (return value) | Must follow the JSON output structure defined in the corresponding Analytic Template.
    • Analytics that use trained models from the Runtime Configuration.

      Implement as follows.

      public String entry_method(String inputJson, Map<String, byte[]> inputModels)

      Parameter | Description
      entry_method | Can be any method name. Register this class and method name with the framework in the config.json file.
      inputJson | The string containing the JSON structure defined in the corresponding Analytic Template.
      string_output (return value) | Must follow the JSON output structure defined in the corresponding Analytic Template.
      inputModels | The map of trained models defined in the inputModel port definition in the corresponding Analytic Template. The keys are the model names as defined in the template.
  3. Create a JSON configuration file that specifies the entry point's method and class, and place it in the src/main/resources directory within your Java project.
  4. Generate the Java JAR for your analytic.

For an example of a Java analytic application, see the demo-adder-java project in GitHub at https://github.com/PredixDev/predix-analytics-sample.

Java Analytic Configuration config.json

Note: The Predix Analytics Framework assumes that both the input to and the output from the Java analytic are of the String data type. No other input or output data type is supported. However, you can internally parse and construct the Strings as JSON or XML to support other data types.

The following example shows sample values.

{
  "className": "com.ge.predix.analytics.demo.java.DemoAdderJavaEntryPoint",
  "methodName": "add2Numbers"
}
Table 1. Java Configuration config.json
Property | Description | Example
className | Fully qualified class name to execute the Java analytic. | <package>.DemoAdderJavaEntryPoint
methodName | The method name to execute the Java analytic. | add2Numbers
Note: <package> should be a fully qualified package name. For example, com.ge.predix.analytics.demo.java.

Matlab Analytic Development

Follow these guidelines when developing a Matlab analytic for the Analytics Framework.

Note: Matlab analytics that use a trained model are not supported in this release.
  1. Implement your Matlab analytic so it takes data in and produces data out as JSON strings.
  2. Generate the Java JAR for the Matlab analytic using the instructions "Matlab Builder for Java", available at http://soliton.ae.gatech.edu/classes/ae6382/documents/matlab/mathworks/javabuilder.pdf.

    Make note of the package, class name, and method name definitions entered.

  3. Create a Java module that consumes your Matlab analytic as a library. For a Java module example, see the demo-adder-matlab-r2011b project.

    The pom.xml file should include the reference to your analytic as a dependency in the dependencies section. Any values can be used for the groupId, artifactId, and version properties, as long as the scope value is system, and the systemPath value is correct.

    Place the generated JAR file from step 2 in the src/main/resources directory of the Java module.

  4. Configure the Java module to consume the javabuilder.jar file corresponding to the analytic's Matlab version as a library.
  5. Create a Java entry point class with a default constructor (a constructor with no input parameters).

    If your Matlab method does not accept a JSON string as input and produce a JSON string as output, the Java entry point method should call your Matlab method with correctly formatted parameters and convert the output to a proper JSON string.

  6. Create the JSON configuration file (config.json) with className, methodName, and matlabVersion definitions.

    These definitions instruct the generated wrapper code to call your designated entry point method with the request payload. Place the config.json file in the src/main/resources directory of your Java module.

  7. Create a JAR package out of the Java module. If using Maven, run the command mvn clean package.

For an example of a Matlab analytic application, see the demo-adder-matlab-r2011b project.

Matlab Analytic Configuration config.json

Note: The framework assumes that both the input to and the output from the Matlab analytic are of the string data type. No other input or output data type is supported.

The following example shows sample values.

{
  "className": "com.ge.predix.analytics.demo.matlab.DemoMatlabAdderEntryPoint",
  "methodName": "add2Numbers",
  "matlabVersion": "r2011b"
}
Table 2. Properties
Property | Description | Example
className | Fully qualified Java class name to execute the Matlab analytic. | <package>.DemoMatlabAdderEntryPoint
matlabVersion | The Matlab release version number. The supported versions in this release are r2011b and r2012a. Note the leading "r" in the version. | r2011b
methodName | The method name to execute the Matlab analytic. | add2Numbers
Note: <package> should be a fully qualified package name. For example, com.ge.predix.insight.analytic.demo.matlab.

Dependent Java and Matlab Libraries

A good practice is to add only the required JAR files, including the javabuilder.jar file, as part of the analytics package. Do not add the following third party libraries as they are already provided by the Predix Analytics Framework. If you need to use any of these libraries, use the exact version listed. Failure to do so may cause the Java or Matlab analytic to fail at runtime.

  • annotations-2.0.1.jar
  • aopalliance-1.0.jar
  • asm-3.1.jar
  • bcpkix-jdk15on-1.47.jar
  • bcprov-jdk15on-1.52.jar
  • classmate-1.0.0.jar
  • commons-beanutils-1.8.3.jar
  • commons-codec-1.6.jar
  • commons-configuration-1.8.jar
  • commons-io-1.4.jar
  • commons-lang-2.6.jar
  • commons-lang3-3.1.jar
  • commons-logging-1.2.jar
  • cxf-api-2.7.3.jar
  • cxf-rt-bindings-xml-2.7.3.jar
  • cxf-rt-core-2.7.3.jar
  • cxf-rt-frontend-jaxrs-2.7.3.jar
  • cxf-rt-transports-http-2.7.3.jar
  • dozer-5.4.0.jar
  • geronimo-javamail_1.4_spec-1.7.1.jar
  • groovy-2.3.7.jar
  • groovy-json-2.3.7.jar
  • groovy-xml-2.3.7.jar
  • guava-13.0.1.jar
  • hamcrest-core-1.3.jar
  • hamcrest-library-1.3.jar
  • hibernate-validator-5.1.3.Final.jar
  • httpclient-4.3.6.jar
  • httpcore-4.3.jar
  • httpmime-4.3.6.jar
  • jackson-annotations-2.4.0.jar
  • jackson-core-2.4.4.jar
  • jackson-core-asl-1.8.9.jar
  • jackson-databind-2.4.4.jar
  • jackson-jaxrs-1.8.9.jar
  • jackson-jaxrs-base-2.4.1.jar
  • jackson-jaxrs-json-provider-2.4.1.jar
  • jackson-mapper-asl-1.8.9.jar
  • jackson-module-jaxb-annotations-2.4.1.jar
  • jackson-module-jsonSchema-2.4.1.jar
  • jackson-module-scala_2.10-2.4.1.jar
  • javassist-3.18.2-GA.jar
  • javax.ws.rs-api-2.0-m10.jar
  • jaxb-impl-2.2.6.jar
  • jaxb2-basics-0.6.4.jar
  • jaxb2-basics-runtime-0.6.4.jar
  • jaxb2-basics-tools-0.6.4.jar
  • jboss-logging-3.1.3.GA.jar
  • jcl-over-slf4j-1.7.8.jar
  • jersey-core-1.13.jar
  • jersey-multipart-1.13.jar
  • jersey-server-1.13.jar
  • jersey-servlet-1.13.jar
  • joda-convert-1.6.jar
  • joda-time-2.8.jar
  • json-20140107.jar
  • json-path-2.4.0.jar
  • json4s-ast_2.10-3.2.11.jar
  • json4s-core_2.10-3.2.11.jar
  • json4s-ext_2.10-3.2.11.jar
  • json4s-jackson_2.10-3.2.11.jar
  • json4s-native_2.10-3.2.11.jar
  • jsr305-2.0.1.jar
  • jsr311-api-1.1.1.jar
  • jul-to-slf4j-1.7.8.jar
  • log4j-over-slf4j-1.7.8.jar
  • logback-classic-1.1.2.jar
  • logback-core-1.1.2.jar
  • mapstruct-1.0.0.Beta4.jar
  • mimepull-1.6.jar
  • objenesis-2.1.jar
  • paranamer-2.6.jar
  • reflections-0.9.9.jar
  • rest-assured-2.4.0.jar
  • rest-assured-common-2.4.0.jar
  • scala-compiler-2.10.0.jar
  • scala-library-2.10.4.jar
  • scala-reflect-2.10.4.jar
  • scalap-2.10.0.jar
  • slf4j-api-1.6.4.jar
  • snakeyaml-1.5.jar
  • spring-aop-4.1.4.RELEASE.jar
  • spring-beans-4.1.4.RELEASE.jar
  • spring-boot-1.2.1.RELEASE.jar
  • spring-boot-autoconfigure-1.2.1.RELEASE.jar
  • spring-boot-starter-1.2.1.RELEASE.jar
  • spring-boot-starter-logging-1.2.1.RELEASE.jar
  • spring-boot-starter-security-1.2.1.RELEASE.jar
  • spring-boot-starter-tomcat-1.2.1.RELEASE.jar
  • spring-context-4.1.4.RELEASE.jar
  • spring-core-4.1.4.RELEASE.jar
  • spring-expression-4.1.4.RELEASE.jar
  • spring-hateoas-0.17.0.RELEASE.jar
  • spring-plugin-core-1.2.0.RELEASE.jar
  • spring-plugin-metadata-1.2.0.RELEASE.jar
  • spring-security-config-3.2.5.RELEASE.jar
  • spring-security-core-3.2.5.RELEASE.jar
  • spring-security-jwt-1.0.2.RELEASE.jar
  • spring-security-oauth2-2.0.3.RELEASE.jar
  • spring-security-web-3.2.5.RELEASE.jar
  • spring-web-4.1.4.RELEASE.jar
  • spring-webmvc-4.1.4.RELEASE.jar
  • springfox-core-2.0.2.jar
  • springfox-schema-2.0.2.jar
  • springfox-spi-2.0.2.jar
  • springfox-spring-web-2.0.2.jar
  • springfox-swagger-common-2.0.2.jar
  • springfox-swagger2-2.0.2.jar
  • stax2-api-3.1.1.jar
  • swagger-annotations-1.3.12.jar
  • swagger-annotations-1.5.0.jar
  • swagger-core_2.10-1.3.12.jar
  • swagger-jaxrs_2.10-1.3.12.jar
  • swagger-jersey-jaxrs_2.10-1.3.12.jar
  • swagger-models-1.5.0.jar
  • tagsoup-1.2.1.jar
  • tomcat-embed-core-8.0.15.jar
  • tomcat-embed-el-8.0.15.jar
  • tomcat-embed-logging-juli-8.0.15.jar
  • tomcat-embed-websocket-8.0.15.jar
  • validation-api-1.1.0.Final.jar
  • woodstox-core-asl-4.1.4.jar
  • wsdl4j-1.6.2.jar
  • xml-path-2.4.0.jar
  • xmlschema-core-2.0.3.jar

Python Analytic Development

You must follow these guidelines when developing a Python analytic for the Analytics Framework.
Important: Python versions earlier than 3.x are not supported for developing Python analytics for the Analytics Framework.
  1. Implement the analytic according to your development guidelines. In this discussion, the top level of the analytic implementation project structure is referred to as analytic_directory.
  2. Implement an entry point that the Analytics Framework will use to run the analytic. The pattern required for this entry point depends on whether the analytic uses trained models, and if so, how the trained model is deployed (see About Machine Learning/Trained Analytics).
    • Analytics that do not use trained models, or that have the trained models embedded in the analytic artifact.
      Implement as follows.
      def entry_method(self, inputJson):
      • entry_method can be any method name. Register the file and method name with the framework in the config.json file.
      • inputJson is the Unicode string containing the JSON structure defined in the corresponding Analytic Template.
      • string_output is the returned Unicode string; it must follow the output JSON structure defined in the Analytic Template.
    • Analytics that use trained models from the Runtime Configuration.
      Implement as follows.
      def entry_method(self, inputJson, inputModels):
      • entry_method can be any method name. Register the file and method name with the framework in the config.json file.
      • inputJson is the Unicode string containing the JSON structure defined in the corresponding Analytic Template.
      • string_output is the returned Unicode string; it must follow the output JSON structure defined in the Analytic Template.
      • inputModels contains the dict() of trained models defined in the inputModel port definition in the Analytic Template. The keys are the model names defined in the template.
  3. Create a config.json file in the top level of the analytic_directory. Complete the config.json entries as described in the config.json Definitions table below.
  4. Package all the analytic files in the analytic_directory (including config.json and the entry-file) into a ZIP file.
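The entry-point and packaging steps above can be sketched as a minimal analytic. The file layout, class name, and field names (number1, number2, result) here are illustrative assumptions; see the demo-adder-py sample for the reference layout.

```python
# Hypothetical file: analytic_directory/analytics/demo_adder.py, so config.json
# would register "entry-method": "analytics.demo_adder.DemoAdderEntryPoint.add".
import json

class DemoAdderEntryPoint:
    def add(self, input_json):
        # Parse the input JSON defined by the corresponding Analytic Template...
        data = json.loads(input_json)
        total = data["number1"] + data["number2"]
        # ...and return a string matching the template's output structure.
        return json.dumps({"result": total})
```

The analytic_directory would then contain the analytics/ directory, config.json, and any supporting files, all zipped together for upload.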

When developing the analytic driver, refer to the demo-adder-py sample, a simple Python analytic available in GitHub at https://github.com/PredixDev/predix-analytics-sample.

Example: Python Analytic Configuration config.json File

The config.json file must follow the JSON structure as shown below.

{
  "entry-method": "<entry-directory>.<entry-class>.<entry-method>",
  "non-conda-libs": [
    "boto==2.23.0"
  ],
  "conda-libs": [
    "numpy==1.17.0",
    "scipy==1.3.1"
  ]
} 
  • You must list the Python packages with a version number using the following format:
    numpy==1.7.0
  • To determine whether a library belongs in the conda-libs or non-conda-libs section of the config.json file, refer to the Anaconda distributed package list at https://anaconda.org/anaconda/repo. Define packages that appear in this list in the conda-libs section; define all other packages in the non-conda-libs section.

The Python buildpack installs the specified libraries as well as any dependent libraries. The combined size of all the Python libraries can be very large and can cause disk space issues that make analytic deployment fail. For example, the dependent MKL library, which is particularly large, is also pulled in.

If you experience disk space issues when deploying your Python analytic, you can add the nomkl package to the "conda-libs" section of the config.json file. Adding "nomkl" in the first position in the list disables the automatic download of the MKL library and substitutes non-MKL versions of the affected packages, saving disk space. Note that MKL features are turned off when using "nomkl".

The following is an example of how to specify "nomkl" in the config.json file.

{
  "entry-method": "analytics.AnalyticDriverClass.run_analytic",
  "non-conda-libs": [
    "boto==2.23.0",
    "uncertainties==2.4.8",
    "pint==0.7.2",
    "scikit-learn==0.19.2",
    "scikit-image==0.14.1"
  ],
  "conda-libs": [
    "nomkl",
    "numpy==1.7.0",
    "pandas==0.24.1",
    "scipy==1.3.1"
  ]
}
Table 3. config.json Definitions
Property | Description
entry-directory | The directory (under analytic_directory) containing the entry class.
entry-class | The Python file containing the entry_method(…) function.
entry-method | The name of the entry method.
non-conda-libs | (Optional) List of non-conda Python libraries. For example: ["boto==2.23.0"]
conda-libs | (Optional) List of conda Python libraries. For example: ["numpy==1.17.0", "scipy==1.3.1"]

Dependent Python Libraries

A Python analytic developed using Python 3.6.5 is installed with the following set of default libraries:
  • Flask-1.0.2
  • jinja2-2.10
  • MarkupSafe-1.0
  • Werkzeug-0.14.1
  • amqp-2.3.2
  • certifi-2018.8.24
  • chardet-3.0.4
  • click-6.7
  • docopt-0.6.2
  • ecdsa-0.13
  • future-0.16.0
  • idna-2.7
  • itsdangerous-0.24
  • kombu-4.2.1
  • pika-0.12.0
  • pyasn1-0.4.4
  • pycrypto-2.6.1
  • python-jose-3.0.1
  • redis-2.10.6
  • requests-2.19.1
  • requests-toolbelt-0.8.0
  • rsa-4.0
  • six-1.11.0
  • stomp.py-6.1.0
  • urllib3-1.23
  • vine-1.1.4

An individual Python analytic is likely to need additional Python libraries such as numpy, scipy, pandas, or scikit-learn. To include additional libraries, specify them in the non-conda-libs and conda-libs fields in the config.json file. For an example, see Python Analytic Development.

A good practice is to add only the libraries required by the Python analytic, as there is a cost to download and install each library and its dependencies.