Analytic Development
Analytic Development Process
The process to develop an analytic for the Predix platform is as follows.
Task | Information |
---|---|
1. Develop the analytic in either Java, Matlab, or Python. | See the Java, Matlab, and Python analytic development topics in this section. |
2. Add the analytic to the Analytics Catalog for testing. | See Adding and Validating an Analytic Using REST APIs. |
3. Once the analytic has been added to the catalog, test the analytic with additional input data sets. | See About Running a Single Analytic. |
4. Deploy the analytic to production. | See Deploying a Production Analytic to Cloud Foundry. |
About Machine Learning/Trained Analytics
Analytics that use a trained model (commonly known as trained analytics, machine trained analytics, or machine learned analytics) are supported. To use a machine trained analytic in an orchestration, you will follow the orchestration task roadmap corresponding to your business needs.
However, depending on how the trained model was deployed, you may need to perform additional configuration steps.
Deployment Options for Trained Analytics
Analytics that use trained models are supported provided the following conditions are met.
- The training is performed independent of the Analytics Framework.
- The training is with respect to an asset context.
- The output of the training exercise is one of the following:
  a. An executable analytic that is only applicable to the training context.
  b. An executable kernel that is applicable to all trained models, plus a set of large trained models (as individual files of byte streams) for the training context.
  c. An executable kernel that is applicable to all trained models, plus a set of small trained models (as individual files of byte streams) for the training context.
The options for deploying a trained model are further described in the following table.
Deployment Method | Supported Training Output | Pros | Cons |
---|---|---|---|
Embedded Trained Models | | | |
Trained Models as Runtime Configuration Items | 3.c | | Performance. The model is loaded with each use of the analytic. |
Embedded Trained Models
The analytic can be written such that it expects the trained model to be bundled in the executable. At runtime, the analytic can then retrieve the model from the local file system for the deployed analytic. For more information, see Adding and Validating an Analytic Using REST APIs.
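As a minimal sketch of this pattern, an analytic could read a model file bundled with its deployable from the local file system; the class name and helper method below are hypothetical, not part of the framework API.

```java
import java.io.ByteArrayOutputStream;
import java.io.FileInputStream;
import java.io.InputStream;

// Hypothetical helper (not a framework class): reads a trained model file
// bundled with the deployed analytic from the local file system.
public class EmbeddedModelLoader {

    public static byte[] loadModel(String path) throws Exception {
        InputStream in = new FileInputStream(path);
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[8192];
            int n;
            // Copy the model bytes into memory (JDK 1.7 compatible).
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } finally {
            in.close();
        }
    }
}
```

The analytic's entry point can then deserialize the returned bytes into whatever in-memory model representation it uses.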
Trained Models as Runtime Configuration Items
The analytic developer creates an entry point to the analytic that expects a map of trained model values, as well as the rest of the inputs to the analytic. For more information, see the analytic development topics in this section.
Java Analytic Development
Follow these guidelines when developing a Java analytic for the Analytics Framework.
- Create a Java project using JDK 1.7+.
- Implement an entry point that the Analytics Framework will use to run the analytic. The pattern required for this entry point depends on whether the analytic uses trained models, and if so, how the trained model is deployed (see About Machine Learning/Trained Analytics).
- Analytics that do not use trained models, or have the trained models embedded in the analytic artifact.
Implement as follows.
public String entry_method(String inputJson)
Parameter | Description |
---|---|
entry_method | Can be any method name. Register this class and method name with the framework in the config.json file. |
inputJson | The Unicode string containing the JSON structure defined in the corresponding Analytic Template. |
string_output (return value) | Must follow the JSON output structure defined in the corresponding Analytic Template. |

- Analytics that use trained models from the Runtime Configuration.
Implement as follows.
public String entry_method(String inputJson, Map<String, byte[]> inputModels)
Parameter | Description |
---|---|
entry_method | Can be any method name. Register this class and method name with the framework in the config.json file. |
inputJson | The string containing the JSON structure defined in the corresponding Analytic Template. |
string_output (return value) | Must follow the JSON output structure defined in the corresponding Analytic Template. |
inputModels | The map of trained models defined in the inputModel port definition in the corresponding Analytic Template. The keys will be the model names as defined in the template. |
- Create a JSON configuration file that specifies the entry point's method and class, and place it in the src/main/resources directory within your Java project.
- Generate the Java JAR for your analytic.
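The first (no trained models) entry-point signature can be sketched as follows. The class and method names are placeholders to be registered in config.json, and the regex-based parsing is for illustration only; a real analytic would parse the input with a JSON library such as Jackson.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical entry-point class; register its class and method names in
// config.json ("className" / "methodName").
public class DemoAdderJavaEntryPoint {

    // Framework-facing entry point: receives the template-defined input JSON
    // and must return a string matching the template's output JSON structure.
    public String add2Numbers(String inputJson) {
        // Naive extraction that sums every numeric token in the string
        // (including digits inside key names), for illustration only.
        Pattern numbers = Pattern.compile("-?\\d+(\\.\\d+)?");
        Matcher m = numbers.matcher(inputJson);
        double sum = 0;
        while (m.find()) {
            sum += Double.parseDouble(m.group());
        }
        return "{\"result\": " + sum + "}";
    }
}
```

For example, `add2Numbers("{\"a\": 1, \"b\": 2}")` returns `{"result": 3.0}`.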
For an example of a Java analytic application, see the demo-adder-java project in GitHub at https://github.com/PredixDev/predix-analytics-sample.
Java Analytic Configuration config.json
The following example shows sample values.
{
"className": "com.ge.predix.analytics.demo.java.DemoAdderJavaEntryPoint",
"methodName": "add2Numbers"
}
Properties | Description | Example |
---|---|---|
className | Fully qualified class name to execute the Java analytic | <package>.DemoAdderJavaEntryPoint |
methodName | The method name to execute the Java analytic | add2Numbers |
> <package> should be a fully qualified package name. For example, com.ge.predix.insight.analytic.demo.matlab.

Matlab Analytic Development
Follow these guidelines when developing a Matlab analytic for the Analytics Framework.
- Implement your Matlab analytic so it takes data in and produces data out as JSON strings.
- Generate the Java JAR for the Matlab analytic using the instructions in "Matlab Builder for Java", available at http://soliton.ae.gatech.edu/classes/ae6382/documents/matlab/mathworks/javabuilder.pdf. Make note of the package, class name, and method name definitions entered.
- Create a Java module that consumes your Matlab analytic as a library. For a Java module example, see the demo-adder-matlab-r2011b project.
The pom.xml file should include the reference to your analytic as a dependency in the dependencies section. Any values can be used for the groupId, artifactId, and version properties, as long as the scope value is system and the systemPath value is correct. Place the generated JAR file from step 2 in the src/main/resources directory of the Java module.
- Configure the Java module to consume it as a library, using the javabuilder.jar file corresponding to the analytic's Matlab version.
- Create a Java entry point class with a default constructor (a constructor with no input parameters).
If your Matlab method does not accept a JSON string as input and produces a JSON string as output, the Java entry point method should call your Matlab method (with the correctly formatted parameters) and convert the output to a proper JSON string.
- Create the JSON configuration file (config.json) with className, methodName, and matlabVersion definitions. These should instruct the generated wrapper code to call your designated entry point method with the request payload. Place the config.json file in the src/main/resources directory of your Java module.
- Create a JAR package out of the Java module. If using Maven, run the command mvn clean package.
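The dependency entry described above could look like the following pom.xml sketch; the groupId, artifactId, version, and JAR file name are arbitrary placeholders, while scope and systemPath must be set as described.

```xml
<dependencies>
  <!-- Hypothetical coordinates; only scope=system and a correct systemPath
       pointing at the generated Matlab JAR are required. -->
  <dependency>
    <groupId>com.example.analytics</groupId>
    <artifactId>demo-adder-matlab</artifactId>
    <version>1.0.0</version>
    <scope>system</scope>
    <systemPath>${project.basedir}/src/main/resources/demo-adder-matlab.jar</systemPath>
  </dependency>
</dependencies>
```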
For an example of a Matlab analytic application, see the demo-adder-matlab-r2011b project.
Matlab Analytic Configuration config.json
The following example shows sample values.
{
"className": "com.ge.predix.analytics.demo.matlab.DemoMatlabAdderEntryPoint",
"methodName": "add2Numbers",
"matlabVersion": "r2011b"
}
Properties | Description | Example |
---|---|---|
className | Fully qualified Java class name to execute the Matlab analytic. | <package>.DemoMatlabAdderEntryPoint |
matlabVersion | The Matlab release version number. In this release the supported versions are r2011b and r2012a. Note the "r" in the version. | r2011b |
methodName | The method name to execute the Matlab analytic. | add2Numbers |
> <package> should be a fully qualified package name. For example, com.ge.predix.insight.analytic.demo.matlab.

Dependent Java and Matlab Libraries
A good practice is to add only the required JAR files, including the javabuilder.jar file, as part of the analytics package. Do not add the following third party libraries as they are already provided by the Predix Analytics Framework. If you need to use any of these libraries, use the exact version listed. Failure to do so may cause the Java or Matlab analytic to fail at runtime.
annotations-2.0.1.jar
aopalliance-1.0.jar
asm-3.1.jar
bcpkix-jdk15on-1.47.jar
bcprov-jdk15on-1.52.jar
classmate-1.0.0.jar
commons-beanutils-1.8.3.jar
commons-codec-1.6.jar
commons-configuration-1.8.jar
commons-io-1.4.jar
commons-lang-2.6.jar
commons-lang3-3.1.jar
commons-logging-1.2.jar
cxf-api-2.7.3.jar
cxf-rt-bindings-xml-2.7.3.jar
cxf-rt-core-2.7.3.jar
cxf-rt-frontend-jaxrs-2.7.3.jar
cxf-rt-transports-http-2.7.3.jar
dozer-5.4.0.jar
geronimo-javamail_1.4_spec-1.7.1.jar
groovy-2.3.7.jar
groovy-json-2.3.7.jar
groovy-xml-2.3.7.jar
guava-13.0.1.jar
hamcrest-core-1.3.jar
hamcrest-library-1.3.jar
hibernate-validator-5.1.3.Final.jar
httpclient-4.3.6.jar
httpcore-4.3.jar
httpmime-4.3.6.jar
jackson-annotations-2.4.0.jar
jackson-core-2.4.4.jar
jackson-core-asl-1.8.9.jar
jackson-databind-2.4.4.jar
jackson-jaxrs-1.8.9.jar
jackson-jaxrs-base-2.4.1.jar
jackson-jaxrs-json-provider-2.4.1.jar
jackson-mapper-asl-1.8.9.jar
jackson-module-jaxb-annotations-2.4.1.jar
jackson-module-jsonSchema-2.4.1.jar
jackson-module-scala_2.10-2.4.1.jar
javassist-3.18.2-GA.jar
javax.ws.rs-api-2.0-m10.jar
jaxb-impl-2.2.6.jar
jaxb2-basics-0.6.4.jar
jaxb2-basics-runtime-0.6.4.jar
jaxb2-basics-tools-0.6.4.jar
jboss-logging-3.1.3.GA.jar
jcl-over-slf4j-1.7.8.jar
jersey-core-1.13.jar
jersey-multipart-1.13.jar
jersey-server-1.13.jar
jersey-servlet-1.13.jar
joda-convert-1.6.jar
joda-time-2.8.jar
json-20140107.jar
json-path-2.4.0.jar
json4s-ast_2.10-3.2.11.jar
json4s-core_2.10-3.2.11.jar
json4s-ext_2.10-3.2.11.jar
json4s-jackson_2.10-3.2.11.jar
json4s-native_2.10-3.2.11.jar
jsr305-2.0.1.jar
jsr311-api-1.1.1.jar
jul-to-slf4j-1.7.8.jar
log4j-over-slf4j-1.7.8.jar
logback-classic-1.1.2.jar
logback-core-1.1.2.jar
mapstruct-1.0.0.Beta4.jar
mimepull-1.6.jar
objenesis-2.1.jar
paranamer-2.6.jar
reflections-0.9.9.jar
rest-assured-2.4.0.jar
rest-assured-common-2.4.0.jar
scala-compiler-2.10.0.jar
scala-library-2.10.4.jar
scala-reflect-2.10.4.jar
scalap-2.10.0.jar
slf4j-api-1.6.4.jar
snakeyaml-1.5.jar
spring-aop-4.1.4.RELEASE.jar
spring-beans-4.1.4.RELEASE.jar
spring-boot-1.2.1.RELEASE.jar
spring-boot-autoconfigure-1.2.1.RELEASE.jar
spring-boot-starter-1.2.1.RELEASE.jar
spring-boot-starter-logging-1.2.1.RELEASE.jar
spring-boot-starter-security-1.2.1.RELEASE.jar
spring-boot-starter-tomcat-1.2.1.RELEASE.jar
spring-context-4.1.4.RELEASE.jar
spring-core-4.1.4.RELEASE.jar
spring-expression-4.1.4.RELEASE.jar
spring-hateoas-0.17.0.RELEASE.jar
spring-plugin-core-1.2.0.RELEASE.jar
spring-plugin-metadata-1.2.0.RELEASE.jar
spring-security-config-3.2.5.RELEASE.jar
spring-security-core-3.2.5.RELEASE.jar
spring-security-jwt-1.0.2.RELEASE.jar
spring-security-oauth2-2.0.3.RELEASE.jar
spring-security-web-3.2.5.RELEASE.jar
spring-web-4.1.4.RELEASE.jar
spring-webmvc-4.1.4.RELEASE.jar
springfox-core-2.0.2.jar
springfox-schema-2.0.2.jar
springfox-spi-2.0.2.jar
springfox-spring-web-2.0.2.jar
springfox-swagger-common-2.0.2.jar
springfox-swagger2-2.0.2.jar
stax2-api-3.1.1.jar
swagger-annotations-1.3.12.jar
swagger-annotations-1.5.0.jar
swagger-core_2.10-1.3.12.jar
swagger-jaxrs_2.10-1.3.12.jar
swagger-jersey-jaxrs_2.10-1.3.12.jar
swagger-models-1.5.0.jar
tagsoup-1.2.1.jar
tomcat-embed-core-8.0.15.jar
tomcat-embed-el-8.0.15.jar
tomcat-embed-logging-juli-8.0.15.jar
tomcat-embed-websocket-8.0.15.jar
validation-api-1.1.0.Final.jar
woodstox-core-asl-4.1.4.jar
wsdl4j-1.6.2.jar
xml-path-2.4.0.jar
xmlschema-core-2.0.3.jar
Python Analytic Development
- Implement the analytic according to your development guidelines. In this discussion, the top level of the analytic implementation project structure is referred to as analytic_directory.
- Implement an entry point that the Analytics Framework will use to run the analytic. The pattern required for this entry point depends on whether the analytic uses trained models, and if so, how the trained model is deployed (see About Machine Learning/Trained Analytics).
- Analytics that do not use trained models, or have the trained models embedded in the analytic artifact. Implement as follows.

def entry_method(self, inputJson):

Parameter | Description |
---|---|
entry_method | Can be any method name. Register the file and method name with the framework in the config.json file. |
inputJson | The Unicode string containing the JSON structure defined in the corresponding Analytic Template. |
string_output (return value) | Must follow the output JSON structure defined in the Analytic Template. |
- Analytics that use trained models from the Runtime Configuration. Implement as follows.

def entry_method(self, inputJson, inputModels):

Parameter | Description |
---|---|
entry_method | Can be any method name. Register the file and method name with the framework in the config.json file. |
inputJson | The Unicode string containing the JSON structure defined in the corresponding Analytic Template. |
string_output (return value) | Must follow the output JSON structure defined in the Analytic Template. |
inputModels | The dict() of trained models defined in the inputModel port definition in the Analytic Template. The keys will be the model names defined in the template. |
- Create a config.json file in the top level of the analytic_directory. Complete the config.json entries as described in the config.json Definitions table, below.
- Package all the analytic files in the analytic_directory (including config.json and the entry-file) into a ZIP file.
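The two Python entry-point signatures above can be sketched together as follows; the class, method, and model names are hypothetical placeholders that would be registered in config.json, and the computations are illustrative only.

```python
import json

# Hypothetical entry-point class; the file, class, and method names are
# placeholders registered in config.json as
# "<entry-directory>.<entry-class>.<entry-method>".
class AnalyticEntryPoints:

    # Variant 1: no trained models (or models embedded in the artifact).
    def run(self, inputJson):
        data = json.loads(inputJson)
        # Illustrative computation: add two template-defined inputs.
        return json.dumps({"result": data["number1"] + data["number2"]})

    # Variant 2: trained models supplied by the Runtime Configuration.
    # inputModels is a dict of model name -> bytes, keyed as in the
    # Analytic Template's inputModel port definition.
    def run_with_models(self, inputJson, inputModels):
        data = json.loads(inputJson)
        # Illustrative "model": a serialized threshold stored as raw bytes.
        threshold = float(inputModels["threshold_model"].decode("utf-8"))
        return json.dumps({"alerts": [v for v in data["values"] if v > threshold]})

ep = AnalyticEntryPoints()
print(ep.run('{"number1": 1, "number2": 2}'))                                   # {"result": 3}
print(ep.run_with_models('{"values": [5, 12]}', {"threshold_model": b"10.0"}))  # {"alerts": [12]}
```

In both variants the framework passes the input as a JSON string and expects a JSON string back, matching the Analytic Template.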
When developing the analytic driver, refer to the example code from demo-adder-py for a simple Python analytic. The demo-adder-py sample is available in GitHub at https://github.com/PredixDev/predix-analytics-sample.
Example: Python Analytic Configuration config.json File
The config.json file must follow the JSON structure as shown below.
{
"entry-method": "<entry-directory>.<entry-class>.<entry-method>",
"non-conda-libs": [
"boto==2.23.0"
],
"conda-libs": [
"numpy==1.17.0",
"scipy==1.3.1"
]
}
- You must list the Python packages with a version number using the following format: numpy==1.7.0
- To determine where to define a library (conda or non-conda) in the config.json file, refer to the Anaconda distributed package list at https://anaconda.org/anaconda/repo. Define a package name included in this list in the conda-libs section.
The Python buildpack installs the specified libraries as well as any dependent libraries. The combined size of all the Python libraries can be very large and create disk space issues, causing analytic deployment to fail. For example, the dependent MKL library, which is particularly large, is also pulled in.

If you experience disk space issues when deploying your Python analytic, you can add the nomkl library to the "conda-libs" section of the config.json file. Adding "nomkl" in the first position in the list disables the automatic download of the MKL library and replaces it with non-MKL versions, saving disk space. Note that MKL features are turned off when using "nomkl".
The following is an example of how to specify "nomkl" in the config.json file.
{
"entry-method": "analytics.AnalyticDriverClass.run_analytic",
"non-conda-libs": [
"boto==2.23.0",
"uncertainties==2.4.8",
"pint==0.7.2",
"scikit-learn==0.19.2",
"scikit-image==0.14.1"
],
"conda-libs": [
"nomkl",
"numpy==1.7.0",
"pandas==0.24.1",
"scipy==1.3.1"
]
}
Property | Description |
---|---|
entry-directory | The directory (under analytic_directory ) containing the entry class. |
entry-class | The Python file containing the entry_method(…) function. |
entry-method | Name of the entry_method . |
non-conda python libraries | (Optional) List of non-conda Python libraries. e.g.: ["boto==2.23.0"] |
conda python libraries | (Optional) List of conda Python libraries. e.g.: ["numpy==1.17.0","scipy==1.3.1"] |
Dependent Python Libraries
Flask-1.0.2
jinja2-2.10
MarkupSafe-1.0
Werkzeug-0.14.1
amqp-2.3.2
certifi-2018.8.24
chardet-3.0.4
click-6.7
docopt-0.6.2
ecdsa-0.13
future-0.16.0
idna-2.7
itsdangerous-0.24
kombu-4.2.1
pika-0.12.0
pyasn1-0.4.4
pycrypto-2.6.1
python-jose-3.0.1
redis-2.10.6
requests-2.19.1
requests-toolbelt-0.8.0
rsa-4.0
six-1.11.0
stomp.py-6.1.0
urllib3-1.23
vine-1.1.4
It is likely that an individual Python analytic needs additional Python libraries such as numpy, scipy, pandas, or scikit-learn. To include additional libraries, specify them in the non-conda-libs and conda-libs fields in the config.json file. For an example, see Python Analytic Development.
A good practice is to add only the libraries required by the Python analytic, as there is a cost to download and install each library and its dependencies.