Quick and Easy Deployment of Machine Learning (ML) Models

A standards-based, open-source software suite for moving Apache Spark, R and Scikit-Learn models from “lab” to “factory”

TL;DR

Why do startups, SMEs and Fortune 500 companies choose Openscoring software?

Predictability

Turn-key implementation of ML projects

Business value is created by finding a solution to a problem, and delivering it to customers.

The more engineering you need to do, the slower the pace, and the higher the cost. Use a deterministic helper (“buy”) instead of taking on an open-ended development and maintenance commitment (“build”).

Openscoring provides end-to-end, no-code/low-code workflows.

Scalability

Keeping humans out of the everyday ML loop

Humans outdo machines only on creative tasks. Creativity is involved in designing and assembling ML workflows, not running them.

Good software is an easier hire than a good data scientist or data engineer.

Openscoring provides self-documenting, self-testing, self-integrating ML models.

Functionality

Unlocking the full potential of ML artifacts

Finalized ML models are typically regarded as “black boxes” that can only deliver numeric predictions.

Yet, if you approach them right, they can become intellectual property assets, which give valuable insights into business processes.

Openscoring provides rich APIs for accessing each and every aspect of models and predictions.

Products

Openscoring software has been grouped into three product verticals:

Building a model evaluator instance from a PMML XML file:

import java.io.File;
import org.jpmml.evaluator.Evaluator;
import org.jpmml.evaluator.LoadingModelEvaluatorBuilder;

Evaluator evaluator = new LoadingModelEvaluatorBuilder()
    .load(new File("LogisticRegression.pmml"))
    .build();

Doing the same, plus transpiling PMML XML markup to Java PMML API-backed Java bytecode for 5-15x performance improvement:

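import java.io.File;
import java.io.IOException;
import org.jpmml.evaluator.Evaluator;
import org.jpmml.evaluator.LoadingModelEvaluatorBuilder;
import org.jpmml.transpiler.FileTranspiler;
import org.jpmml.transpiler.Transpiler;
import org.jpmml.transpiler.TranspilerTransformer;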

LoadingModelEvaluatorBuilder evaluatorBuilder = new LoadingModelEvaluatorBuilder()
    .load(new File("XGBoost.pmml"));

try {
    Transpiler transpiler = new FileTranspiler(null, new File("XGBoost.pmml.jar"));

    evaluatorBuilder = evaluatorBuilder
        .transform(new TranspilerTransformer(transpiler));
} catch(IOException ioe){
    // Ignored - the buildable evaluator shall fall back to the default evaluation mode
}

Evaluator evaluator = evaluatorBuilder.build();

Using the embedded verification dataset for self-testing and overall warm-up:

evaluator.verify();
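
The idea behind self-testing can be illustrated with a short Python sketch. It is a conceptual illustration only, not the JPMML-Evaluator implementation: the `within_tolerance` and `verify` helpers and the toy model are hypothetical, and the default tolerances are assumed to follow the `precision` and `zeroThreshold` attributes of the PMML ModelVerification element.

```python
def within_tolerance(expected, actual, precision=1e-6, zero_threshold=1e-16):
    """Checks one numeric prediction against its reference value.

    Assumption: a reference value at or below zero_threshold (in absolute
    terms) counts as zero; any other reference value allows a relative
    deviation of at most precision.
    """
    if abs(expected) <= zero_threshold:
        return abs(actual) <= zero_threshold
    lower = expected * (1 - precision)
    upper = expected * (1 + precision)
    if lower > upper:
        # Negative reference values flip the bounds
        lower, upper = upper, lower
    return lower <= actual <= upper


def verify(model, records, **tolerances):
    """Runs the model on every verification record and collects mismatches."""
    failures = []
    for index, (inputs, expected) in enumerate(records):
        actual = model(inputs)
        if not within_tolerance(expected, actual, **tolerances):
            failures.append((index, expected, actual))
    return failures


# Hypothetical toy model and verification records, for illustration only
toy_model = lambda record: 2 * record["x"]
toy_records = [({"x": 1.0}, 2.0), ({"x": 2.0}, 5.0)]

failures = verify(toy_model, toy_records)
```

An empty `failures` list means that the model reproduces all reference predictions. Running the check once at load time also exercises the full evaluation path before the first real request arrives.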

Evaluating data records:

import java.util.HashMap;
import java.util.List;
import java.util.Map;
import org.dmg.pmml.FieldName;
import org.jpmml.evaluator.EvaluatorUtil;
import org.jpmml.evaluator.FieldValue;
import org.jpmml.evaluator.InputField;
import org.jpmml.evaluator.OutputField;
import org.jpmml.evaluator.TargetField;

Map<String, Object> userArguments = readArguments();

Map<FieldName, FieldValue> pmmlArguments = new HashMap<>();
List<? extends InputField> inputFields = evaluator.getInputFields();
for(InputField inputField : inputFields){
    Object userValue = userArguments.get((inputField.getName()).getValue());
    // Transform an arbitrary Java primitive value to a known-good PMML argument value
    FieldValue pmmlValue = inputField.prepare(userValue);
    pmmlArguments.put(inputField.getName(), pmmlValue);
}

// Evaluate
Map<FieldName, ?> pmmlResults = evaluator.evaluate(pmmlArguments);

Map<String, Object> userResults = new HashMap<>();
// Primary result(s) (eg. y)
List<? extends TargetField> targetFields = evaluator.getTargetFields();
for(TargetField targetField : targetFields){
    Object targetValue = pmmlResults.get(targetField.getName());
    // Transform a PMML result value to a Java primitive value
    targetValue = EvaluatorUtil.decode(targetValue);
    userResults.put((targetField.getName()).getValue(), targetValue);
}
// Secondary results (eg. probability(y), affinity(y), entityId(y))
List<? extends OutputField> outputFields = evaluator.getOutputFields();
for(OutputField outputField : outputFields){
    Object outputValue = pmmlResults.get(outputField.getName());
    userResults.put((outputField.getName()).getValue(), outputValue);
}

writeResults(userResults);

Scikit-Learn

from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import OneHotEncoder
from sklearn2pmml import sklearn2pmml
from sklearn2pmml.pipeline import PMMLPipeline

import pandas

df = pandas.read_csv("Audit.csv")

pipeline = PMMLPipeline([
    ("transformer", ColumnTransformer([
        ("continuous", "passthrough", ["Age", "Hours", "Income"]),
        ("categorical", OneHotEncoder(), ["Employment", "Education", "Marital", "Occupation", "Gender", "Deductions"])
    ])),
    ("classifier", LogisticRegression(multi_class = "ovr"))
])
pipeline.fit(df, df["Adjusted"])
pipeline.verify(df.sample(10))

sklearn2pmml(pipeline, "LogisticRegression.pmml")

R

library("dplyr")
library("r2pmml")

df = read.csv("Audit.csv")
df$Adjusted = as.factor(df$Adjusted)

audit.glm = glm("Adjusted ~ .", data = df, family = "binomial")
audit.glm = verify(audit.glm, newdata = sample_n(df, 10))

r2pmml(audit.glm, "LogisticRegression.pmml")

Apache Spark

import java.io.File
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.feature.RFormula
import org.jpmml.sparkml.PMMLBuilder
import org.jpmml.sparkml.model.HasPredictionModelOptions

val df = spark.read.format("csv").option("header", "true").option("inferSchema", "true").load("Audit.csv")

val rFormula = new RFormula().setFormula("Adjusted ~ .")
val lr = new LogisticRegression().setLabelCol(rFormula.getLabelCol).setFeaturesCol(rFormula.getFeaturesCol)
val pipeline = new Pipeline().setStages(Array(rFormula, lr))

val pipelineModel = pipeline.fit(df)

val pmmlBuilder = new PMMLBuilder(df.schema, pipelineModel).putOption(HasPredictionModelOptions.OPTION_KEEP_PREDICTIONCOL, false).verify(df.sample(false, 0.01).limit(10))

pmmlBuilder.buildFile(new File("LogisticRegression.pmml"))

The Audit dataset (binary target; three continuous and six categorical features)

Running the Openscoring server application:

$ java -jar openscoring-server-executable-${version}.jar

Deploying a model from a PMML XML file, using it for evaluation (batch mode), and undeploying:

$ curl -X PUT --data-binary @LogisticRegression.pmml -H "Content-type: text/xml" http://localhost:8080/openscoring/model/MyAuditModel
$ curl -X POST --data-binary @Audit.csv -H "Content-type: text/plain; charset=UTF-8" http://localhost:8080/openscoring/model/MyAuditModel/csv > Audit-results.csv
$ curl -X DELETE http://localhost:8080/openscoring/model/MyAuditModel

Doing the same using the Openscoring-Python client library:

from openscoring import Openscoring

os = Openscoring(base_url = "http://localhost:8080/openscoring")

os.deployFile("MyAuditModel", "LogisticRegression.pmml")
os.evaluateCsvFile("MyAuditModel", "Audit.csv", "Audit-results.csv")
os.undeploy("MyAuditModel")
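
Besides batch mode, a deployed model can also be queried one record at a time. The sketch below only builds the HTTP request, using the Python standard library. It assumes that single-record evaluation is a JSON POST to the model endpoint with `id` and `arguments` fields. The `build_evaluation_request` helper is hypothetical, and the argument values are made-up placeholders rather than rows from the Audit dataset.

```python
import json
import urllib.request


def build_evaluation_request(base_url, model_id, arguments, record_id=None):
    """Builds, but does not send, a single-record evaluation request."""
    payload = {"id": record_id, "arguments": arguments}
    return urllib.request.Request(
        url="{}/model/{}".format(base_url.rstrip("/"), model_id),
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder values; field names follow the Audit examples above
arguments = {
    "Age": 38, "Hours": 40, "Income": 81838.0,
    "Employment": "Private", "Education": "College", "Marital": "Unmarried",
    "Occupation": "Service", "Gender": "Female", "Deductions": False,
}

request = build_evaluation_request(
    "http://localhost:8080/openscoring", "MyAuditModel", arguments,
    record_id="record-001",
)

# Sending requires a running server with the model deployed:
# with urllib.request.urlopen(request) as response:
#     result = json.load(response)
```

Keeping request construction separate from sending makes the payload easy to inspect and unit-test without a live server.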

PAPIs 2018 tool demonstration: "Putting five ML models to production in five minutes"

How does re-licensing work?

The majority of Openscoring software is released under the terms and conditions of the GNU Affero General Public License (AGPL), version 3.0. AGPLv3 is a free software license [1]. AGPLv3 is very similar to the GNU General Public License (GPL), version 3, but comes with an additional provision, which addresses the use of software over a computer network.

If AGPLv3 is not acceptable, then it is possible to enter into a licensing agreement, which makes Openscoring software available under the terms and conditions of the BSD 3-Clause License. The re-licensing process is quick and easy (attainable by exchanging three e-mails), and protects the interests of both parties.

[1] “Free software” is a matter of liberty, not price. To understand the concept, you should think of “free” as in “free speech”, not as in “free beer”. For more information, please see The Free Software Definition.

Why standardize ML workflows using the Predictive Model Markup Language (PMML)?

Further assistance and discussion

Openscoring sells software, but provides free support services
Villu Ruusmann
Founder and CTO

Detailed guidance, feature requests, or bug reports about specific products? Please open a new GitHub issue in the appropriate Java PMML API or Openscoring REST API repository.

Questions about PMML and its applicability to ML workflows? Please open a new thread in the JPMML Mailing List.

Other exciting opportunities? Contact privately.

Openscoring is domiciled in Estonia. Business hours are 7am to 9pm GMT, seven days a week.