JPMML-Evaluator Decision Tree/Random Forest Performance Test on Amazon EC2 t2.micro

One day I had a discussion with our CTO Villu Ruusmann about what scoring numbers we could actually use in our sales pitch or promote on our web page. Villu was talking about single-digit microseconds, which I can't say I didn't believe, but I was quite skeptical. I wanted to see it for myself, so Villu gave me the following command line to experiment with:

C:\github\jpmml-evaluator>java -jar pmml-evaluator-example\target\example-1.4-SNAPSHOT.jar --model c:\github\openscoring\openscoring-service\src\etc\DecisionTreeIris.pmml --input c:\github\openscoring\openscoring-service\src\etc\input.csv --output test.txt --intern --optimize --loop 1000000

It gave me some interesting numbers which seemed to confirm the single-digit microseconds, but I won't paste them here right away: this was run on my personal laptop, the CPU usage never exceeded 40%, and the CSV file to be scored contained only 3 records. Not good for comparison.

I wanted something repeatable, something everyone can actually try: check out the setup themselves, count the lines in the input CSV files, verify the number of trees in a random forest. So once again, I turned to Amazon EC2.


The Setup

I used the following setup for the tests:

  • All tests were done on Amazon EC2 t2.micro (available on free tier). I didn't accumulate any costs during the tests.
  • JPMML-Evaluator 1.4.0 was used in the test
  • CSV files were used as input for the tests (it's by far the easiest way to generate a large number of records)
  • Each test included scoring a million data points:
    • 1-record CSV scored 1000000 times
    • 10-record CSV scored 100000 times
    • 100-record CSV scored 10000 times
    • 1000-record CSV scored 1000 times
    • 10000-record CSV scored 100 times
    • 100000-record CSV scored 10 times
    • (I also tried a 1 000 000-record CSV scored once, but ran into memory issues on t2.micro)
  • The same approach was used to score the following models:
    • Iris decision tree
    • Titanic random forest (20 trees)
  • The following command line parameters were passed to the evaluator:
    • --model specifies the model to be used (the model is loaded only once, not a million times, even when scoring a single record)
    • --input points to the CSV file containing the records to be scored
    • --output specifies the output file
      • The output file is overwritten on every loop: scoring 1 record a million times still produces 1 line in the output file, and means the file is opened, written and closed a million times
      • Scoring 100 000 records ten times produces an output file with 100 000 records
      • Using /dev/null as output didn't result in any performance gains
    • --loop specifies the number of times the input file is run through the scoring process
    • --intern replaces recurring elements in the PMML with a single element; this improves memory usage, but not performance
    • --optimize tries to convert java.lang.String elements to java.lang.Integer or java.lang.Double once at the beginning, rather than on every model execution
  • I ran every command 5 times, chose the best 3 of those 5 and averaged the results (I did this because, while the highs were relatively consistent, there were 2-3 lows during the overall testing process that I couldn't relate to our app in any way).
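The test plan above can be sketched in Java. Each CSV size is paired with the loop count that brings the total to 1 000 000 data points, and the five measurements per command are reduced to the average of the best three. This is a hypothetical helper, not part of JPMML-Evaluator. It only builds the command strings and does not run them. It assumes the measurements are throughputs (higher is better). It also assumes the input files follow the TitanicTestData/Titanic-<records>.csv naming of the sample command further down, which may not match every file on the AMI.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical sketch of the test plan (not part of JPMML-Evaluator).
 * Assumption: input CSVs are named <prefix><records>.csv, e.g.
 * TitanicTestData/Titanic-10.csv as in the sample command.
 */
public class TestPlan {

    static final long DATA_POINTS = 1_000_000L;
    static final String JAR = "pmml-evaluator-example/target/example-1.4-SNAPSHOT.jar";

    /** Builds one command line per CSV size so that records x loops = 1 000 000. */
    static List<String> commandLines(String model, String csvPrefix) {
        List<String> commands = new ArrayList<>();
        for (long records = 1; records <= 100_000; records *= 10) {
            long loops = DATA_POINTS / records;
            commands.add(String.join(" ",
                    "java", "-jar", JAR,
                    "--model", model,
                    "--input", csvPrefix + records + ".csv",
                    "--output", "test.txt",
                    "--intern", "--optimize",
                    "--loop", Long.toString(loops)));
        }
        return commands;
    }

    /** Averages the best 3 of 5 throughput measurements (higher is better). */
    static double bestThreeOfFive(double... rates) {
        if (rates.length != 5) {
            throw new IllegalArgumentException("expected 5 runs, got " + rates.length);
        }
        double[] sorted = rates.clone();
        Arrays.sort(sorted); // ascending, so the best three are the last three
        return (sorted[2] + sorted[3] + sorted[4]) / 3.0;
    }

    public static void main(String[] args) {
        commandLines("TitanicTestData/Titanic.pmml", "TitanicTestData/Titanic-")
                .forEach(System.out::println);
    }
}
```

The 10-record entry this produces is identical to the sample Titanic command given later in the post.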


The Results: Iris Decision Tree


With a simple decision tree, a single data point is scored in 5 microseconds (that's five millionths of a second, aka 0.005 milliseconds). Peak performance comes from a 100-line CSV scored 10 000 times, where nearly 250 000 records are scored in 1 second, bringing the time per score down to 4 microseconds. My theory is that the peak reflects an optimum between handling file writes (remember, when scoring a single record, the file is opened, written and closed 1 million times) and managing the CSV in memory (addressing an array with 1000 items starts to take its toll).
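The conversion between throughput and per-record latency used above is plain arithmetic: microseconds per record = 1 000 000 / records per second. The small Java sketch below uses only the figures quoted in this section. The class and method names are illustrative and have nothing to do with the JPMML API.

```java
/**
 * Illustrative arithmetic only: converts between records per second and
 * microseconds per record. No JPMML API is involved.
 */
public class Throughput {

    /** Microseconds spent per record at a given records-per-second rate. */
    static double microsPerRecord(double recordsPerSecond) {
        return 1_000_000.0 / recordsPerSecond;
    }

    /** Records per second implied by a per-record latency in microseconds. */
    static double recordsPerSecond(double microsPerRecord) {
        return 1_000_000.0 / microsPerRecord;
    }

    public static void main(String[] args) {
        // Peak Iris configuration: nearly 250 000 records per second
        System.out.println(microsPerRecord(250_000)); // 4.0
        // Single-record configuration: about 5 microseconds per record
        System.out.println(recordsPerSecond(5));      // 200000.0
    }
}
```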


The Results: Titanic Random Forest


With a 20-tree random forest, a single data point is scored in 19 microseconds. That is considerably slower than with a single decision tree, but nowhere near the 20 times slower one might expect: it's merely 3.6 times slower. The optimum at 100 records × 10 000 loops is less pronounced, because proportionally more time is spent calculating the scores than on other activities (like writing files).




I've included a graph showing Iris (tree model) and Titanic (random forest) model performance side by side. Again, although the random forest contains 20 tree models whose output is averaged, it's nowhere near 20 times slower than a single tree (which we consider really good!).
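The averaging step can be illustrated with a toy Java sketch. Every tree in the ensemble is evaluated once on the same input, and the forest returns the mean of the tree outputs. In PMML, an ensemble like this is typically expressed as a MiningModel whose Segmentation uses multipleModelMethod="average". The trees below are hypothetical one-split lambdas, not the actual Titanic trees. The code is also not how JPMML-Evaluator evaluates models internally.

```java
import java.util.List;
import java.util.function.ToDoubleFunction;

/**
 * Toy averaging ensemble. The "trees" are hypothetical lambdas; this is not
 * the Titanic model and not JPMML-Evaluator's internal implementation.
 */
public class AveragingForest {

    private final List<ToDoubleFunction<double[]>> trees;

    AveragingForest(List<ToDoubleFunction<double[]>> trees) {
        this.trees = trees;
    }

    /** Evaluates every tree once on the same input and averages the results. */
    double score(double[] input) {
        double sum = 0.0;
        for (ToDoubleFunction<double[]> tree : trees) {
            sum += tree.applyAsDouble(input);
        }
        return sum / trees.size();
    }

    public static void main(String[] args) {
        // Three hypothetical one-split "trees" returning a survival probability
        AveragingForest forest = new AveragingForest(List.of(
                x -> x[0] < 30 ? 0.8 : 0.3,   // split on a hypothetical age feature
                x -> x[1] > 50 ? 0.9 : 0.4,   // split on a hypothetical fare feature
                x -> x[2] == 1 ? 0.7 : 0.2)); // split on a hypothetical class flag
        System.out.println(forest.score(new double[] {25, 80, 1}));
    }
}
```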



We can conclude that the base performance of the JPMML library really is very good, with single data point scores measurable in microseconds. And this is all in a single thread, on one of the least powerful virtual machines AWS provides! In a less rigorous test on my Lenovo T470s laptop I got more than 700 000 scores per second with the Iris decision tree at about 80% CPU utilization. Imagine what we could do in a multithreaded, optimized production environment running on a powerful server :-)


Next Steps and How You Can Help

Iris and Titanic are both decision tree models (a single tree and a forest of trees). We're interested in doing the same with other model types too, possibly trained in different environments and converted to PMML with different options. The same goes for our REST-based scoring engine.

You can help us by providing some actual models that solve real-world problems, along with a sample dataset for scoring (say, 10 records), and we will run them under the same conditions for comparison. Of course, you can obfuscate the labels of inputs and outputs. Small enough models can be sent by email and bigger ones shared using WeTransfer; my email is I'd appreciate a few words about what these models do, which I won't disclose without permission. Alternatively, you can run your own tests on the AMIs listed below.


Amazon Machine Images

The AMIs are as follows (let me know if you can't access any of these regions and I'll make the image available in your region too):

  • EU (Frankfurt) - ami-0a22a739ce1f9a9a7
  • US West (N. California) - ami-02a035e53d718a40f
  • US East (N. Virginia) - ami-0d90521380e13b956

When launching the AMI, you'll need to create a key pair to access it (AWS provides very good tutorials for this).

JPMML-Evaluator is located in /home/ec2-user/jpmml-evaluator, with the Iris and Titanic PMML files and CSVs in IrisTestData/ and TitanicTestData/ respectively.

The sample command line goes like this (assuming the working directory is ~/jpmml-evaluator):

java -jar pmml-evaluator-example/target/example-1.4-SNAPSHOT.jar --model TitanicTestData/Titanic.pmml --input TitanicTestData/Titanic-10.csv --output test.txt --intern --optimize --loop 100000

Here's the sample output:

[ec2-user@ip-172-31-41-51 jpmml-evaluator]$ pwd
[ec2-user@ip-172-31-41-51 jpmml-evaluator]$ java -jar pmml-evaluator-example/target/example-1.4-SNAPSHOT.jar --model TitanicTestData/Titanic.pmml --input TitanicTestData/Titanic-10.csv --output test.txt --intern --optimize --loop 100000 
3/18/18 9:59:54 AM
-- Timers --------
             count = 100000
         mean rate = 5353.47 calls/second
     1-minute rate = 4464.99 calls/second
     5-minute rate = 4279.49 calls/second
    15-minute rate = 4246.15 calls/second
               min = 0.14 milliseconds
               max = 188.50 milliseconds
              mean = 0.18 milliseconds
            stddev = 0.65 milliseconds
            median = 0.16 milliseconds
              75% <= 0.17 milliseconds
              95% <= 0.18 milliseconds
              98% <= 0.66 milliseconds
              99% <= 0.75 milliseconds
            99.9% <= 4.47 milliseconds

Please be aware that the calls-per-second mean rate counts one pass over the whole CSV file as one call. With a 1-record CSV, the mean rate is per single data point. With a larger CSV, each call scores every record in the file. To get the number of data points scored per second, multiply the mean rate by the number of records in the file: by 10 for a 10-record CSV, by a million for a million-line CSV.
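The conversion can be applied to the sample output above, taking the reported figures at face value. The sketch assumes Titanic-10.csv holds 10 records, as its name suggests. The class and method names are illustrative only. With a mean rate of 5353.47 calls per second, the result is roughly 53 535 records per second, or about 18.7 microseconds per record. That is in line with the roughly 19 microseconds reported for the Titanic forest above.

```java
/**
 * Illustrative conversion of the example's Timer output (one call = one pass
 * over the whole CSV file) into per-record figures. Names are hypothetical.
 */
public class PerRecordRate {

    /** Records scored per second, given calls per second and records per CSV. */
    static double recordsPerSecond(double callsPerSecond, int recordsPerCsv) {
        return callsPerSecond * recordsPerCsv;
    }

    /** Average microseconds per record derived from the calls-per-second rate. */
    static double microsPerRecordFromRate(double callsPerSecond, int recordsPerCsv) {
        return 1_000_000.0 / recordsPerSecond(callsPerSecond, recordsPerCsv);
    }

    /** Average microseconds per record, given the mean call time in milliseconds. */
    static double microsPerRecord(double meanCallMillis, int recordsPerCsv) {
        return meanCallMillis * 1000.0 / recordsPerCsv;
    }

    public static void main(String[] args) {
        int records = 10; // assumption: Titanic-10.csv holds 10 records
        // "mean rate = 5353.47 calls/second" from the sample output
        System.out.println(recordsPerSecond(5353.47, records));        // roughly 53 535
        System.out.println(microsPerRecordFromRate(5353.47, records)); // roughly 18.7
        // "mean = 0.18 milliseconds" per call from the sample output
        System.out.println(microsPerRecord(0.18, records));            // roughly 18
    }
}
```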

A quick note from Villu on the difference between the max and mean times: the maximum time occurs on the very first evaluation, when the optimizations are done.
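If you time scoring in your own code, it can help to report the first call separately from the rest, since that is where a one-off cost such as the first-evaluation optimization shows up. The sketch below is a generic Java timing harness built on System.nanoTime. It is not the Dropwizard Metrics Timer that produced the output above. The timed task is a placeholder where a real scoring call would go.

```java
import java.util.Arrays;

/**
 * Generic timing sketch that keeps the first (warm-up) call apart from the
 * rest. Not the Metrics Timer used by the example; the task is a placeholder.
 */
public class WarmupAwareTimer {

    /** Runs the task n times and returns per-call durations in nanoseconds. */
    static long[] time(Runnable task, int n) {
        long[] durations = new long[n];
        for (int i = 0; i < n; i++) {
            long start = System.nanoTime();
            task.run();
            durations[i] = System.nanoTime() - start;
        }
        return durations;
    }

    /** Mean of the durations, optionally skipping the first (warm-up) call. */
    static double meanNanos(long[] durations, boolean skipFirst) {
        int from = skipFirst ? 1 : 0;
        return Arrays.stream(durations, from, durations.length).average().orElse(Double.NaN);
    }

    public static void main(String[] args) {
        // Placeholder task; replace with a real scoring call
        long[] d = time(() -> Math.sqrt(Math.random()), 100_000);
        System.out.println("first call (ns): " + d[0]);
        System.out.println("mean incl. first (ns): " + meanNanos(d, false));
        System.out.println("mean excl. first (ns): " + meanNanos(d, true));
    }
}
```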