Productionizing H2O

(Note: This section is a work in progress.)

About POJOs and MOJOs

H2O allows you to convert the models you have built to either a Plain Old Java Object (POJO) or a Model ObJect, Optimized (MOJO).

H2O-generated MOJO and POJO models are intended to be easily embeddable in any Java environment. The only compilation and runtime dependency for a generated model is the h2o-genmodel.jar file, which is produced as the build output of these packages. This file is a library that supports scoring. For POJOs, it contains the base classes from which the POJO is derived. (You can see “extends GenModel” in a POJO class; the GenModel class is part of this library.) For MOJOs, it also contains the required readers and interpreters. The h2o-genmodel.jar file is required when POJO/MOJO models are deployed to production.
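To make the relationship between a generated POJO and the library concrete, here is a minimal stdlib-only Java sketch. `ScoringBase` and `GeneratedModel` are hypothetical stand-ins loosely modeled on the way a POJO extends GenModel: the library class supplies shared plumbing, and the generated class supplies the model-specific math. This is not the real GenModel API, and the coefficients are invented.

```java
// Hypothetical stand-in for the library base class (the role GenModel plays
// in h2o-genmodel.jar). This is NOT the real GenModel API.
abstract class ScoringBase {
    private final String[] columnNames;

    protected ScoringBase(String[] columnNames) {
        this.columnNames = columnNames.clone();
    }

    // Shared plumbing that every generated model inherits from the library.
    public final int nfeatures() {
        return columnNames.length;
    }

    // Model-specific math that each generated class must supply.
    protected abstract double[] score0(double[] row, double[] preds);

    // Library-provided entry point: validates the input, then delegates.
    public final double[] score(double[] row) {
        if (row.length != nfeatures()) {
            throw new IllegalArgumentException(
                "expected " + nfeatures() + " features, got " + row.length);
        }
        return score0(row, new double[1]);
    }
}

// Hypothetical "generated" class: extends the library base class and
// hard-codes a trained linear model. The coefficients are invented.
class GeneratedModel extends ScoringBase {
    private static final double INTERCEPT = 0.5;
    private static final double[] COEFS = {2.0, -1.0};

    GeneratedModel() {
        super(new String[] {"x1", "x2"});
    }

    @Override
    protected double[] score0(double[] row, double[] preds) {
        double sum = INTERCEPT;
        for (int i = 0; i < COEFS.length; i++) {
            sum += COEFS[i] * row[i];
        }
        preds[0] = sum;
        return preds;
    }
}
```

In this sketch, `new GeneratedModel().score(new double[] {1.0, 2.0})[0]` evaluates to 0.5. Only the generated class and the library are needed at scoring time, which mirrors why h2o-genmodel.jar is the sole dependency.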

Users can refer to the following Quick Start files for more information about generating POJOs and MOJOs:

Notes:

  • MOJOs are supported for DRF, GBM, GLM, GLRM, K-Means, Word2vec, and XGBoost models only.
  • POJOs are not supported for XGBoost.

Developers can refer to the POJO and MOJO Model Javadoc.

Example Design Patterns

The following are example design patterns for putting H2O models into production.

Consumer loan application

Characteristic                Value
Pattern name                  Jetty servlet
Example training language     R
Example training data source  CSV file
Example scoring data source   User input to a JavaScript application running in a browser
Scoring environment           REST API service provided by a Jetty servlet
Scoring engine                H2O POJO
Scoring latency SLA           Real-time

Resource                      Location
Git repos                     https://github.com/h2oai/app-consumer-loan
Slides                        http://docs.h2o.ai/h2o-tutorials/latest-stable/tutorials/building-a-smarter-application/index.html
Videos                        http://library.fora.tv/2015/11/09/building_a_smart_application_hands_on_tom
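At its core, the Jetty servlet pattern is an HTTP endpoint that parses request parameters, passes them to the POJO, and returns the prediction. The stdlib-only sketch below uses the JDK's built-in `com.sun.net.httpserver.HttpServer` as a stand-in for Jetty. The `/predict` path, the parameter names `loan_amount` and `income`, and the toy rule in `score()` are hypothetical placeholders for the real POJO call; the app-consumer-loan repository contains the actual servlet.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

class ScoringService {
    // Parse a query string such as "loan_amount=5000&income=20000".
    static Map<String, String> parseQuery(String query) {
        Map<String, String> params = new HashMap<>();
        if (query == null || query.isEmpty()) {
            return params;
        }
        for (String pair : query.split("&")) {
            int eq = pair.indexOf('=');
            if (eq <= 0) {
                continue; // skip pairs without a key or without '='
            }
            params.put(
                URLDecoder.decode(pair.substring(0, eq), StandardCharsets.UTF_8),
                URLDecoder.decode(pair.substring(eq + 1), StandardCharsets.UTF_8));
        }
        return params;
    }

    // Hypothetical placeholder for the POJO call: a real service would build a
    // feature row from the parameters and pass it to the generated model.
    static double score(Map<String, String> params) {
        double amount = Double.parseDouble(params.getOrDefault("loan_amount", "0"));
        double income = Double.parseDouble(params.getOrDefault("income", "1"));
        return Math.min(1.0, amount / income);
    }

    // Request logic, kept separate from the HTTP plumbing so it is testable.
    static String handle(String rawQuery) {
        return "{\"score\": " + score(parseQuery(rawQuery)) + "}";
    }

    // Starts the REST endpoint; pass 0 to let the OS choose a free port.
    static HttpServer start(int port) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/predict", exchange -> {
            int status = 200;
            String json;
            try {
                json = handle(exchange.getRequestURI().getRawQuery());
            } catch (NumberFormatException e) {
                status = 400;
                json = "{\"error\": \"non-numeric input\"}";
            }
            byte[] body = json.getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().set("Content-Type", "application/json");
            exchange.sendResponseHeaders(status, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
        return server;
    }
}
```

For example, after `ScoringService.start(8080)`, a request to `http://localhost:8080/predict?loan_amount=5000&income=20000` returns `{"score": 0.25}` under the toy rule. Loading the model once and keeping the handler stateless lets the server answer concurrent requests in real time.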

Craigslist application

Characteristic                Value
Pattern name                  Sparkling Water streaming
Example training language     Scala
Example training data source  CSV file
Example scoring data source   User input to a JavaScript application running in a browser
Scoring engine                H2O cluster
Scoring latency SLA           Real-time

Resource                      Location
Git repos                     https://github.com/h2oai/app-ask-craig
Blogs                         http://blog.h2o.ai/2015/06/ask-craig-sparkling-water/
                              http://blog.h2o.ai/2015/07/ask-craig-sparkling-water-2/
Slides                        http://www.slideshare.net/0xdata/sparkling-water-ask-craig
                              http://www.slideshare.net/0xdata/sparkling-water-applications-meetup-072115

Malicious domain application

Characteristic                Value
Pattern name                  AWS Lambda
Example training language     Python
Example training data source  CSV file
Example scoring data source   User input to a JavaScript application running in a browser
Scoring environment           AWS Lambda REST API endpoint
Scoring engine                H2O POJO
Scoring latency SLA           Real-time

Resource                      Location
Git repos                     https://github.com/h2oai/app-malicious-domains
Slides                        https://github.com/h2oai/h2o-meetups/tree/master/2016_05_03_H2O_Open_Tour_Chicago_Application
Videos                        http://library.fora.tv/2016/05/03/design_patterns_for_smart_applications_and_data_products
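One design point behind the AWS Lambda pattern is to load the model once per container and reuse it across warm invocations instead of reloading it per request. The stdlib-only sketch below shows that lazy, thread-safe one-time initialization using the initialization-on-demand holder idiom. The method name `handleRequest` echoes the Lambda convention, but the class does not implement the AWS `RequestHandler` interface, and the toy "domain length" model is a hypothetical placeholder for instantiating the POJO.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.ToDoubleFunction;

class DomainScoringHandler {
    // Counts how often the expensive model load actually runs.
    static final AtomicInteger LOAD_COUNT = new AtomicInteger();

    // Initialization-on-demand holder idiom: the JVM initializes MODEL exactly
    // once, on first use, and later invocations in the same warm container
    // reuse it.
    private static final class Holder {
        static final ToDoubleFunction<String> MODEL = loadModel();
    }

    // Hypothetical stand-in for instantiating the POJO. This toy "model"
    // simply maps longer domain names to higher scores.
    private static ToDoubleFunction<String> loadModel() {
        LOAD_COUNT.incrementAndGet();
        return domain -> Math.min(1.0, domain.length() / 50.0);
    }

    // Per-request entry point (name mirrors Lambda's handleRequest, but this
    // class does not implement the AWS RequestHandler interface).
    String handleRequest(String domain) {
        return domain + "," + Holder.MODEL.applyAsDouble(domain);
    }
}
```

Only the first invocation in a container pays the load cost; subsequent calls go straight to scoring, which is what keeps the endpoint within a real-time latency budget.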

Storm bolt

Characteristic                Value
Pattern name                  Storm bolt
Example training language     R
Example training data source  CSV file
Example scoring data source   Storm spout
Scoring environment           POJO embedded in a Storm bolt
Scoring engine                H2O POJO
Scoring latency SLA           Real-time

Resource                      Location
Git repos                     https://github.com/h2oai/h2o-tutorials/tree/master/tutorials/streaming/storm
Tutorials                     http://docs.h2o.ai/h2o-tutorials/latest-stable/tutorials/streaming/storm/index.html
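The Storm tutorial listed above shows the real bolt. As a rough sketch of the lifecycle it relies on, the plain-Java class below builds the model once in `prepare()` and scores each tuple in `execute()`. Those method names echo Storm's bolt API, but the signatures, the emitter callback (a stand-in for Storm's output collector), and the feature-summing "model" are simplified hypothetical stand-ins so the example runs without Storm on the classpath.

```java
import java.util.function.BiConsumer;
import java.util.function.ToDoubleFunction;

class ScoringBolt {
    private ToDoubleFunction<double[]> model;   // built once in prepare()
    private BiConsumer<String, Double> emitter; // stand-in for Storm's collector

    // Lifecycle hook, called once per bolt instance before any tuples arrive.
    void prepare(BiConsumer<String, Double> emitter) {
        this.emitter = emitter;
        // Hypothetical placeholder for instantiating the POJO: sums the features.
        this.model = row -> {
            double sum = 0.0;
            for (double v : row) {
                sum += v;
            }
            return sum;
        };
    }

    // Called once per incoming tuple: score it and emit (id, prediction).
    void execute(String id, double[] features) {
        if (model == null) {
            throw new IllegalStateException("prepare() must be called first");
        }
        emitter.accept(id, model.applyAsDouble(features));
    }
}
```

Creating the model in the one-time setup hook rather than per tuple keeps per-tuple work to a single scoring call.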

Invoking a POJO directly in R

Characteristic                Value
Pattern name                  POJO in R
Example training language     R
Example training data source  (Need example)
Example scoring data source   (Need example)
Scoring environment           R
Scoring engine                H2O POJO
Scoring latency SLA           Batch

Hive UDF

Characteristic                Value
Pattern name                  Hive UDF
Example training language     R
Example training data source  HDFS directory with Hive part files output by a SELECT
Example scoring data source   Hive
Scoring environment           Hive SELECT query (parallel MapReduce) running the UDF
Scoring engine                H2O POJO
Scoring latency SLA           Batch

Resource                      Location
Git repos                     https://github.com/h2oai/h2o-tutorials/tree/master/tutorials/hive_udf_template
Tutorials                     http://docs.h2o.ai/h2o-tutorials/latest-stable/tutorials/hive_udf_template/index.html
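Because Hive calls a UDF once per row, the usual shape is one model instance per UDF object, a cheap per-row call, and explicit handling of SQL NULLs. The sketch below shows that shape in plain Java. `evaluate()` mirrors the method name used by simple Hive UDFs, but this class does not extend Hive's UDF base class, and the fixed linear model is a hypothetical placeholder for the POJO; the hive_udf_template tutorial listed above contains the real implementation.

```java
class ScoreUdf {
    // Hypothetical placeholder for the POJO: a fixed linear model.
    private static final double[] COEFS = {0.3, 0.7};

    // Per-instance buffer reused for every row, avoiding one allocation per row.
    private final double[] row = new double[COEFS.length];

    // One call per input row; returns null (SQL NULL) if any input is NULL.
    Double evaluate(Double x1, Double x2) {
        if (x1 == null || x2 == null) {
            return null;
        }
        row[0] = x1;
        row[1] = x2;
        double pred = 0.0;
        for (int i = 0; i < COEFS.length; i++) {
            pred += COEFS[i] * row[i];
        }
        return pred;
    }
}
```

Since the query runs as parallel MapReduce tasks, each task gets its own UDF instance, so the reused buffer is not shared across threads in this design.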

MOJO as a JAR Resource

Characteristic                Value
Pattern name                  MOJO JAR
Example training language     R
Example training data source  Iris dataset
Example scoring data source   Single row
Scoring environment           Portable
Scoring engine                H2O MOJO
Scoring latency SLA           Real-time in this example, but can be adapted (e.g., for use in a Hive UDF)

Resource                      Location
Git repos                     https://github.com/h2oai/h2o-tutorials/tree/master/tutorials/mojo-resource
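The core idea of this pattern is to package the MOJO file inside the application JAR and read it from the classpath at startup instead of from a filesystem path. The stdlib-only sketch below shows just that resource-loading step. The resource name in the usage note is hypothetical, and handing the bytes to h2o-genmodel's MOJO reader is deliberately left out rather than guessed at; the mojo-resource tutorial listed above shows the complete flow.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Optional;

class ModelResource {
    // Reads a model file packaged inside the application JAR (or any other
    // classpath entry). Returns empty if the resource is not on the classpath.
    static Optional<byte[]> load(ClassLoader loader, String resourceName) throws IOException {
        // try-with-resources skips close() when the stream is null
        try (InputStream in = loader.getResourceAsStream(resourceName)) {
            if (in == null) {
                return Optional.empty();
            }
            return Optional.of(in.readAllBytes());
        }
    }
}
```

A typical call would pass the application's own class loader, e.g. `ModelResource.load(ModelResource.class.getClassLoader(), "models/model.zip")`. Because the model travels inside the JAR, the same artifact can be dropped into a web service, a batch job, or a Hive UDF without extra file distribution.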

Steam Scoring Server from H2O.ai

Characteristic                Value
Pattern name                  Steam
Scoring data source           REST API client
Scoring environment           Steam scoring server
Scoring engine                H2O POJO
Scoring latency SLA           Real-time

Resource                      Location
Web sites                     http://www.h2o.ai/steam/