Obtain SHAP values from MOJO model
You can train a pipeline in Sparkling Water and get contributions from it, or you can get contributions from a raw MOJO. The following two sections describe both approaches.
Train model pipeline & get contributions
SHAP values can be obtained only from the H2OGBM, H2OXGBoost and H2ODRF pipeline wrappers, and only for regression or binomial problems.
To get SHAP values (contributions) from an H2OXGBoost model, proceed as follows:
Scala
First, start Sparkling Shell:
./bin/sparkling-shell
Start the H2O cluster inside the Spark environment
import org.apache.spark.h2o._
import java.net.URI
val hc = H2OContext.getOrCreate()
Parse the data using H2O and convert it to a Spark DataFrame
val frame = new H2OFrame(new URI("https://raw.githubusercontent.com/h2oai/sparkling-water/master/examples/smalldata/prostate/prostate.csv"))
val sparkDF = hc.asDataFrame(frame).withColumn("CAPSULE", $"CAPSULE" cast "string")
val Array(trainingDF, testingDF) = sparkDF.randomSplit(Array(0.8, 0.2))
Train the model. You can configure all the available XGBoost arguments, such as the label column, using the provided setters.
import ai.h2o.sparkling.ml.algos.H2OXGBoost
val estimator = new H2OXGBoost().setLabelCol("CAPSULE").setWithDetailedPredictionCol(true)
val model = estimator.fit(trainingDF)
The call setWithDetailedPredictionCol(true) tells Sparkling Water to create an additional prediction column with further prediction details, such as the contributions. By default this column is named “detailed_prediction”; the name can be changed via the setDetailedPredictionCol setter.
Run Predictions
val predictions = model.transform(testingDF)
predictions.show(false)
Show contributions
predictions.select("detailed_prediction.contributions").show()
Python
First, start PySparkling Shell:
./bin/pysparkling
Start the H2O cluster inside the Spark environment
from pysparkling import *
hc = H2OContext.getOrCreate()
Parse the data using H2O and convert it to a Spark DataFrame
import h2o
frame = h2o.import_file("https://raw.githubusercontent.com/h2oai/sparkling-water/master/examples/smalldata/prostate/prostate.csv")
sparkDF = hc.asSparkFrame(frame)
sparkDF = sparkDF.withColumn("CAPSULE", sparkDF.CAPSULE.cast("string"))
[trainingDF, testingDF] = sparkDF.randomSplit([0.8, 0.2])
Train the model. You can configure all the available XGBoost arguments, such as the label column, using the provided setters or constructor parameters.
from pysparkling.ml import H2OXGBoost
estimator = H2OXGBoost(labelCol = "CAPSULE", withDetailedPredictionCol = True)
model = estimator.fit(trainingDF)
The parameter withDetailedPredictionCol = True tells Sparkling Water to create an additional prediction column with further prediction details, such as the contributions. By default this column is named “detailed_prediction”; the name can be changed via the detailedPredictionCol parameter.
Run Predictions
predictions = model.transform(testingDF)
predictions.show(truncate = False)
Show contributions
predictions.select("detailed_prediction.contributions").show()
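Once a row of contributions is available on the driver, a common next step is to rank the features by how strongly they pushed the prediction. The following is a minimal sketch in plain Python, not part of the Sparkling Water API. It assumes one row has already been collected into a dict keyed by feature name with a "BiasTerm" entry (the bias column name used by H2O's contribution output); how to get from the Spark column to such a dict depends on the column schema in your Sparkling Water version and is not shown. The helper top_features and all numeric values are hypothetical.

```python
# Hypothetical contributions for one row. Feature names follow the prostate
# dataset used above; the numbers are made up purely for illustration.
row_contributions = {
    "AGE": -0.12,
    "PSA": 0.85,
    "GLEASON": 1.30,
    "VOL": -0.40,
    "BiasTerm": -0.55,
}


def top_features(contributions, k=3, bias_key="BiasTerm"):
    """Return the k (feature, value) pairs with the largest absolute contribution.

    The bias term is excluded because it is not tied to any input feature.
    """
    features = {
        name: value for name, value in contributions.items() if name != bias_key
    }
    ranked = sorted(features.items(), key=lambda item: abs(item[1]), reverse=True)
    return ranked[:k]


# Print the strongest drivers of this row's prediction, with their sign.
for name, value in top_features(row_contributions):
    print(f"{name}: {value:+.2f}")
```

Sorting by absolute value keeps strongly negative features in the ranking; the sign then tells whether the feature raised or lowered the prediction.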
Get Contributions from Raw MOJO
SHAP values can be obtained only from MOJOs generated for GBM, XGBoost and DRF models, and only for regression or binomial problems. If you don’t need to train a model and just need to load an existing MOJO, there is no need to start H2OContext.
Scala
First, start Sparkling Shell:
./bin/sparkling-shell
Parse the data using Spark
val testingDF = spark.read.option("header", "true").option("inferSchema", "true").csv("/path/to/testing/dataset.csv")
Load the existing MOJO and enable generation of contributions via the settings object.
import ai.h2o.sparkling.ml.models._
val path = "/path/to/mojo.zip"
val settings = H2OMOJOSettings(withDetailedPredictionCol = true)
val model = H2OMOJOModel.createFromMojo(path, settings)
Run Predictions
val predictions = model.transform(testingDF)
Show contributions
predictions.select("detailed_prediction.contributions").show()
Python
First, start PySparkling Shell:
./bin/pysparkling
Parse the data using Spark
testingDF = spark.read.csv("/path/to/testing/dataset.csv", header=True, inferSchema=True)
Load the existing MOJO and enable generation of contributions via the settings object.
from pysparkling.ml import *
path = '/path/to/mojo.zip'
settings = H2OMOJOSettings(withDetailedPredictionCol=True)
model = H2OMOJOModel.createFromMojo(path, settings)
Run Predictions
predictions = model.transform(testingDF)
Show contributions
predictions.select("detailed_prediction.contributions").show()
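A useful sanity check on contributions is their additivity: for tree-based H2O models, the per-feature contributions plus the bias term sum to the model's raw prediction, that is, the value before the inverse link function is applied. The sketch below illustrates this in plain Python for a binomial model, assuming a logit link (so the raw prediction is in log-odds) and assuming one row has been collected into a dict with a "BiasTerm" entry. The helper names and all numbers are hypothetical, and whether the logit assumption holds for your model (for example DRF) should be checked against the H2O documentation.

```python
import math


def raw_prediction(contributions):
    """Sum all contributions, including the bias term, into the raw prediction."""
    return sum(contributions.values())


def sigmoid(x):
    """Inverse of the logit link: map log-odds to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


# Hypothetical contributions for a single row (values made up for illustration).
row = {"AGE": -0.12, "PSA": 0.85, "GLEASON": 1.30, "VOL": -0.40, "BiasTerm": -0.55}

margin = raw_prediction(row)    # raw prediction in log-odds (assumed logit link)
probability = sigmoid(margin)   # probability of the positive class

print(f"raw prediction (log-odds): {margin:.2f}")
print(f"probability: {probability:.3f}")
```

If the probability reconstructed this way differs noticeably from the model's predicted probability for the same row, the link assumption or the collected row is likely wrong.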