.. _coxph: Train CoxPH Model in Sparkling Water ----------------------------------------------- Sparkling Water provides API for H2O CoxPH in Scala and Python. The following sections describe how to train the CoxPH model in Sparkling Water in both languages. See also :ref:`parameters_H2OCoxPH` and :ref:`model_details_H2OCoxPHMOJOModel`. .. content-tabs:: .. tab-container:: Scala :title: Scala First, let's start Sparkling Shell as .. code:: shell ./bin/sparkling-shell Start H2O cluster inside the Spark environment .. code:: scala import ai.h2o.sparkling._ import java.net.URI val hc = H2OContext.getOrCreate() Parse the data using H2O and convert them to Spark Frame .. code:: scala import org.apache.spark.SparkFiles spark.sparkContext.addFile("https://raw.githubusercontent.com/h2oai/sparkling-water/master/examples/smalldata/coxph_test/heart.csv") val heartDF = spark.read.option("header", "true").option("inferSchema", "true").csv(SparkFiles.get("heart.csv")) val Array(trainingDF, testingDF) = heartDF.randomSplit(Array(0.8, 0.2), seed = 12345) Train the model. You can configure all the available CoxPH parameters using provided setters. .. code:: scala import ai.h2o.sparkling.ml.algos.H2OCoxPH val estimator = new H2OCoxPH(). setStartCol("start"). setStopCol("stop"). setTies("breslow"). setLabelCol("event") val model = estimator.fit(trainingDF) Run Predictions .. code:: scala model.transform(testingDF).show(false) You can also get model details via calling methods listed in :ref:`model_details_H2OCoxPHMOJOModel`. .. tab-container:: Python :title: Python First, let's start PySparkling Shell as .. code:: shell ./bin/pysparkling Start H2O cluster inside the Spark environment .. code:: python from pysparkling import * hc = H2OContext.getOrCreate() Parse the data using H2O and convert them to Spark Frame .. code:: python import h2o heartFrame = h2o.import_file("https://raw.githubusercontent.com/h2oai/sparkling-water/master/examples/smalldata/coxph_test/heart.csv") trainingFrame, testingFrame = heartFrame.split_frame(ratios = [.8], seed = 1234) trainingDF = hc.asSparkFrame(trainingFrame) testingDF = hc.asSparkFrame(testingFrame) Train the model. You can configure all the available CoxPH arguments using provided setters or constructor parameters. .. code:: python from pysparkling.ml import H2OCoxPH estimator = H2OCoxPH()\ .setStartCol('start')\ .setStopCol('stop')\ .setTies('breslow')\ .setLabelCol('event') model = estimator.fit(trainingDF) Run Predictions .. code:: python model.transform(testingDF).show(truncate = False) You can also get model details via calling methods listed in :ref:`model_details_H2OCoxPHMOJOModel`.