Change Sparkling Shell Logs Location

We can configure the location of the Sparkling Water logs, but we need to distinguish between the client/driver node and the H2O worker nodes.

Client

The logs location for the client node is driven by the spark.ext.h2o.client.log.dir spark configuration property.

We can either start the spark application with this configuration being passed on the command line such as:

$SPARK_HOME/bin/spark-submit --conf "spark.ext.h2o.client.log.dir=/client/log/location"

or we can set it at runtime, but before you create the H2OContext, as the in the following examples:

In Scala:

val conf = new H2OConf(spark).setH2OClientLogDir("log_location")
val hc = H2OContext.getOrCreate(spark, conf

In Python:

conf = H2OConf(spark).setH2OClientLogDir("log_location")
hc = H2OContext.getOrCreate(spark, conf)

Worker Nodes

For the worker nodes, we first check if the spark.yarn.app.container.log.dir environmental property is defined. If it is available, we store the logs there.

If this environmental property is missing we try to read spark.ext.h2o.node.log.dir spark configuration property and store the logs there. If this property is missing, we store the logs from the worker nodes into the default directory which is specified as:

System.getProperty("user.dir")/h2ologs/${sparkAppId}

So to change the logs location for the worker nodes we can either set the environment variable spark.yarn.app.container.log.dir, or specify the spark configuration property spark.ext.h2o.node.log.dir.

We can start the spark application with this configuration being passed on the command line such as:

$SPARK_HOME/bin/spark-submit --conf "spark.ext.h2o.node.log.dir=/worker/node/log/location"

or we can set it at runtime, but before you create the H2OContext, as the in the following examples:

In Scala:

val conf = new H2OConf(spark).setH2ONodeLogDir("log_location")
val hc = H2OContext.getOrCreate(spark, conf

In Python:

conf = H2OConf(spark).setH2ONodeLogDir("log_location")
hc = H2OContext.getOrCreate(spark, conf)