Sparkling Water Logging
-----------------------

Changing Log Directory
~~~~~~~~~~~~~~~~~~~~~~

The Sparkling Water log directory can be specified using the Spark option ``spark.ext.h2o.log.dir`` or at run-time using our setters as:

.. content-tabs::

    .. tab-container:: Scala
        :title: Scala

        .. code:: scala

            import ai.h2o.sparkling._
            val conf = new H2OConf().setLogDir("dir")
            val hc = H2OContext.getOrCreate(conf)

    .. tab-container:: Python
        :title: Python

        .. code:: python

            from pysparkling import *
            conf = H2OConf().setLogDir("dir")
            hc = H2OContext.getOrCreate(conf)

    .. tab-container:: R
        :title: R

        .. code:: r

            library(rsparkling)
            sc <- spark_connect(master = "local")
            conf <- H2OConf()$setLogDir("dir")
            hc <- H2OContext.getOrCreate(conf)

Logging Directory Selection
~~~~~~~~~~~~~~~~~~~~~~~~~~~

Sparkling Water uses the following steps to determine the final logging directory:

- First, we check whether the ``spark.yarn.app.container.log.dir`` property is defined. If it is available, we use it as the logging directory.

- If ``spark.yarn.app.container.log.dir`` is missing, we check whether ``spark.ext.h2o.log.dir`` is defined and use it if it is not empty.

- Finally, if both options are missing, we store the logs in the default directory ``${user.dir}/h2ologs/${sparkAppId}``.

Note that ``spark.yarn.app.container.log.dir`` takes precedence over our logging option. This ensures that when running on YARN, the logs are collected by the YARN tooling. A short code sketch of this fallback logic is shown at the end of this page.

Obtaining Logs for Sparkling Water on YARN
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

When launching Sparkling Water on YARN, you can find the application ID for the YARN job on the resource manager (where you can also find the application master, which is also the Spark master). The following command prints the YARN logs to the console:

.. code:: shell

    yarn logs -applicationId <application_id>

Obtaining Logs for Standalone Sparkling Water
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

By default, the Spark environment variable ``SPARK_LOG_DIR`` is set to ``$SPARK_HOME/work/``. To also log the configuration with which Spark was started, start Sparkling Water with the following configuration:

.. code:: shell

    bin/sparkling-shell --conf spark.logConf=true

The logs for a particular application are located in ``$SPARK_HOME/work/``. This directory also contains the stdout and stderr of each node in the cluster.

Change Sparkling Water Logging Level
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To change the log level for H2O running inside Sparkling Water, use the option ``spark.ext.h2o.log.level`` or set it as:

.. content-tabs::

    .. tab-container:: Scala
        :title: Scala

        .. code:: scala

            import ai.h2o.sparkling._
            val conf = new H2OConf().setLogLevel("DEBUG")
            val hc = H2OContext.getOrCreate(conf)

    .. tab-container:: Python
        :title: Python

        .. code:: python

            from pysparkling import *
            conf = H2OConf().setLogLevel("DEBUG")
            hc = H2OContext.getOrCreate(conf)

    .. tab-container:: R
        :title: R

        .. code:: r

            library(rsparkling)
            sc <- spark_connect(master = "local")
            conf <- H2OConf()$setLogLevel("DEBUG")
            hc <- H2OContext.getOrCreate(conf)

We can also change the logging level used by Spark itself by modifying the ``log4j.properties`` file passed to Spark:

.. code:: shell

    cd $SPARK_HOME/conf
    cp log4j.properties.template log4j.properties

Then, in a text editor, change the relevant line of ``log4j.properties`` from:

.. code:: shell

    # Set everything to be logged to the console
    log4j.rootCategory=INFO, console
    ...

to:

.. code:: shell

    # Set everything to be logged to the console
    log4j.rootCategory=WARN, console
    ...
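
Both Sparkling Water logging options described above can also be supplied at submit time, without touching code or configuration files. The following is a minimal sketch: the option names are the ones documented on this page, while ``/tmp/h2o-logs`` is only an example path and the script name may differ in your distribution.

.. code:: shell

    bin/sparkling-shell \
      --conf spark.ext.h2o.log.dir=/tmp/h2o-logs \
      --conf spark.ext.h2o.log.level=DEBUG

The same ``spark.ext.h2o.*`` entries can also be placed in ``$SPARK_HOME/conf/spark-defaults.conf`` so that they apply to every submission.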
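
For reference, the directory-selection order from `Logging Directory Selection`_ can be summarized with the following Scala sketch. This is illustrative only and is not Sparkling Water's actual implementation; it reads the two properties named above as JVM system properties, and the object, method, and ``sparkAppId`` parameter names are hypothetical.

.. code:: scala

    // Illustrative sketch of the log-directory fallback described in
    // "Logging Directory Selection" above; not Sparkling Water's real code.
    object LogDirSelection {
      def resolveLogDir(sparkAppId: String): String = {
        // 1. The YARN container log directory wins, so that logs are
        //    collected by the YARN tooling.
        sys.props.get("spark.yarn.app.container.log.dir")
          // 2. Otherwise, fall back to the user-supplied Sparkling Water
          //    option, ignoring it when it is empty.
          .orElse(sys.props.get("spark.ext.h2o.log.dir").filter(_.nonEmpty))
          // 3. Finally, default to ${user.dir}/h2ologs/${sparkAppId}.
          .getOrElse(s"${sys.props("user.dir")}/h2ologs/$sparkAppId")
      }
    }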