.. _sw_config_properties:

Sparkling Water Configuration Properties
----------------------------------------

The following configuration properties can be passed to Spark to configure Sparkling Water.

Configuration properties independent of selected backend
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

+----------------------------------------------------+----------------+-------------------------------------------------+----------------------------------------+
| Property name                                      | Default value  | H2OConf setter (* getter_)                      | Description                            |
+====================================================+================+=================================================+========================================+
| **Generic parameters**                             |                |                                                 |                                        |
+----------------------------------------------------+----------------+-------------------------------------------------+----------------------------------------+
| ``spark.ext.h2o.backend.cluster.mode``             | ``internal``   | ``setInternalClusterMode()``                    | This option can be set either to       |
|                                                    |                |                                                 | ``internal`` or ``external``. When set |
|                                                    |                | ``setExternalClusterMode()``                    | to ``external``, the ``H2OContext`` is |
|                                                    |                |                                                 | created by connecting to an existing   |
|                                                    |                |                                                 | H2O cluster; otherwise an H2O cluster  |
|                                                    |                |                                                 | located inside Spark is created. That  |
|                                                    |                |                                                 | means that each Spark executor will    |
|                                                    |                |                                                 | have one H2O instance running in it.   |
|                                                    |                |                                                 | The ``internal`` mode is not           |
|                                                    |                |                                                 | recommended for big clusters and       |
|                                                    |                |                                                 | clusters where Spark executors are     |
|                                                    |                |                                                 | not stable.                            |
+----------------------------------------------------+----------------+-------------------------------------------------+----------------------------------------+
| ``spark.ext.h2o.cloud.name``                       | Generated      | ``setCloudName(String)``                        | Name of H2O cluster.                   |
|                                                    | unique name    |                                                 |                                        |
+----------------------------------------------------+----------------+-------------------------------------------------+----------------------------------------+
| ``spark.ext.h2o.nthreads``                         | ``-1``         | ``setNthreads(Integer)``                        | Limit for number of threads used by    |
|                                                    |                |                                                 | H2O. The default ``-1`` means: use     |
|                                                    |                |                                                 | the value of ``spark.executor.cores``  |
|                                                    |                |                                                 | if that property is set; otherwise     |
|                                                    |                |                                                 | use H2O's default value                |
|                                                    |                |                                                 | |H2ONThreadsDefault|.                  |
+----------------------------------------------------+----------------+-------------------------------------------------+----------------------------------------+
| ``spark.ext.h2o.repl.enabled``                     | ``true``       | ``setReplEnabled()``                            | Decides whether H2O REPL is initiated  |
|                                                    |                |                                                 | or not.                                |
|                                                    |                | ``setReplDisabled()``                           |                                        |
+----------------------------------------------------+----------------+-------------------------------------------------+----------------------------------------+
| ``spark.ext.scala.int.default.num``                | ``1``          | ``setDefaultNumReplSessions(Integer)``          | Number of parallel REPL sessions       |
|                                                    |                |                                                 | started at the start of Sparkling      |
|                                                    |                |                                                 | Water.                                 |
+----------------------------------------------------+----------------+-------------------------------------------------+----------------------------------------+
| ``spark.ext.h2o.topology.change.listener.enabled`` | ``true``       | ``setClusterTopologyListenerEnabled()``         | Decides whether the listener that      |
|                                                    |                |                                                 | kills the H2O cluster on a change of   |
|                                                    |                | ``setClusterTopologyListenerDisabled()``        | the underlying cluster's topology is   |
|                                                    |                |                                                 | enabled or not. This configuration     |
|                                                    |                |                                                 | has effect only in non-local mode.     |
+----------------------------------------------------+----------------+-------------------------------------------------+----------------------------------------+
| ``spark.ext.h2o.spark.version.check.enabled``      | ``true``       | ``setSparkVersionCheckEnabled()``               | Enables a check that the run-time      |
|                                                    |                |                                                 | Spark version matches the build-time   |
|                                                    |                | ``setSparkVersionCheckDisabled()``              | Spark version.                         |
+----------------------------------------------------+----------------+-------------------------------------------------+----------------------------------------+
| ``spark.ext.h2o.fail.on.unsupported.spark.param``  | ``true``       | ``setFailOnUnsupportedSparkParamEnabled()``     | If an unsupported Spark parameter is   |
|                                                    |                |                                                 | detected, the application is forced    |
|                                                    |                | ``setFailOnUnsupportedSparkParamDisabled()``    | to shut down.                          |
+----------------------------------------------------+----------------+-------------------------------------------------+----------------------------------------+
| ``spark.ext.h2o.jks``                              | ``None``       | ``setJks(String)``                              | Path to Java KeyStore file.            |
+----------------------------------------------------+----------------+-------------------------------------------------+----------------------------------------+
| ``spark.ext.h2o.jks.pass``                         | ``None``       | ``setJksPass(String)``                          | Password for Java KeyStore file.       |
+----------------------------------------------------+----------------+-------------------------------------------------+----------------------------------------+
| ``spark.ext.h2o.jks.alias``                        | ``None``       | ``setJksAlias(String)``                         | Alias to certificate in keystore to    |
|                                                    |                |                                                 | secure H2O Flow.                       |
+----------------------------------------------------+----------------+-------------------------------------------------+----------------------------------------+
| ``spark.ext.h2o.hash.login``                       | ``false``      | ``setHashLoginEnabled()``                       | Enable hash login.                     |
|                                                    |                |                                                 |                                        |
|                                                    |                | ``setHashLoginDisabled()``                      |                                        |
+----------------------------------------------------+----------------+-------------------------------------------------+----------------------------------------+
| ``spark.ext.h2o.ldap.login``                       | ``false``      | ``setLdapLoginEnabled()``                       | Enable LDAP login.                     |
|                                                    |                |                                                 |                                        |
|                                                    |                | ``setLdapLoginDisabled()``                      |                                        |
+----------------------------------------------------+----------------+-------------------------------------------------+----------------------------------------+
| ``spark.ext.h2o.kerberos.login``                   | ``false``      | ``setKerberosLoginEnabled()``                   | Enable Kerberos login.                 |
|                                                    |                |                                                 |                                        |
|                                                    |                | ``setKerberosLoginDisabled()``                  |                                        |
+----------------------------------------------------+----------------+-------------------------------------------------+----------------------------------------+
| ``spark.ext.h2o.login.conf``                       | ``None``       | ``setLoginConf(String)``                        | Login configuration file.              |
+----------------------------------------------------+----------------+-------------------------------------------------+----------------------------------------+
| ``spark.ext.h2o.user.name``                        | ``None``       | ``setUserName(String)``                         | Username used for the backend H2O      |
|                                                    |                |                                                 | cluster and to authenticate the        |
|                                                    |                |                                                 | client against the backend.            |
+----------------------------------------------------+----------------+-------------------------------------------------+----------------------------------------+
| ``spark.ext.h2o.password``                         | ``None``       | ``setPassword(String)``                         | Password used to authenticate the      |
|                                                    |                |                                                 | client against the backend.            |
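These properties can be supplied either on the Spark command line (for example, as ``--conf`` pairs passed to ``spark-submit``) or programmatically through the ``H2OConf`` setters listed in the tables below, before the ``H2OContext`` is created. A minimal sketch in Scala, assuming the ``ai.h2o.sparkling`` package of recent Sparkling Water releases (older releases expose ``H2OConf`` under ``org.apache.spark.h2o``, and some constructor variants take the ``SparkSession``):

.. code:: scala

    import org.apache.spark.sql.SparkSession
    import ai.h2o.sparkling.{H2OConf, H2OContext}

    val spark = SparkSession.builder()
      .appName("sparkling-water-config-example")
      // Configuration properties can be set directly on the Spark config ...
      .config("spark.ext.h2o.cloud.name", "my-h2o-cluster")
      .getOrCreate()

    // ... or through the equivalent H2OConf setters documented below.
    val conf = new H2OConf()
      .setNthreads(4)              // spark.ext.h2o.nthreads
      .setCloudTimeout(120 * 1000) // spark.ext.h2o.cloud.timeout (msec)

    val hc = H2OContext.getOrCreate(conf)

The same ``spark.ext.h2o.*`` keys can equally be passed as ``--conf`` options to ``spark-submit``.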
+----------------------------------------------------+----------------+-------------------------------------------------+----------------------------------------+
| ``spark.ext.h2o.internal_security_conf``           | ``None``       | ``setSslConf(String)``                          | Path to a file containing H2O or       |
|                                                    |                |                                                 | Sparkling Water internal security      |
|                                                    |                |                                                 | configuration.                         |
+----------------------------------------------------+----------------+-------------------------------------------------+----------------------------------------+
| ``spark.ext.h2o.auto.flow.ssl``                    | ``false``      | ``setAutoFlowSslEnabled()``                     | Automatically generate the required    |
|                                                    |                |                                                 | key store and password to secure H2O   |
|                                                    |                | ``setAutoFlowSslDisabled()``                    | Flow by SSL.                           |
+----------------------------------------------------+----------------+-------------------------------------------------+----------------------------------------+
| ``spark.ext.h2o.node.log.level``                   | ``INFO``       | ``setH2ONodeLogLevel(String)``                  | H2O internal log level used for H2O    |
|                                                    |                |                                                 | nodes except the client.               |
+----------------------------------------------------+----------------+-------------------------------------------------+----------------------------------------+
| ``spark.ext.h2o.node.log.dir``                     | |h2oLogDir|    | ``setH2ONodeLogDir(String)``                    | Location of H2O logs on H2O nodes      |
|                                                    |                |                                                 | except on the client.                  |
|                                                    | or |yarnDir|   |                                                 |                                        |
+----------------------------------------------------+----------------+-------------------------------------------------+----------------------------------------+
| ``spark.ext.h2o.backend.heartbeat.interval``       | ``10000ms``    | ``setBackendHeartbeatInterval(Integer)``        | Interval for getting heartbeat from    |
|                                                    |                |                                                 | the H2O backend.                       |
+----------------------------------------------------+----------------+-------------------------------------------------+----------------------------------------+
| ``spark.ext.h2o.cloud.timeout``                    | ``60*1000``    | ``setCloudTimeout(Integer)``                    | Timeout (in msec) for cluster          |
|                                                    |                |                                                 | formation.                             |
+----------------------------------------------------+----------------+-------------------------------------------------+----------------------------------------+
| ``spark.ext.h2o.node.network.mask``                | ``None``       | ``setNodeNetworkMask(String)``                  | Subnet selector for H2O running inside |
|                                                    |                |                                                 | Spark executors. This disables using   |
|                                                    |                |                                                 | IP reported by Spark but tries to find |
|                                                    |                |                                                 | IP based on the specified mask.        |
+----------------------------------------------------+----------------+-------------------------------------------------+----------------------------------------+
| ``spark.ext.h2o.stacktrace.collector.interval``    | ``-1``         | ``setStacktraceCollectorInterval(Integer)``     | Interval specifying how often stack    |
|                                                    |                |                                                 | traces are taken on each H2O node.     |
|                                                    |                |                                                 | The value ``-1`` means that no stack   |
|                                                    |                |                                                 | traces will be taken.                  |
+----------------------------------------------------+----------------+-------------------------------------------------+----------------------------------------+
| ``spark.ext.h2o.context.path``                     | ``None``       | ``setContextPath(String)``                      | Context path to expose H2O web server. |
+----------------------------------------------------+----------------+-------------------------------------------------+----------------------------------------+
| ``spark.ext.h2o.flow.scala.cell.async``            | ``false``      | ``setFlowScalaCellAsyncEnabled()``              | Decides whether the Scala cells in     |
|                                                    |                |                                                 | H2O Flow will run synchronously or     |
|                                                    |                | ``setFlowScalaCellAsyncDisabled()``             | asynchronously. The default is         |
|                                                    |                |                                                 | synchronous execution.                 |
+----------------------------------------------------+----------------+-------------------------------------------------+----------------------------------------+
| ``spark.ext.h2o.flow.scala.cell.max.parallel``     | ``-1``         | ``setMaxParallelScalaCellJobs(Integer)``        | Maximum number of parallel Scala cell  |
|                                                    |                |                                                 | jobs. The value ``-1`` means the       |
|                                                    |                |                                                 | number is not limited.                 |
+----------------------------------------------------+----------------+-------------------------------------------------+----------------------------------------+
| ``spark.ext.h2o.internal.port.offset``             | ``1``          | ``setInternalPortOffset(Integer)``              | Offset between the API (web) port and  |
|                                                    |                |                                                 | the internal communication port on the |
|                                                    |                |                                                 | client node;                           |
|                                                    |                |                                                 | ``api_port + port_offset = h2o_port``  |
+----------------------------------------------------+----------------+-------------------------------------------------+----------------------------------------+
| ``spark.ext.h2o.node.port.base``                   | ``54321``      | ``setNodeBasePort(Integer)``                    | Base port used for individual H2O      |
|                                                    |                |                                                 | nodes.                                 |
+----------------------------------------------------+----------------+-------------------------------------------------+----------------------------------------+
| ``spark.ext.h2o.mojo.destroy.timeout``             | ``600000``     | ``setMojoDestroyTimeout(Integer)``              | If a scoring MOJO instance is not used |
|                                                    |                |                                                 | within a Spark executor JVM for        |
|                                                    |                |                                                 | a given timeout in milliseconds, it's  |
|                                                    |                |                                                 | evicted from the executor's cache. The |
|                                                    |                |                                                 | default timeout value is 10 minutes.   |
+----------------------------------------------------+----------------+-------------------------------------------------+----------------------------------------+
| ``spark.ext.h2o.node.extra``                       | ``None``       | ``setNodeExtraProperties(String)``              | A string containing extra parameters   |
|                                                    |                |                                                 | passed to H2O nodes during startup.    |
|                                                    |                |                                                 | This parameter should be configured    |
|                                                    |                |                                                 | only if H2O parameters do not have any |
|                                                    |                |                                                 | corresponding parameters in Sparkling  |
|                                                    |                |                                                 | Water.                                 |
+----------------------------------------------------+----------------+-------------------------------------------------+----------------------------------------+
| ``spark.ext.h2o.flow.extra.http.headers``          | ``None``       | ``setFlowExtraHttpHeaders(Map[String,String])`` | Extra HTTP headers that will be used   |
|                                                    |                |                                                 | in communication between the front-end |
|                                                    |                | ``setFlowExtraHttpHeaders(String)``             | and back-end part of Flow UI.          |
|                                                    |                |                                                 | The headers should be delimited by     |
|                                                    |                |                                                 | a new line. Don't forget to escape     |
|                                                    |                |                                                 | special characters when passing        |
|                                                    |                |                                                 | the parameter from a command line.     |
+----------------------------------------------------+----------------+-------------------------------------------------+----------------------------------------+
| ``spark.ext.h2o.internal_secure_connections``      | ``false``      | ``setInternalSecureConnectionsEnabled()``       | Enables secure communications among    |
|                                                    |                |                                                 | H2O nodes. The security is based on    |
|                                                    |                | ``setInternalSecureConnectionsDisabled()``      | automatically generated keystore       |
|                                                    |                |                                                 | and truststore. This is equivalent to  |
|                                                    |                |                                                 | the ``-internal_secure_conections``    |
|                                                    |                |                                                 | option in `H2O Hadoop deployments`_.   |
+----------------------------------------------------+----------------+-------------------------------------------------+----------------------------------------+
| ``spark.ext.h2o.allow_insecure_xgboost``           | ``false``      | ``setInsecureXGBoostAllowed()``                 | If the property is set to ``true``,    |
|                                                    |                |                                                 | insecure communication among H2O       |
|                                                    |                | ``setInsecureXGBoostDenied()``                  | nodes is allowed for the XGBoost       |
|                                                    |                |                                                 | algorithm even if the property         |
|                                                    |                |                                                 | |secureConnections| is set to          |
|                                                    |                |                                                 | ``true``.                              |
+----------------------------------------------------+----------------+-------------------------------------------------+----------------------------------------+
| ``spark.ext.h2o.kerberized.hive.enabled``          | ``false``      | ``setKerberizedHiveEnabled()``                  | If enabled, H2O instances will create  |
|                                                    |                |                                                 | JDBC connections to a Kerberized Hive  |
|                                                    |                | ``setKerberizedHiveDisabled()``                 | so that all clients can read data      |
|                                                    |                |                                                 | from HiveServer2. Don't forget to put  |
|                                                    |                |                                                 | a jar with the Hive driver on the      |
|                                                    |                |                                                 | Spark classpath if the internal        |
|                                                    |                |                                                 | backend is used.                       |
+----------------------------------------------------+----------------+-------------------------------------------------+----------------------------------------+
| ``spark.ext.h2o.hive.host``                        | ``None``       | ``setHiveHost(String)``                         | The full address of HiveServer2,       |
|                                                    |                |                                                 | for example hostname:10000.            |
+----------------------------------------------------+----------------+-------------------------------------------------+----------------------------------------+
| ``spark.ext.h2o.hive.principal``                   | ``None``       | ``setHivePrincipal(String)``                    | HiveServer2 Kerberos principal,        |
|                                                    |                |                                                 | for example hive/hostname@DOMAIN.COM.  |
+----------------------------------------------------+----------------+-------------------------------------------------+----------------------------------------+
| ``spark.ext.h2o.hive.jdbc_url_pattern``            | ``None``       | ``setHiveJdbcUrlPattern(String)``               | A pattern of JDBC URL used for         |
|                                                    |                |                                                 | connecting to HiveServer2. Example:    |
|                                                    |                |                                                 | ``jdbc:hive2://{{host}}/;{{auth}}``    |
+----------------------------------------------------+----------------+-------------------------------------------------+----------------------------------------+
| ``spark.ext.h2o.hive.token``                       | ``None``       | ``setHiveToken(String)``                        | An authorization token to Hive.        |
+----------------------------------------------------+----------------+-------------------------------------------------+----------------------------------------+
| **H2O client parameters**                          |                |                                                 |                                        |
+----------------------------------------------------+----------------+-------------------------------------------------+----------------------------------------+
| ``spark.ext.h2o.client.flow.dir``                  | ``None``       | ``setFlowDir(String)``                          | Directory where flows from H2O Flow    |
|                                                    |                |                                                 | are saved.                             |
+----------------------------------------------------+----------------+-------------------------------------------------+----------------------------------------+
| ``spark.ext.h2o.client.ip``                        | ``None``       | ``setClientIp(String)``                         | IP of H2O client node.                 |
+----------------------------------------------------+----------------+-------------------------------------------------+----------------------------------------+
| ``spark.ext.h2o.client.iced.dir``                  | ``None``       | ``setClientIcedDir(String)``                    | Location of iced directory for the     |
|                                                    |                |                                                 | driver instance.                       |
+----------------------------------------------------+----------------+-------------------------------------------------+----------------------------------------+
| ``spark.ext.h2o.client.log.level``                 | ``INFO``       | ``setH2OClientLogLevel(String)``                | H2O internal log level used for H2O    |
|                                                    |                |                                                 | client running inside Spark driver.    |
+----------------------------------------------------+----------------+-------------------------------------------------+----------------------------------------+
| ``spark.ext.h2o.client.log.dir``                   | |h2oLogDir|    | ``setH2OClientLogDir(String)``                  | Location of H2O logs on the driver     |
|                                                    |                |                                                 | machine.                               |
+----------------------------------------------------+----------------+-------------------------------------------------+----------------------------------------+
| ``spark.ext.h2o.client.port.base``                 | ``54321``      | ``setClientBasePort(Integer)``                  | Port on which H2O client publishes     |
|                                                    |                |                                                 | its API. If already occupied, the next |
|                                                    |                |                                                 | odd port is tried, and so on.          |
+----------------------------------------------------+----------------+-------------------------------------------------+----------------------------------------+
| ``spark.ext.h2o.client.web.port``                  | ``-1``         | ``setClientWebPort(Integer)``                   | Exact client port to access web UI.    |
|                                                    |                |                                                 | The value ``-1`` means automatic       |
|                                                    |                |                                                 | search for a free port starting at     |
|                                                    |                |                                                 | ``spark.ext.h2o.client.port.base``.    |
+----------------------------------------------------+----------------+-------------------------------------------------+----------------------------------------+
| ``spark.ext.h2o.client.verbose``                   | ``false``      | ``setClientVerboseEnabled()``                   | The client outputs verbose log output  |
|                                                    |                |                                                 | directly into the console. Enabling    |
|                                                    |                | ``setClientVerboseDisabled()``                  | the flag increases the client log      |
|                                                    |                |                                                 | level to ``INFO``.                     |
+----------------------------------------------------+----------------+-------------------------------------------------+----------------------------------------+
| ``spark.ext.h2o.client.network.mask``              | ``None``       | ``setClientNetworkMask(String)``                | Subnet selector for H2O client; this   |
|                                                    |                |                                                 | disables using IP reported by Spark    |
|                                                    |                |                                                 | but tries to find IP based on the      |
|                                                    |                |                                                 | specified mask.                        |
+----------------------------------------------------+----------------+-------------------------------------------------+----------------------------------------+
| ``spark.ext.h2o.client.flow.baseurl.override``     | ``None``       | ``setClientFlowBaseurlOverride(String)``        | Allows overriding the base URL         |
|                                                    |                |                                                 | address of Flow UI, including the      |
|                                                    |                |                                                 | scheme, which is shown to the user.    |
+----------------------------------------------------+----------------+-------------------------------------------------+----------------------------------------+
| ``spark.ext.h2o.cluster.client.retry.timeout``     | ``60000``      | ``setClientCheckRetryTimeout(Integer)``         | Timeout in milliseconds specifying     |
|                                                    |                |                                                 | how often we check whether the client  |
|                                                    |                |                                                 | is still connected.                    |
+----------------------------------------------------+----------------+-------------------------------------------------+----------------------------------------+
| ``spark.ext.h2o.client.extra``                     | ``None``       | ``setClientExtraProperties(String)``            | A string containing extra parameters   |
|                                                    |                |                                                 | passed to H2O client during startup.   |
|                                                    |                |                                                 | This parameter should be configured    |
|                                                    |                |                                                 | only if H2O parameters do not have any |
|                                                    |                |                                                 | corresponding parameters in Sparkling  |
|                                                    |                |                                                 | Water.                                 |
+----------------------------------------------------+----------------+-------------------------------------------------+----------------------------------------+
| ``spark.ext.h2o.verify_ssl_certificates``          | ``true``       | ``setVerifySslCertificates(Boolean)``           | Whether certificates should be         |
|                                                    |                |                                                 | verified before being used in H2O      |
|                                                    |                |                                                 | or not.                                |
+----------------------------------------------------+----------------+-------------------------------------------------+----------------------------------------+
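Several of the security-related properties above are typically used together. A hedged sketch that secures H2O Flow with a keystore and enables hash-based login; the file paths and password below are placeholders, not real values:

.. code:: scala

    import ai.h2o.sparkling.H2OConf

    // Placeholder paths and password, for illustration only.
    val conf = new H2OConf()
      .setJks("/path/to/keystore.jks")            // spark.ext.h2o.jks
      .setJksPass("keystore-password")            // spark.ext.h2o.jks.pass
      .setLoginConf("/path/to/realm.properties")  // spark.ext.h2o.login.conf
      .setHashLoginEnabled()                      // spark.ext.h2o.hash.login = true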
--------------

Internal backend configuration properties
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

+----------------------------------------------------+----------------+-------------------------------------------------+----------------------------------------+
| Property name                                      | Default value  | H2OConf setter (* getter_)                      | Description                            |
+====================================================+================+=================================================+========================================+
| **Generic parameters**                             |                |                                                 |                                        |
+----------------------------------------------------+----------------+-------------------------------------------------+----------------------------------------+
| ``spark.ext.h2o.cluster.size``                     | ``None``       | ``setNumH2OWorkers(Integer)``                   | Expected number of workers in the H2O  |
|                                                    |                |                                                 | cluster. The value ``None`` means      |
|                                                    |                |                                                 | automatic detection of the cluster     |
|                                                    |                |                                                 | size. This number must be equal to     |
|                                                    |                |                                                 | the number of Spark executors.         |
+----------------------------------------------------+----------------+-------------------------------------------------+----------------------------------------+
| ``spark.ext.h2o.dummy.rdd.mul.factor``             | ``10``         | ``setDrddMulFactor(Integer)``                   | Multiplication factor for dummy RDD    |
|                                                    |                |                                                 | generation. The size of the dummy RDD  |
|                                                    |                |                                                 | is ``spark.ext.h2o.cluster.size``      |
|                                                    |                |                                                 | multiplied by this factor.             |
+----------------------------------------------------+----------------+-------------------------------------------------+----------------------------------------+
| ``spark.ext.h2o.spreadrdd.retries``                | ``10``         | ``setNumRddRetries(Integer)``                   | Number of retries for creation of an   |
|                                                    |                |                                                 | RDD spread across all existing Spark   |
|                                                    |                |                                                 | executors.                             |
+----------------------------------------------------+----------------+-------------------------------------------------+----------------------------------------+
| ``spark.ext.h2o.default.cluster.size``             | ``20``         | ``setDefaultCloudSize(Integer)``                | Starting size of cluster in case that  |
|                                                    |                |                                                 | size is not explicitly configured.     |
+----------------------------------------------------+----------------+-------------------------------------------------+----------------------------------------+
| ``spark.ext.h2o.subseq.tries``                     | ``5``          | ``setSubseqTries(Integer)``                     | Number of subsequent successful tries  |
|                                                    |                |                                                 | to figure out the size of the Spark    |
|                                                    |                |                                                 | cluster that must produce the same     |
|                                                    |                |                                                 | number of nodes.                       |
+----------------------------------------------------+----------------+-------------------------------------------------+----------------------------------------+
| ``spark.ext.h2o.hdfs_conf``                        | |hadoopConfig| | ``setHdfsConf(String)``                         | Either a string with the path to a     |
|                                                    |                |                                                 | file with the Hadoop HDFS              |
|                                                    |                |                                                 | configuration or the                   |
|                                                    |                |                                                 | org.apache.hadoop.conf.Configuration   |
|                                                    |                |                                                 | object. Useful for HDFS credential     |
|                                                    |                |                                                 | settings and other HDFS-related        |
|                                                    |                |                                                 | configurations.                        |
+----------------------------------------------------+----------------+-------------------------------------------------+----------------------------------------+
| **H2O nodes parameters**                           |                |                                                 |                                        |
+----------------------------------------------------+----------------+-------------------------------------------------+----------------------------------------+
| ``spark.ext.h2o.node.iced.dir``                    | ``None``       | ``setNodeIcedDir(String)``                      | Location of iced directory for H2O     |
|                                                    |                |                                                 | nodes on the Spark executors.          |
+----------------------------------------------------+----------------+-------------------------------------------------+----------------------------------------+
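As a brief illustration, the internal backend can be pinned down explicitly instead of relying on automatic cluster size detection. The values below are illustrative, and the worker count must equal the number of Spark executors:

.. code:: scala

    import ai.h2o.sparkling.H2OConf

    val conf = new H2OConf()
      .setInternalClusterMode()        // spark.ext.h2o.backend.cluster.mode = internal
      .setNumH2OWorkers(3)             // spark.ext.h2o.cluster.size; must match executor count
      .setNodeIcedDir("/tmp/h2o-iced") // spark.ext.h2o.node.iced.dir (placeholder path)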
--------------

External backend configuration properties
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

+-------------------------------------------------------+----------------+-------------------------------------------------+-------------------------------------+
| Property name                                         | Default value  | H2OConf setter (* getter_)                      | Description                         |
+=======================================================+================+=================================================+=====================================+
| ``spark.ext.h2o.cloud.representative``                | ``None``       | ``setH2OCluster(String)``                       | ``ip:port`` of an arbitrary H2O     |
|                                                       |                |                                                 | node used to identify the external  |
|                                                       |                |                                                 | H2O cluster.                        |
+-------------------------------------------------------+----------------+-------------------------------------------------+-------------------------------------+
| ``spark.ext.h2o.external.cluster.size``               | ``None``       | ``setClusterSize(Integer)``                     | Number of H2O nodes to start when   |
|                                                       |                |                                                 | ``auto`` mode of the external       |
|                                                       |                |                                                 | backend is set.                     |
+-------------------------------------------------------+----------------+-------------------------------------------------+-------------------------------------+
| ``spark.ext.h2o.cluster.start.timeout``               | ``120s``       | ``setClusterStartTimeout(Integer)``             | Timeout in seconds for starting     |
|                                                       |                |                                                 | the external H2O cluster.           |
+-------------------------------------------------------+----------------+-------------------------------------------------+-------------------------------------+
| ``spark.ext.h2o.cluster.info.name``                   | ``None``       | ``setClusterInfoFile(String)``                  | Full path to a file which is used   |
|                                                       |                |                                                 | as the notification file for the    |
|                                                       |                |                                                 | startup of the external H2O         |
|                                                       |                |                                                 | cluster.                            |
+-------------------------------------------------------+----------------+-------------------------------------------------+-------------------------------------+
| ``spark.ext.h2o.hadoop.memory``                       | ``6G``         | ``setMapperXmx(String)``                        | Amount of memory assigned to each   |
|                                                       |                |                                                 | H2O node on YARN/Hadoop.            |
+-------------------------------------------------------+----------------+-------------------------------------------------+-------------------------------------+
| ``spark.ext.h2o.external.hdfs.dir``                   | ``None``       | ``setHDFSOutputDir(String)``                    | Path to the directory on HDFS used  |
|                                                       |                |                                                 | for storing temporary files.        |
+-------------------------------------------------------+----------------+-------------------------------------------------+-------------------------------------+
| ``spark.ext.h2o.external.start.mode``                 | ``manual``     | ``useAutoClusterStart()``                       | If this option is set to ``auto``,  |
|                                                       |                |                                                 | the external H2O cluster is         |
|                                                       |                | ``useManualClusterStart()``                     | automatically started using the     |
|                                                       |                |                                                 | provided H2O driver JAR on YARN;    |
|                                                       |                |                                                 | otherwise it is expected that the   |
|                                                       |                |                                                 | cluster is started by the user      |
|                                                       |                |                                                 | manually.                           |
+-------------------------------------------------------+----------------+-------------------------------------------------+-------------------------------------+
| ``spark.ext.h2o.external.h2o.driver``                 | ``None``       | ``setH2ODriverPath(String)``                    | Path to H2O driver used during      |
|                                                       |                |                                                 | ``auto`` start mode.                |
+-------------------------------------------------------+----------------+-------------------------------------------------+-------------------------------------+
| ``spark.ext.h2o.external.yarn.queue``                 | ``None``       | ``setYARNQueue(String)``                        | YARN queue on which the external    |
|                                                       |                |                                                 | H2O cluster is started.             |
+-------------------------------------------------------+----------------+-------------------------------------------------+-------------------------------------+
| ``spark.ext.h2o.external.kill.on.unhealthy``          | ``true``       | ``setKillOnUnhealthyClusterEnabled()``          | If ``true``, the client will try to|
|                                                       |                |                                                 | kill the cluster and then itself in |
|                                                       |                | ``setKillOnUnhealthyClusterDisabled()``         | case some nodes in the cluster      |
|                                                       |                |                                                 | report unhealthy status.            |
+-------------------------------------------------------+----------------+-------------------------------------------------+-------------------------------------+
| ``spark.ext.h2o.external.kerberos.principal``         | ``None``       | ``setKerberosPrincipal(String)``                | Kerberos principal.                 |
+-------------------------------------------------------+----------------+-------------------------------------------------+-------------------------------------+
| ``spark.ext.h2o.external.kerberos.keytab``            | ``None``       | ``setKerberosKeytab(String)``                   | Kerberos keytab.                    |
+-------------------------------------------------------+----------------+-------------------------------------------------+-------------------------------------+
| ``spark.ext.h2o.external.run.as.user``                | ``None``       | ``setRunAsUser(String)``                        | Impersonated Hadoop user.           |
+-------------------------------------------------------+----------------+-------------------------------------------------+-------------------------------------+
| ``spark.ext.h2o.external.driver.if``                  | ``None``       | ``setExternalH2ODriverIf(String)``              | IP address or network of the        |
|                                                       |                |                                                 | mapper->driver callback interface.  |
|                                                       |                |                                                 | The default value means automatic   |
|                                                       |                |                                                 | detection.                          |
+-------------------------------------------------------+----------------+-------------------------------------------------+-------------------------------------+
| ``spark.ext.h2o.external.driver.port``                | ``None``       | ``setExternalH2ODriverPort(Integer)``           | Port of the mapper->driver callback |
|                                                       |                |                                                 | interface. The default value means  |
|                                                       |                |                                                 | automatic detection.                |
+-------------------------------------------------------+----------------+-------------------------------------------------+-------------------------------------+
| ``spark.ext.h2o.external.driver.port.range``          | ``None``       | ``setExternalH2ODriverPortRange(String)``       | Range portX-portY of the            |
|                                                       |                |                                                 | mapper->driver callback interface,  |
|                                                       |                |                                                 | e.g. ``50000-55000``.               |
+-------------------------------------------------------+----------------+-------------------------------------------------+-------------------------------------+
| ``spark.ext.h2o.external.extra.memory.percent``       | ``10``         | ``setExternalExtraMemoryPercent(Integer)``      | This option is a percentage of      |
|                                                       |                |                                                 | ``spark.ext.h2o.hadoop.memory`` and |
|                                                       |                |                                                 | specifies memory for internal JVM   |
|                                                       |                |                                                 | use outside of the Java heap.       |
+-------------------------------------------------------+----------------+-------------------------------------------------+-------------------------------------+
| ``spark.ext.h2o.external.backend.stop.timeout``       | ``10000ms``    | ``setExternalBackendStopTimeout(Integer)``      | Timeout for confirmation from       |
|                                                       |                |                                                 | worker nodes when stopping the      |
|                                                       |                |                                                 | external backend. It is also        |
|                                                       |                |                                                 | possible to pass ``-1`` for an      |
|                                                       |                |                                                 | indefinite timeout. The unit is     |
|                                                       |                |                                                 | milliseconds.                       |
+-------------------------------------------------------+----------------+-------------------------------------------------+-------------------------------------+
| ``spark.ext.h2o.external.hadoop.executable``          | ``hadoop``     | ``setExternalHadoopExecutable(String)``         | Name of or path to the ``hadoop``   |
|                                                       |                |                                                 | executable binary, which is used    |
|                                                       |                |                                                 | to start the external H2O backend   |
|                                                       |                |                                                 | on YARN.                            |
+-------------------------------------------------------+----------------+-------------------------------------------------+-------------------------------------+
| ``spark.ext.h2o.external.extra.jars``                 | ``None``       | ``setExternalExtraJars(String)``                | Comma-separated paths to jars that  |
|                                                       |                |                                                 | will be placed onto the classpath   |
|                                                       |                | ``setExternalExtraJars(String[])``              | of each H2O node.                   |
+-------------------------------------------------------+----------------+-------------------------------------------------+-------------------------------------+
| ``spark.ext.h2o.external.communication.compression`` | ``SNAPPY``     | ``setExternalCommunicationCompression(String)`` | The type of compression used for    |
|                                                       |                |                                                 | data transfer between Spark and H2O |
|                                                       |                |                                                 | nodes. Possible values are          |
|                                                       |                |                                                 | ``NONE``, ``DEFLATE``, ``GZIP``,    |
|                                                       |                |                                                 | ``SNAPPY``.                         |
+-------------------------------------------------------+----------------+-------------------------------------------------+-------------------------------------+
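To tie several of these options together, here is a hedged sketch of starting the external backend in ``auto`` mode. The driver JAR path, cluster size, and memory are placeholders that depend on your YARN cluster:

.. code:: scala

    import ai.h2o.sparkling.{H2OConf, H2OContext}

    val conf = new H2OConf()
      .setExternalClusterMode()               // spark.ext.h2o.backend.cluster.mode = external
      .useAutoClusterStart()                  // spark.ext.h2o.external.start.mode = auto
      .setH2ODriverPath("/opt/h2odriver.jar") // spark.ext.h2o.external.h2o.driver (placeholder)
      .setClusterSize(3)                      // spark.ext.h2o.external.cluster.size
      .setMapperXmx("6G")                     // spark.ext.h2o.hadoop.memory

    val hc = H2OContext.getOrCreate(conf)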
.. _getter:

An H2OConf getter can be derived from the corresponding setter. All getters are parameter-less. If the type of the property is Boolean, the getter is prefixed with ``is`` (e.g. ``setReplEnabled()`` -> ``isReplEnabled()``). Property getters of other types do not have any prefix and start with a lowercase letter (e.g. ``setUserName(String)`` -> ``userName`` for Scala, ``userName()`` for Python).

.. |H2ONThreadsDefault| replace:: ``Runtime.getRuntime().availableProcessors()``

.. |hadoopConfig| replace:: ``sc.hadoopConfig``

.. |h2oLogDir| replace:: ``{user.dir}/h2ologs/{SparkAppId}``

.. |yarnDir| replace:: YARN container dir

.. |secureConnections| replace:: ``spark.ext.h2o.internal_secure_connections``
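For example, under this convention (a sketch; the exact getter return types may differ between Sparkling Water versions):

.. code:: scala

    import ai.h2o.sparkling.H2OConf

    val conf = new H2OConf()
      .setReplEnabled()
      .setUserName("h2o-user") // placeholder user name

    // Boolean property: the getter gains an ``is`` prefix.
    val replEnabled = conf.isReplEnabled
    // Other types: the getter is the setter name without ``set``, lowercased.
    val userName = conf.userName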