.. _sw_config_properties:

Sparkling Water Configuration Properties
----------------------------------------

The following configuration properties can be passed to Spark to configure Sparkling Water.

Configuration properties independent of selected backend
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

+---------------------------------------------------------+---------------+----------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------+ | Property name | Default value | H2OConf setter (* getter_) | Description | +=========================================================+===============+======================================================================+====================================================================================================================================+ | ``spark.ext.h2o.backend.cluster.mode`` | internal | ``setInternalClusterMode()`` | This option can be set either to ``internal`` or ``external``. When set to ``external``, ``H2O Context`` is | | | | | created by connecting to an existing H2O cluster; otherwise, an H2O cluster located inside Spark is created. That | | | | ``setExternalClusterMode()`` | means that each Spark executor will have one H2O instance running in it. The ``internal`` mode is not | | | | | recommended for big clusters and clusters where Spark executors are not stable. | +---------------------------------------------------------+---------------+----------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------+ | ``spark.ext.h2o.cloud.name`` | None | ``setCloudName(String)`` | Name of H2O cluster.
If this option is not set, the name is automatically generated | +---------------------------------------------------------+---------------+----------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------+ | ``spark.ext.h2o.nthreads`` | -1 | ``setNthreads(Integer)`` | Limit for number of threads used by H2O, default ``-1`` means: Use value of ``spark.executor.cores`` in | | | | | case this property is set. Otherwise use H2O's default | | | | | value Runtime.getRuntime() | | | | | .availableProcessors() | +---------------------------------------------------------+---------------+----------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------+ | ``spark.ext.h2o.repl.enabled`` | true | ``setReplEnabled()`` | Decides whether H2O REPL is initiated or not. | | | | | | | | | ``setReplDisabled()`` | | +---------------------------------------------------------+---------------+----------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------+ | ``spark.ext.scala.int.default.num`` | 1 | ``setDefaultNumReplSessions(Integer)`` | Number of parallel REPL sessions started at the start of Sparkling Water. 
| +---------------------------------------------------------+---------------+----------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------+ | ``spark.ext.h2o.topology.change.listener.enabled`` | true | ``setClusterTopologyListenerEnabled()`` | Decides whether listener which kills H2O cluster on the change of the underlying cluster's topology is | | | | | enabled or not. This configuration has effect only in non-local mode. | | | | ``setClusterTopologyListenerDisabled()`` | | +---------------------------------------------------------+---------------+----------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------+ | ``spark.ext.h2o.spark.version.check.enabled`` | true | ``setSparkVersionCheckEnabled()`` | Enables check if run-time Spark version matches build time Spark version. | | | | | | | | | ``setSparkVersionCheckDisabled()`` | | +---------------------------------------------------------+---------------+----------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------+ | ``spark.ext.h2o.fail.on.unsupported.spark.param`` | true | ``setFailOnUnsupportedSparkParamEnabled()`` | If unsupported Spark parameter is detected, then application is forced to shutdown. 
| | | | | | | | ``setFailOnUnsupportedSparkParamDisabled()`` | | +---------------------------------------------------------+---------------+----------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------+ | ``spark.ext.h2o.jks`` | None | ``setJks(String)`` | Path to a Java keystore file with certificates securing H2O Flow UI and internal REST connections between | | | | | Spark instances (driver + executors) and H2O nodes. When configuring this property, you must consider that a Spark executor | | | | | can communicate with any of the H2O nodes and verifies the H2O node according to the hostname specified in the keystore | | | | | certificate. You can consider usage of a wildcard certificate or you can disable the hostname verification | | | | | completely with the ``spark.ext.h2o.verify_ssl_hostnames`` property. | +---------------------------------------------------------+---------------+----------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------+ | ``spark.ext.h2o.jks.pass`` | None | ``setJksPass(String)`` | Password for the Java keystore file. | +---------------------------------------------------------+---------------+----------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------+ | ``spark.ext.h2o.jks.alias`` | None | ``setJksAlias(String)`` | Alias of the certificate in the Java keystore file used to secure H2O Flow UI and internal REST connections | | | | | between Spark instances (driver + executors) and H2O nodes.
| +---------------------------------------------------------+---------------+----------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------+ | ``spark.ext.h2o.ssl.ca.cert`` | None | ``setSslCACert(String)`` | A path to a CA bundle file or a directory with certificates of trusted CAs. This path is used by RSparkling or | | | | | PySparkling for connecting to a Sparkling Water backend. | +---------------------------------------------------------+---------------+----------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------+ | ``spark.ext.h2o.hash.login`` | false | ``setHashLoginEnabled()`` | Enable hash login. | | | | | | | | | ``setHashLoginDisabled()`` | | +---------------------------------------------------------+---------------+----------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------+ | ``spark.ext.h2o.ldap.login`` | false | ``setLdapLoginEnabled()`` | Enable LDAP login. | | | | | | | | | ``setLdapLoginDisabled()`` | | +---------------------------------------------------------+---------------+----------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------+ | ``spark.ext.h2o.kerberos.login`` | false | ``setKerberosLoginEnabled()`` | Enable Kerberos login.
| | | | | | | | | ``setKerberosLoginDisabled()`` | | +---------------------------------------------------------+---------------+----------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------+ | ``spark.ext.h2o.login.conf`` | None | ``setLoginConf(String)`` | Login configuration file. | +---------------------------------------------------------+---------------+----------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------+ | ``spark.ext.h2o.user.name`` | None | ``setUserName(String)`` | Username used for the backend H2O cluster and to authenticate the client against the backend. | +---------------------------------------------------------+---------------+----------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------+ | ``spark.ext.h2o.password`` | None | ``setPassword(String)`` | Password used to authenticate the client against the backend. | +---------------------------------------------------------+---------------+----------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------+ | ``spark.ext.h2o.internal_security_conf`` | None | ``setSslConf(String)`` | Path to a file containing H2O or Sparkling Water internal security configuration. 
| +---------------------------------------------------------+---------------+----------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------+ | ``spark.ext.h2o.auto.flow.ssl`` | false | ``setAutoFlowSslEnabled()`` | Automatically generate the required keystore and password to secure H2O Flow UI and internal REST | | | | | connections between Spark executors and H2O nodes. Hostname verification is disabled when creating SSL | | | | ``setAutoFlowSslDisabled()`` | connections to H2O nodes. | +---------------------------------------------------------+---------------+----------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------+ | ``spark.ext.h2o.log.level`` | INFO | ``setLogLevel(String)`` | H2O log level. | +---------------------------------------------------------+---------------+----------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------+ | ``spark.ext.h2o.log.dir`` | None | ``setLogDir(String)`` | Location of H2O logs. When not specified, it uses {user.dir}/h2ologs/{AppId} or the YARN container dir. | +---------------------------------------------------------+---------------+----------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------+ | ``spark.ext.h2o.backend.heartbeat.interval`` | 10000 | ``setBackendHeartbeatInterval(Integer)`` | Interval (in msec) for getting heartbeat from the H2O backend.
| +---------------------------------------------------------+---------------+----------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------+ | ``spark.ext.h2o.cloud.timeout`` | 60000 | ``setCloudTimeout(Integer)`` | Timeout (in msec) for cluster formation. | +---------------------------------------------------------+---------------+----------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------+ | ``spark.ext.h2o.node.network.mask`` | None | ``setNodeNetworkMask(String)`` | Subnet selector for H2O running inside Spark executors. This disables using the IP reported by Spark and instead tries to | | | | | find the IP based on the specified mask. | +---------------------------------------------------------+---------------+----------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------+ | ``spark.ext.h2o.stacktrace.collector.interval`` | -1 | ``setStacktraceCollectorInterval(Integer)`` | Interval specifying how often stack traces are taken on each H2O node. -1 means | | | | | that no stack traces will be taken. | +---------------------------------------------------------+---------------+----------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------+ | ``spark.ext.h2o.context.path`` | None | ``setContextPath(String)`` | Context path to expose H2O web server.
| +---------------------------------------------------------+---------------+----------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------+ | ``spark.ext.h2o.flow.scala.cell.async`` | false | ``setFlowScalaCellAsyncEnabled()`` | Decide whether the Scala cells in H2O Flow will run synchronously or Asynchronously. Default is synchronously. | | | | | | | | | ``setFlowScalaCellAsyncDisabled()`` | | +---------------------------------------------------------+---------------+----------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------+ | ``spark.ext.h2o.flow.scala.cell.max.parallel`` | -1 | ``setMaxParallelScalaCellJobs(Integer)`` | Number of max parallel Scala cell jobs. The value -1 means not limited. | +---------------------------------------------------------+---------------+----------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------+ | ``spark.ext.h2o.internal.port.offset`` | 1 | ``setInternalPortOffset(Integer)`` | Offset between the API(=web) port and the internal communication port on the client | | | | | node; ``api_port + port_offset = h2o_port`` | +---------------------------------------------------------+---------------+----------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------+ | ``spark.ext.h2o.base.port`` | 54321 | ``setBasePort(Integer)`` | Base port used for individual H2O nodes | 
+---------------------------------------------------------+---------------+----------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------+ | ``spark.ext.h2o.mojo.destroy.timeout`` | 600000 | ``setMojoDestroyTimeout(Integer)`` | If a scoring MOJO instance is not used within a Spark executor JVM for a given timeout in milliseconds, it's | | | | | evicted from executor's cache. Default timeout value is 10 minutes. | +---------------------------------------------------------+---------------+----------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------+ | ``spark.ext.h2o.extra.properties`` | None | ``setExtraProperties(String)`` | A string containing extra parameters passed to H2O nodes during startup. This parameter should be | | | | | configured only if H2O parameters do not have any corresponding parameters in Sparkling Water. | +---------------------------------------------------------+---------------+----------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------+ | ``spark.ext.h2o.flow.dir`` | None | ``setFlowDir(String)`` | Directory where flows from H2O Flow are saved. 
| +---------------------------------------------------------+---------------+----------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------+ | ``spark.ext.h2o.flow.extra.http.headers`` | None | ``setFlowExtraHttpHeaders(Map[String,String])`` | Extra HTTP headers that will be used in communication between the front-end and back-end part of Flow UI. The | | | | | headers should be delimited by a new line. Don't forget to escape special characters when passing | | | | ``setFlowExtraHttpHeaders(String)`` | the parameter from a command line. Example: ``"spark.ext.h2o.flow.extra.http.headers=Strict-Transport-Security:max-age=31536000"`` | +---------------------------------------------------------+---------------+----------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------+ | ``spark.ext.h2o.internal_secure_connections`` | false | ``setInternalSecureConnectionsEnabled()`` | Enables secure communications among H2O nodes. The security is based on | | | | | automatically generated keystore and truststore. This is equivalent to the | | | | ``setInternalSecureConnectionsDisabled()`` | ``-internal_secure_connections`` option in H2O Hadoop. More information | | | | | is available in the H2O documentation.
| +---------------------------------------------------------+---------------+----------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------+ | ``spark.ext.h2o.allow_insecure_xgboost`` | false | ``setInsecureXGBoostAllowed()`` | If the property is set to true, insecure communication among H2O nodes is | | | | | allowed for the XGBoost algorithm even if the other security options are enabled. | | | | ``setInsecureXGBoostDenied()`` | | +---------------------------------------------------------+---------------+----------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------+ | ``spark.ext.h2o.client.ip`` | None | ``setClientIp(String)`` | IP of H2O client node. | +---------------------------------------------------------+---------------+----------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------+ | ``spark.ext.h2o.client.web.port`` | -1 | ``setClientWebPort(Integer)`` | Exact client port to access web UI. The value ``-1`` means automatic | | | | | search for a free port starting at ``spark.ext.h2o.base.port``. | +---------------------------------------------------------+---------------+----------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------+ | ``spark.ext.h2o.client.verbose`` | false | ``setClientVerboseEnabled()`` | The client outputs verbose log output directly into the console. Enabling the | | | | | flag increases the client log level to ``INFO``.
| | | | ``setClientVerboseDisabled()`` | | +---------------------------------------------------------+---------------+----------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------+ | ``spark.ext.h2o.client.network.mask`` | None | ``setClientNetworkMask(String)`` | Subnet selector for H2O client. This disables using the IP reported by Spark | | | | | and instead tries to find the IP based on the specified mask. | +---------------------------------------------------------+---------------+----------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------+ | ``spark.ext.h2o.client.flow.baseurl.override`` | None | ``setClientFlowBaseurlOverride(String)`` | Allows overriding the base URL address of Flow UI, including the | | | | | scheme, which is shown to the user. | +---------------------------------------------------------+---------------+----------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------+ | ``spark.ext.h2o.cluster.client.retry.timeout`` | 60000 | ``setClientCheckRetryTimeout(Integer)`` | Timeout in milliseconds specifying how often we check whether | | | | | the client is still connected.
| +---------------------------------------------------------+---------------+----------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------+ | ``spark.ext.h2o.verify_ssl_certificates`` | true | ``setVerifySslCertificates(Boolean)`` | If the property is enabled, the PySparkling or RSparkling client will verify certificates when connecting to the | | | | | Sparkling Water Flow UI. | +---------------------------------------------------------+---------------+----------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------+ | ``spark.ext.h2o.internal.rest.verify_ssl_certificates`` | true | ``setSslCertificateVerificationInInternalRestConnectionsEnabled()`` | If the property is enabled, Sparkling Water will verify SSL certificates when establishing secured HTTP connections | | | | | to one of the H2O nodes. Such connections are utilized for delegation of Flow UI calls to the H2O leader node or | | | | ``setSslCertificateVerificationInInternalRestConnectionsDisabled()`` | during data exchange between Spark executors and H2O nodes. If the property is disabled, hostname verification is | | | | | disabled as well. | +---------------------------------------------------------+---------------+----------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------+ | ``spark.ext.h2o.internal.rest.verify_ssl_hostnames`` | true | ``setSslHostnameVerificationInInternalRestConnectionsEnabled()`` | If the property is enabled, Sparkling Water will verify the hostname when establishing secured HTTP connections | | | | | to one of the H2O nodes.
Such connections are utilized for delegation of Flow UI calls to H2O leader node or | | | | ``setSslHostnameVerificationInInternalRestConnectionsDisabled()`` | during data exchange between Spark executors and H2O nodes. | +---------------------------------------------------------+---------------+----------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------+ | ``spark.ext.h2o.kerberized.hive.enabled`` | false | ``setKerberizedHiveEnabled()`` | If enabled, H2O instances will create JDBC connections to a Kerberized Hive | | | | | so that all clients can read data from HiveServer2. Don't forget to put | | | | ``setKerberizedHiveDisabled()`` | a jar with Hive driver on Spark classpath if the internal backend is used. | +---------------------------------------------------------+---------------+----------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------+ | ``spark.ext.h2o.hive.host`` | None | ``setHiveHost(String)`` | The full address of HiveServer2, for example hostname:10000. 
| +---------------------------------------------------------+---------------+----------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------+ | ``spark.ext.h2o.hive.principal`` | None | ``setHivePrincipal(String)`` | Hiveserver2 Kerberos principal, for example hive/hostname@DOMAIN.COM | +---------------------------------------------------------+---------------+----------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------+ | ``spark.ext.h2o.hive.jdbc_url_pattern`` | None | ``setHiveJdbcUrlPattern(String)`` | A pattern of JDBC URL used for connecting to Hiveserver2. Example: ``jdbc:hive2://{{host}}/;{{auth}}`` | +---------------------------------------------------------+---------------+----------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------+ | ``spark.ext.h2o.hive.token`` | None | ``setHiveToken(String)`` | An authorization token to Hive. | +---------------------------------------------------------+---------------+----------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------+ | ``spark.ext.h2o.iced.dir`` | None | ``setIcedDir(String)`` | Location of iced directory for H2O nodes. 
| +---------------------------------------------------------+---------------+----------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------+ | ``spark.ext.h2o.rest.api.timeout`` | 300000 | ``setSessionTimeout(Boolean)`` | Timeout in milliseconds for REST API requests. | +---------------------------------------------------------+---------------+----------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------+

--------------

Internal backend configuration properties
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

+---------------------------------------------+---------------+-------------------------------------+------------------------------------------------------------------------------------------------------------------+ | Property name | Default value | H2OConf setter (* getter_) | Description | +=============================================+===============+=====================================+==================================================================================================================+ | ``spark.ext.h2o.cluster.size`` | None | ``setNumH2OWorkers(Integer)`` | Expected number of workers of H2O cluster. Value None means automatic | | | | | detection of cluster size. This number must be equal to the number of Spark executors. If the Spark property | | | | | ``spark.executor.instances`` is specified, this Sparkling Water property is set to its value.
| +---------------------------------------------+---------------+-------------------------------------+------------------------------------------------------------------------------------------------------------------+ | ``spark.ext.h2o.extra.cluster.nodes`` | false | ``setExtraClusterNodesEnabled()`` | If the property is set true and the Sparkling Water internal backend identifies more executors than specified in | | | | | the Spark property ``spark.executor.instances`` or in the Sparkling Water property | | | | ``setExtraClusterNodesDisabled()`` | ``spark.ext.h2o.cluster.size``, Sparkling Water deploys H2O nodes to all discovered Spark executors. Otherwise, | | | | | Sparkling Water deploys just a number of executors specified in ``spark.ext.h2o.cluster.size`` | | | | | (or ``spark.executor.instances``). | +---------------------------------------------+---------------+-------------------------------------+------------------------------------------------------------------------------------------------------------------+ | ``spark.ext.h2o.dummy.rdd.mul.factor`` | 10 | ``setDrddMulFactor(Integer)`` | Multiplication factor for dummy RDD generation. Size of dummy RDD is | | | | | ``spark.ext.h2o.cluster.size`` multiplied by this option. 
| +---------------------------------------------+---------------+-------------------------------------+------------------------------------------------------------------------------------------------------------------+ | ``spark.ext.h2o.spreadrdd.retries`` | 10 | ``setNumRddRetries(Integer)`` | Number of retries for creation of an RDD spread across all existing Spark executors | +---------------------------------------------+---------------+-------------------------------------+------------------------------------------------------------------------------------------------------------------+ | ``spark.ext.h2o.default.cluster.size`` | 20 | ``setDefaultCloudSize(Integer)`` | Starting size of cluster in case that size is not explicitly configured. | +---------------------------------------------+---------------+-------------------------------------+------------------------------------------------------------------------------------------------------------------+ | ``spark.ext.h2o.subseq.tries`` | 5 | ``setSubseqTries(Integer)`` | Subsequent successful tries to figure out size of Spark cluster, which are | | | | | producing the same number of nodes. | +---------------------------------------------+---------------+-------------------------------------+------------------------------------------------------------------------------------------------------------------+ | ``spark.ext.h2o.hdfs_conf`` | None | ``setHdfsConf(String)`` | Either a string with the Path to a file with Hadoop HDFS configuration or the | | | | | hadoop.conf.Configuration object in the org.apache package. Useful for HDFS credentials | | | | | settings and other HDFS-related configurations. Default value None means | | | | | use `sc.hadoopConfig`. 
| +---------------------------------------------+---------------+-------------------------------------+------------------------------------------------------------------------------------------------------------------+ | ``spark.ext.h2o.spreadrdd.retries.timeout`` | 0 | ``setSpreadRddRetriesTimeout(Int)`` | Specifies how long the discovery of Spark executors should last. This | | | | | option has precedence over other options influencing the discovery | | | | | mechanism. That means that as long as the timeout hasn't expired, we keep | | | | | trying to discover new executors. This option might be useful in environments | | | | | where Spark executors might join the cloud with some delay. | +---------------------------------------------+---------------+-------------------------------------+------------------------------------------------------------------------------------------------------------------+

--------------

External backend configuration properties
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

+------------------------------------------------------+-----------------+-------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------+ | Property name | Default value | H2OConf setter (* getter_) | Description | +======================================================+=================+=================================================+======================================================================================================================================+ | ``spark.ext.h2o.external.driver.if`` | None | ``setExternalH2ODriverIf(String)`` | IP address or network of mapper->driver callback interface. Default value means automatic detection.
| +------------------------------------------------------+-----------------+-------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------+ | ``spark.ext.h2o.external.driver.port`` | None | ``setExternalH2ODriverPort(Integer)`` | Port of the mapper->driver callback interface. Default value means automatic detection. | +------------------------------------------------------+-----------------+-------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------+ | ``spark.ext.h2o.external.driver.port.range`` | None | ``setExternalH2ODriverPortRange(String)`` | Port range (portX-portY) of the mapper->driver callback interface, e.g. 50000-55000. | +------------------------------------------------------+-----------------+-------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------+ | ``spark.ext.h2o.external.extra.memory.percent`` | 10 | ``setExternalExtraMemoryPercent(Integer)`` | A percentage of the external memory option that specifies the memory | | | | | reserved for internal JVM use outside of the Java heap. | +------------------------------------------------------+-----------------+-------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------+ | ``spark.ext.h2o.cloud.representative`` | None | ``setH2OCluster(String)`` | ip:port of an H2O cluster leader node, used to identify the external H2O cluster.
| +------------------------------------------------------+-----------------+-------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------+ | ``spark.ext.h2o.external.cluster.size`` | None | ``setClusterSize(Integer)`` | Number of H2O nodes to start when ``auto`` mode of the external backend is set. | +------------------------------------------------------+-----------------+-------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------+ | ``spark.ext.h2o.cluster.start.timeout`` | 120 | ``setClusterStartTimeout(Integer)`` | Timeout in seconds for starting H2O external cluster | +------------------------------------------------------+-----------------+-------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------+ | ``spark.ext.h2o.cluster.info.name`` | None | ``setClusterInfoFile(String)`` | Full path to a file which is used as the notification file for the startup of external H2O cluster.
| +------------------------------------------------------+-----------------+-------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------+ | ``spark.ext.h2o.external.memory`` | 6G | ``setExternalMemory(String)`` | Amount of memory assigned to each external H2O node | +------------------------------------------------------+-----------------+-------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------+ | ``spark.ext.h2o.external.hdfs.dir`` | None | ``setHDFSOutputDir(String)`` | Path to the directory on HDFS used for storing temporary files. | +------------------------------------------------------+-----------------+-------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------+ | ``spark.ext.h2o.external.start.mode`` | manual | ``useAutoClusterStart()`` | If this option is set to ``auto`` then H2O external cluster is automatically started using the | | | | | provided H2O driver JAR on YARN, otherwise it is expected that the cluster is started by the user | | | | ``useManualClusterStart()`` | manually | +------------------------------------------------------+-----------------+-------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------+ | ``spark.ext.h2o.external.h2o.driver`` | None | ``setH2ODriverPath(String)`` | Path to H2O driver used during ``auto`` start mode. 
| +------------------------------------------------------+-----------------+-------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------+ | ``spark.ext.h2o.external.yarn.queue`` | None | ``setYARNQueue(String)`` | Yarn queue on which external H2O cluster is started. | +------------------------------------------------------+-----------------+-------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------+ | ``spark.ext.h2o.external.kill.on.unhealthy`` | true | ``setKillOnUnhealthyClusterEnabled()`` | If true, the client will try to kill the cluster and then itself in | | | | | case some nodes in the cluster report unhealthy status. | | | | ``setKillOnUnhealthyClusterDisabled()`` | | +------------------------------------------------------+-----------------+-------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------+ | ``spark.ext.h2o.external.kerberos.principal`` | None | ``setKerberosPrincipal(String)`` | Kerberos Principal | +------------------------------------------------------+-----------------+-------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------+ | ``spark.ext.h2o.external.kerberos.keytab`` | None | ``setKerberosKeytab(String)`` | Kerberos Keytab | +------------------------------------------------------+-----------------+-------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------+ | ``spark.ext.h2o.external.run.as.user`` | None | 
``setRunAsUser(String)`` | Impersonated Hadoop user | +------------------------------------------------------+-----------------+-------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------+ | ``spark.ext.h2o.external.backend.stop.timeout`` | 10000 | ``setExternalBackendStopTimeout(Integer)`` | Timeout for confirmation from worker nodes when stopping the external backend. It is also | | | | | possible to pass ``-1`` to wait indefinitely. The unit is milliseconds. | +------------------------------------------------------+-----------------+-------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------+ | ``spark.ext.h2o.external.hadoop.executable`` | hadoop | ``setExternalHadoopExecutable(String)`` | Name of or path to a Hadoop executable binary which is used | | | | | to start external H2O backend on YARN. | +------------------------------------------------------+-----------------+-------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------+ | ``spark.ext.h2o.external.extra.jars`` | None | ``setExternalExtraJars(String)`` | Comma-separated paths to jars that will be placed onto classpath of each H2O node.
| | | | | | | | ``setExternalExtraJars(String[])`` | | +------------------------------------------------------+-----------------+-------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------+ | ``spark.ext.h2o.external.communication.compression`` | SNAPPY | ``setExternalCommunicationCompression(String)`` | The type of compression used for data transfer between Spark and H2O nodes. | | | | | Possible values are ``NONE``, ``DEFLATE``, ``GZIP``, ``SNAPPY``. | +------------------------------------------------------+-----------------+-------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------+ | ``spark.ext.h2o.external.auto.start.backend`` | yarn | ``setExternalAutoStartBackend(String)`` | The backend on which the external H2O backend will be started in auto start mode. | | | | | Possible values are ``YARN`` and ``KUBERNETES``. | +------------------------------------------------------+-----------------+-------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------+ | ``spark.ext.h2o.external.k8s.h2o.service.name`` | h2o-service | ``setExternalK8sH2OServiceName(String)`` | Name of H2O service required to start H2O on K8s. | +------------------------------------------------------+-----------------+-------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------+ | ``spark.ext.h2o.external.k8s.h2o.statefulset.name`` | h2o-statefulset | ``setExternalK8sH2OStatefulsetName(String)`` | Name of H2O stateful set required to start H2O on K8s.
| +------------------------------------------------------+-----------------+-------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------+ | ``spark.ext.h2o.external.k8s.h2o.label`` | app=h2o | ``setExternalK8sH2OLabel(String)`` | Label used to select node for H2O cluster formation. | +------------------------------------------------------+-----------------+-------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------+ | ``spark.ext.h2o.external.k8s.h2o.api.port`` | 8081 | ``setExternalK8sH2OApiPort(String)`` | Kubernetes API port. | +------------------------------------------------------+-----------------+-------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------+ | ``spark.ext.h2o.external.k8s.namespace`` | default | ``setExternalK8sNamespace(String)`` | Kubernetes namespace where external H2O is started. | +------------------------------------------------------+-----------------+-------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------+ | ``spark.ext.h2o.external.k8s.docker.image`` | See doc | ``setExternalK8sDockerImage(String)`` | Docker image containing Sparkling Water external H2O backend. 
Default value is h2oai/sparkling-water-external-backend:3.34.0.1-1-2.4 | +------------------------------------------------------+-----------------+-------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------+ | ``spark.ext.h2o.external.k8s.domain`` | cluster.local | ``setExternalK8sDomain(String)`` | Domain of the Kubernetes cluster. | +------------------------------------------------------+-----------------+-------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------+ | ``spark.ext.h2o.external.k8s.svc.timeout`` | 300 | ``setExternalK8sServiceTimeout(Int)`` | Timeout in seconds used as a limit for K8s service creation. | +------------------------------------------------------+-----------------+-------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------+

--------------

.. _getter:

An H2OConf getter can be derived from the corresponding setter. All getters are parameterless. If the type of the property is Boolean, the getter is prefixed with ``is`` (e.g. ``setReplEnabled()`` -> ``isReplEnabled()``). Getters of other types have no prefix and start with a lowercase letter (e.g. ``setUserName(String)`` -> ``userName`` for Scala, ``userName()`` for Python).
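
The getter naming rule can be sketched as a small helper. This is only an illustration of the rule as stated here, not part of Sparkling Water: the function ``getter_name`` and its ``is_boolean`` and ``python`` flags are hypothetical names, and the caller has to say whether the property is Boolean because the setter signature alone does not reveal it.

```python
import re


def getter_name(setter: str, is_boolean: bool = False, python: bool = False) -> str:
    """Derive a getter name from a setter signature such as ``setUserName(String)``.

    Hypothetical helper that only mirrors the naming rule described in the text.
    """
    # Accept signatures of the form set<Name>(<anything>), e.g. setNthreads(Integer).
    match = re.fullmatch(r"set([A-Z]\w*)\(.*\)", setter.strip())
    if match is None:
        raise ValueError(f"not a setter signature: {setter!r}")
    stem = match.group(1)

    if is_boolean:
        # Boolean properties: prefix with "is", e.g. setReplEnabled() -> isReplEnabled().
        return "is" + stem + "()"

    # Other types: no prefix, first letter lowercased.
    name = stem[0].lower() + stem[1:]
    # The text writes the Scala getter without parentheses and the Python one with them.
    return name + "()" if python else name


if __name__ == "__main__":
    print(getter_name("setReplEnabled()", is_boolean=True))
    print(getter_name("setUserName(String)"))
    print(getter_name("setUserName(String)", python=True))
```

Setters that do not start with ``set`` (for example ``useAutoClusterStart()``) fall outside the stated rule, so the sketch rejects them rather than guessing; the same caution applies to paired ``...Enabled()``/``...Disabled()`` setters, for which only the ``Enabled`` form is covered by the example in the text.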