Sparkling Water Configuration Properties

The following configuration properties can be passed to Spark to configure Sparkling Water.
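Each property can either be passed as a regular Spark configuration value (for example via `--conf` on `spark-submit` or on the `SparkSession` builder) or set programmatically through the corresponding `H2OConf` setter. A minimal sketch, assuming the Sparkling Water 3.x Scala API (package `ai.h2o.sparkling`); the application name and cluster name are illustrative:

```scala
import org.apache.spark.sql.SparkSession
import ai.h2o.sparkling.{H2OConf, H2OContext}

val spark = SparkSession.builder()
  .appName("sparkling-water-example")
  // Option 1: pass the property directly as a Spark configuration value.
  .config("spark.ext.h2o.cloud.name", "my-h2o-cluster")
  .getOrCreate()

// Option 2: set the same property through its H2OConf setter.
val conf = new H2OConf().setCloudName("my-h2o-cluster")
val hc = H2OContext.getOrCreate(conf)
```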
Configuration properties independent of selected backend
Property name | Default value | H2OConf setter (* getter) | Description
---|---|---|---|
**Generic parameters** | | |
`spark.ext.h2o.backend.cluster.mode` | `internal` | `setInternalClusterMode()` `setExternalClusterMode()` | This option can be set either to `internal` or `external`.
`spark.ext.h2o.cloud.name` | Generated unique name | `setCloudName(String)` | Name of H2O cluster.
`spark.ext.h2o.nthreads` | `-1` | `setNthreads(Integer)` | Limit for number of threads used by H2O; the default `-1` means unlimited.
`spark.ext.h2o.repl.enabled` | `true` | `setReplEnabled()` `setReplDisabled()` | Decides whether H2O REPL is initiated or not.
`spark.ext.scala.int.default.num` | `1` | `setDefaultNumReplSessions(Integer)` | Number of parallel REPL sessions started at the start of Sparkling Water.
`spark.ext.h2o.topology.change.listener.enabled` | `true` | `setClusterTopologyListenerEnabled()` `setClusterTopologyListenerDisabled()` | Decides whether the listener which kills the H2O cluster on a change of the underlying cluster's topology is enabled or not. This configuration has effect only in non-local mode.
`spark.ext.h2o.spark.version.check.enabled` | `true` | `setSparkVersionCheckEnabled()` `setSparkVersionCheckDisabled()` | Enables check if run-time Spark version matches build-time Spark version.
`spark.ext.h2o.fail.on.unsupported.spark.param` | `true` | `setFailOnUnsupportedSparkParamEnabled()` `setFailOnUnsupportedSparkParamDisabled()` | If an unsupported Spark parameter is detected, the application is forced to shut down.
`spark.ext.h2o.jks` | `None` | `setJks(String)` | Path to Java KeyStore file.
`spark.ext.h2o.jks.pass` | `None` | `setJksPass(String)` | Password for Java KeyStore file.
`spark.ext.h2o.jks.alias` | `None` | `setJksAlias(String)` | Alias to certificate in keystore to secure H2O Flow.
`spark.ext.h2o.hash.login` | `false` | `setHashLoginEnabled()` `setHashLoginDisabled()` | Enable hash login.
`spark.ext.h2o.ldap.login` | `false` | `setLdapLoginEnabled()` `setLdapLoginDisabled()` | Enable LDAP login.
`spark.ext.h2o.kerberos.login` | `false` | `setKerberosLoginEnabled()` `setKerberosLoginDisabled()` | Enable Kerberos login.
`spark.ext.h2o.login.conf` | `None` | `setLoginConf(String)` | Login configuration file.
`spark.ext.h2o.user.name` | `None` | `setUserName(String)` | Username used for the backend H2O cluster and to authenticate the client against the backend.
`spark.ext.h2o.password` | `None` | `setPassword(String)` | Password used to authenticate the client against the backend.
`spark.ext.h2o.internal_security_conf` | `None` | `setSslConf(String)` | Path to a file containing H2O or Sparkling Water internal security configuration.
`spark.ext.h2o.auto.flow.ssl` | `false` | `setAutoFlowSslEnabled()` `setAutoFlowSslDisabled()` | Automatically generate the required key store and password to secure H2O Flow by SSL.
`spark.ext.h2o.node.log.level` | `INFO` | `setH2ONodeLogLevel(String)` | H2O internal log level used for H2O nodes except the client.
`spark.ext.h2o.node.log.dir` | `{user.dir}/h2ologs/{SparkAppId}` or YARN container dir | `setH2ONodeLogDir(String)` | Location of H2O logs on H2O nodes except on the client.
`spark.ext.h2o.backend.heartbeat.interval` | `10000` | `setBackendHeartbeatInterval(Integer)` | Interval for getting heartbeat from the H2O backend.
`spark.ext.h2o.cloud.timeout` | `60*1000` | `setCloudTimeout(Integer)` | Timeout (in msec) for cluster formation.
`spark.ext.h2o.node.network.mask` | `None` | `setNodeNetworkMask(String)` | Subnet selector for H2O running inside Spark executors. This disables using the IP reported by Spark and instead tries to find an IP based on the specified mask.
`spark.ext.h2o.stacktrace.collector.interval` | `-1` | `setStacktraceCollectorInterval(Integer)` | Interval specifying how often stack traces are taken on each H2O node. `-1` means that no stack traces will be taken.
`spark.ext.h2o.context.path` | `None` | `setContextPath(String)` | Context path to expose H2O web server.
`spark.ext.h2o.flow.scala.cell.async` | `false` | `setFlowScalaCellAsyncEnabled()` `setFlowScalaCellAsyncDisabled()` | Decides whether the Scala cells in H2O Flow run synchronously or asynchronously. Default is synchronously.
`spark.ext.h2o.flow.scala.cell.max.parallel` | `-1` | `setMaxParallelScalaCellJobs(Integer)` | Number of max parallel Scala cell jobs. The value `-1` means not limited.
`spark.ext.h2o.internal.port.offset` | `1` | `setInternalPortOffset(Integer)` | Offset between the API (=web) port and the internal communication port on the client node; `api_port + port_offset = h2o_port`.
`spark.ext.h2o.node.port.base` | `54321` | `setNodeBasePort(Integer)` | Base port used for individual H2O nodes.
`spark.ext.h2o.mojo.destroy.timeout` | `600000` | `setMojoDestroyTimeout(Integer)` | If a scoring MOJO instance is not used within a Spark executor JVM for a given timeout in milliseconds, it is evicted from the executor's cache. Default timeout value is 10 minutes.
`spark.ext.h2o.node.extra.properties` | `None` | `setNodeExtraProperties(String)` | A string containing extra parameters passed to H2O nodes during startup. This parameter should be configured only if H2O parameters do not have any corresponding parameters in Sparkling Water.
`spark.ext.h2o.flow.extra.http.headers` | `None` | `setFlowExtraHttpHeaders(String)` | Extra HTTP headers that will be used in communication between the front-end and back-end part of Flow UI. The headers should be delimited by a new line. Don't forget to escape special characters when passing the parameter from a command line.
`spark.ext.h2o.internal.secure.connections` | `false` | `setInternalSecureConnectionsEnabled()` `setInternalSecureConnectionsDisabled()` | Enables secure communications among H2O nodes. The security is based on an automatically generated keystore and truststore. This is equivalent to the `-internal_secure_connections` option in H2O Hadoop deployments.
`spark.ext.h2o.hive.enabled` | `false` | `setHiveSupportEnabled()` `setHiveSupportDisabled()` | If enabled, H2O instances will create JDBC connections to Hive so that the H2O Python & R API will be able to read data from HiveServer2. Don't forget to put a jar with the Hive driver on the Spark classpath if the internal backend is used.
`spark.ext.h2o.hive.host` | `None` | `setHiveHost(String)` | The full address of HiveServer2, for example `hostname:10000`.
`spark.ext.h2o.hive.principal` | `None` | `setHivePrincipal(String)` | HiveServer2 Kerberos principal, for example `hive/hostname@DOMAIN.COM`.
`spark.ext.h2o.hive.jdbc_url_pattern` | `None` | `setHiveJdbcUrlPattern(String)` | A pattern of the JDBC URL used for connecting to HiveServer2, for example `jdbc:hive2://{{host}}/;{{auth}}`.
`spark.ext.h2o.hive.token` | `None` | `setHiveToken(String)` | An authorization token to Hive.
**H2O client parameters** | | |
`spark.ext.h2o.client.flow.dir` | `None` | `setFlowDir(String)` | Directory where flows from H2O Flow are saved.
`spark.ext.h2o.client.ip` | `None` | `setClientIp(String)` | IP of H2O client node.
`spark.ext.h2o.client.iced.dir` | `None` | `setClientIcedDir(String)` | Location of iced directory for the driver instance.
`spark.ext.h2o.client.log.level` | `INFO` | `setH2OClientLogLevel(String)` | H2O internal log level used for the H2O client running inside the Spark driver.
`spark.ext.h2o.client.log.dir` | `{user.dir}/h2ologs/{SparkAppId}` | `setH2OClientLogDir(String)` | Location of H2O logs on the driver machine.
`spark.ext.h2o.client.port.base` | `54321` | `setClientBasePort(Integer)` | Port on which the H2O client publishes its API. If already occupied, the next odd port is tried, and so on.
`spark.ext.h2o.client.web.port` | `-1` | `setClientWebPort(Integer)` | Exact client port to access web UI. The value `-1` means automatic search for a free port starting at `spark.ext.h2o.client.port.base`.
`spark.ext.h2o.client.verbose` | `false` | `setClientVerboseEnabled()` `setClientVerboseDisabled()` | The client outputs verbose log output directly into the console. Enabling the flag increases the client log level to `INFO`.
`spark.ext.h2o.client.network.mask` | `None` | `setClientNetworkMask(String)` | Subnet selector for the H2O client. This disables using the IP reported by Spark and instead tries to find an IP based on the specified mask.
`spark.ext.h2o.client.flow.baseurl.override` | `None` | `setClientFlowBaseurlOverride(String)` | Allows overriding the base URL address of Flow UI, including the scheme, which is shown to the user.
`spark.ext.h2o.cluster.client.retry.timeout` | `60000` | `setClientCheckRetryTimeout(Integer)` | Timeout in milliseconds specifying how often we check whether the client is still connected.
`spark.ext.h2o.client.extra.properties` | `None` | `setClientExtraProperties(String)` | A string containing extra parameters passed to the H2O client during startup. This parameter should be configured only if H2O parameters do not have any corresponding parameters in Sparkling Water.
`spark.ext.h2o.verify_ssl_certificates` | `true` | `setVerifySslCertificates(Boolean)` | Whether certificates should be verified before use in H2O or not.
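Several of the security properties above are typically combined. A sketch of securing H2O Flow with an existing keystore and enabling LDAP login, assuming the setters listed above; all paths and credentials are placeholders:

```scala
import ai.h2o.sparkling.H2OConf

// Placeholder paths and credentials, for illustration only.
val conf = new H2OConf()
  .setJks("/path/to/keystore.jks")          // Java KeyStore securing H2O Flow
  .setJksPass("keystore-password")
  .setLdapLoginEnabled()                    // authenticate users against LDAP
  .setLoginConf("/path/to/ldap-login.conf") // login configuration file
  .setUserName("h2o-user")                  // client credentials for the backend
  .setPassword("h2o-password")
```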
Internal backend configuration properties
Property name | Default value | H2OConf setter (* getter) | Description
---|---|---|---|
**Generic parameters** | | |
`spark.ext.h2o.cluster.size` | `None` | `setNumH2OWorkers(Integer)` | Expected number of workers of H2O cluster. Value `None` means automatic detection of cluster size. This number must be equal to the number of Spark executors.
`spark.ext.h2o.dummy.rdd.mul.factor` | `10` | `setDrddMulFactor(Integer)` | Multiplication factor for dummy RDD generation. Size of dummy RDD is `spark.ext.h2o.cluster.size` multiplied by this factor.
`spark.ext.h2o.spreadrdd.retries` | `10` | `setNumRddRetries(Integer)` | Number of retries for creation of an RDD spread across all existing Spark executors.
`spark.ext.h2o.default.cluster.size` | `20` | `setDefaultCloudSize(Integer)` | Starting size of cluster in case the size is not explicitly configured.
`spark.ext.h2o.subseq.tries` | `5` | `setSubseqTries(Integer)` | Subsequent successful tries to figure out the size of the Spark cluster which are producing the same number of nodes.
`spark.ext.h2o.hdfs_conf` | `None` | `setHdfsConf(String)` `setHdfsConf(Configuration)` | Either a string with the path to a file with Hadoop HDFS configuration or an `org.apache.hadoop.conf.Configuration` object. Useful for HDFS credentials settings and other HDFS-related configurations.
**H2O nodes parameters** | | |
`spark.ext.h2o.node.iced.dir` | `None` | `setNodeIcedDir(String)` | Location of iced directory for H2O nodes on the Spark executors.
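As an illustration of the sizing properties above, a sketch of pinning the internal cluster size to the number of Spark executors (three here, purely as an example):

```scala
import org.apache.spark.sql.SparkSession
import ai.h2o.sparkling.{H2OConf, H2OContext}

val spark = SparkSession.builder()
  .config("spark.executor.instances", "3")
  // Must be equal to the number of Spark executors.
  .config("spark.ext.h2o.cluster.size", "3")
  .getOrCreate()

// Start H2O inside the Spark executors (the internal backend).
val hc = H2OContext.getOrCreate(new H2OConf().setInternalClusterMode())
```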
External backend configuration properties
Property name | Default value | H2OConf setter (* getter) | Description
---|---|---|---|
`spark.ext.h2o.cloud.representative` | `None` | `setH2OCluster(String)` | `ip:port` of an arbitrary H2O node used to identify the external H2O cluster.
`spark.ext.h2o.external.cluster.size` | `None` | `setClusterSize(Integer)` | Number of H2O nodes to start in `auto` mode and wait for in `manual` mode when starting Sparkling Water in external H2O cluster mode.
`spark.ext.h2o.cluster.start.timeout` | `120s` | `setClusterStartTimeout(Integer)` | Timeout in seconds for starting H2O external cluster.
`spark.ext.h2o.cluster.info.name` | `None` | `setClusterInfoFile(String)` | Full path to a file which is used as the notification file for the startup of the external H2O cluster.
`spark.ext.h2o.hadoop.memory` | `6g` | `setMapperXmx(String)` | Amount of memory assigned to each H2O node on YARN/Hadoop.
`spark.ext.h2o.external.hdfs.dir` | `None` | `setHDFSOutputDir(String)` | Path to the directory on HDFS used for storing temporary files.
`spark.ext.h2o.external.start.mode` | `manual` | `useAutoClusterStart()` `useManualClusterStart()` | If this option is set to `auto`, the external H2O cluster is started automatically on YARN using the provided H2O driver JAR; otherwise the cluster is expected to be started by the user manually.
`spark.ext.h2o.external.h2o.driver` | `None` | `setH2ODriverPath(String)` | Path to the H2O driver used during `auto` start mode.
`spark.ext.h2o.external.yarn.queue` | `None` | `setYARNQueue(String)` | YARN queue on which the external H2O cluster is started.
`spark.ext.h2o.external.kill.on.unhealthy` | `true` | `setKillOnUnhealthyClusterEnabled()` `setKillOnUnhealthyClusterDisabled()` | If true, the client will try to kill the cluster and then itself in case some nodes in the cluster report unhealthy status.
`spark.ext.h2o.external.kerberos.principal` | `None` | `setKerberosPrincipal(String)` | Kerberos principal.
`spark.ext.h2o.external.kerberos.keytab` | `None` | `setKerberosKeytab(String)` | Kerberos keytab.
`spark.ext.h2o.external.run.as.user` | `None` | `setRunAsUser(String)` | Impersonated Hadoop user.
`spark.ext.h2o.external.driver.if` | `None` | `setExternalH2ODriverIf(String)` | IP address or network of the mapper->driver callback interface. The default value means automatic detection.
`spark.ext.h2o.external.driver.port` | `None` | `setExternalH2ODriverPort(Integer)` | Port of the mapper->driver callback interface. The default value means automatic detection.
`spark.ext.h2o.external.driver.port.range` | `None` | `setExternalH2ODriverPortRange(String)` | Range `portX-portY` of the mapper->driver callback interface; e.g. `50000-55000`.
`spark.ext.h2o.external.extra.memory.percent` | `10` | `setExternalExtraMemoryPercent(Integer)` | This option is a percentage of `spark.ext.h2o.hadoop.memory` and specifies memory for internal JVM use outside of the Java heap.
`spark.ext.h2o.external.backend.stop.timeout` | `10000` | `setExternalBackendStopTimeout(Integer)` | Timeout for confirmation from worker nodes when stopping the external backend. It is also possible to pass `-1` to ensure an indefinite timeout.
`spark.ext.h2o.external.hadoop.executable` | `hadoop` | `setExternalHadoopExecutable(String)` | Name of or path to a hadoop executable binary which is used to start the external H2O backend on YARN.
`spark.ext.h2o.external.extra.jars` | `None` | `setExternalExtraJars(String)` | Comma-separated paths to jars that will be placed onto the classpath of each H2O node.
`spark.ext.h2o.external.communication.compression` | `SNAPPY` | `setExternalCommunicationCompression(String)` | The type of compression used for data transfer between Spark and H2O nodes. Possible values are `NONE`, `DEFLATE`, `GZIP`, `SNAPPY`.
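Putting a few of these together, a sketch of starting an external H2O cluster on YARN in `auto` mode, assuming the setters listed above; the driver path, cluster size, memory, and queue name are placeholders:

```scala
import ai.h2o.sparkling.{H2OConf, H2OContext}

val conf = new H2OConf()
  .setExternalClusterMode()
  .useAutoClusterStart()                      // start the cluster on YARN automatically
  .setH2ODriverPath("/path/to/h2odriver.jar") // H2O driver JAR used in auto mode
  .setClusterSize(3)                          // number of H2O nodes to start
  .setMapperXmx("6g")                         // memory per H2O node
  .setYARNQueue("h2oQueue")

val hc = H2OContext.getOrCreate(conf)
```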
The H2OConf getter can be derived from the corresponding setter. All getters are parameter-less. If the type of the property is Boolean, the getter is prefixed with `is` (e.g. `setReplEnabled()` -> `isReplEnabled()`). Property getters of other types do not have any prefix and start with lowercase (e.g. `setUserName(String)` -> `userName` for Scala, `userName()` for Python).
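A short Scala sketch of this convention (the returned values are indicative only):

```scala
import ai.h2o.sparkling.H2OConf

val conf = new H2OConf()
  .setReplEnabled()    // Boolean property: the getter gets the "is" prefix
  .setUserName("bob")  // other types: the getter has no prefix

conf.isReplEnabled()   // true
conf.userName          // the configured user name; userName() in Python
```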