Security¶
H2O contains security features intended for deployment inside a secure data center.
Security Model¶
This section describes the security assumptions and what the H2O software does and does not do.
Terms¶
Term | Definition |
---|---|
H2O Cluster | A collection of H2O nodes that work together. In the H2O Flow Web UI, the cluster status menu item shows the list of nodes in an H2O cluster. |
H2O node | One VM instance running the H2O main class. One H2O node corresponds to one OS-level process. In the YARN case, one H2O node corresponds to one mapper instance and one YARN container. |
H2O embedded web port | Each H2O node contains an embedded web port (by default port 54321). This web port hosts H2O Flow as well as the H2O REST API. The user interacts directly with this web port. |
H2O Internal communication port | Each H2O node also has an internal port (web port+1, so by default port 54322) for internal node-to-node communication. This is a proprietary binary protocol. An attacker using a tool like tcpdump or wireshark may be able to reverse engineer data captured on this communication path. |
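The port relationship in the table above can be sketched in a few lines of Python, for example when preparing firewall rules. The function names are hypothetical and are not part of the H2O API; the only facts taken from the table are the default web port (54321) and the rule that the internal port is the web port plus one.

```python
DEFAULT_WEB_PORT = 54321  # default embedded web port from the table above


def internal_port(web_port: int = DEFAULT_WEB_PORT) -> int:
    """Return the node-to-node port that pairs with an embedded web port."""
    if not 1 <= web_port < 65535:
        raise ValueError("web port must leave room for web port + 1")
    return web_port + 1


def ports_for_node(web_port: int = DEFAULT_WEB_PORT) -> dict:
    """Both ports one H2O node listens on (hypothetical helper)."""
    return {"web": web_port, "internal": internal_port(web_port)}


print(ports_for_node())  # {'web': 54321, 'internal': 54322}
```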
Assumptions (Threat Model)¶
H2O lives in a secure data center.
Denial of service is not a concern.
H2O is not designed to withstand a DOS attack.
HTTP traffic between the user client and H2O cluster needs to be encrypted.
This is true for both interactive sessions (e.g., the H2O Flow Web UI) and programmatic sessions (e.g., an R program).
Man-in-the-middle attacks are of low concern.
Certificate checking on the client side for R/Python is not yet implemented.
You may want to secure internal binary H2O node-to-H2O node traffic via encryption.
You trust the person that starts H2O to start it correctly.
Enabling H2O security requires specifying the correct security options.
User client sessions do not need to expire. A session lives at most as long as the cluster lifetime. H2O clusters are started and stopped “frequently enough”.
All data is stored in-memory, so restarting the H2O cluster wipes all data from memory, and there is nothing to clean from disk.
Once a user is authenticated for access to H2O, they have full access.
H2O supports authentication but not authorization or access control (ACLs).
H2O clusters are meant to be accessed by only one user.
Each user starts their own H2O cluster.
H2O only allows access to the embedded web port to the person that started the cluster.
Data Chain-of-Custody in a Hadoop Data Center Environment¶
Note: This holds true for all versions of Hadoop (including YARN) supported by H2O.
The following sequence shows that a user can access only the same data from H2O that they could already access from normal Hadoop jobs.
Data lives in HDFS
The files in HDFS have permissions
An HDFS user has permissions (capabilities) to access certain files
Kerberos (kinit) can be used to authenticate a user in a Hadoop environment
A user’s Hadoop MapReduce job inherits the permissions (capabilities) of the user, as well as kinit metadata
H2O is a Hadoop MapReduce job
H2O can only access the files in HDFS that the user has permission to access
Only the user that started the cluster is authenticated for access to the H2O cluster
The authenticated user can access the same data in H2O that they could access via HDFS
What is Being Secured Today¶
Standard file permissions security is provided by the Operating System and by HDFS.
The embedded web port in each node of H2O can be secured in two ways:
Method | Description |
---|---|
HTTPS | Encrypted socket communication between the user client and the embedded H2O web port. |
Authentication | An HTTP Basic Auth username and password from the user client. |
Note: Embedded web port HTTPS and authentication may be used separately or together.
Internal H2O node-to-H2O node communication can be encrypted.
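To see why HTTPS and authentication are often used together, it helps to look at what an HTTP Basic Auth header contains. The following is a minimal sketch using only the Python standard library; the helper names are hypothetical. It shows that the credentials are Base64-encoded, not encrypted, so anyone who can read the traffic can recover them unless the connection is also protected by HTTPS.

```python
import base64


def basic_auth_header(username: str, password: str) -> str:
    """Build the value of an HTTP 'Authorization' header for Basic Auth."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8"))
    return "Basic " + token.decode("ascii")


def decode_basic_auth(header: str) -> tuple:
    """Recover (username, password) from a Basic Auth header value."""
    scheme, _, token = header.partition(" ")
    if scheme != "Basic":
        raise ValueError("not a Basic Auth header")
    username, _, password = base64.b64decode(token).decode("utf-8").partition(":")
    return username, password


header = basic_auth_header("myusername", "mypassword")
# Decoding needs no secret: Base64 is an encoding, not encryption.
print(decode_basic_auth(header))  # ('myusername', 'mypassword')
```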
Enforcing System-Level Command-Line Arguments in h2odriver.jar¶
System administrators can create a configuration file with implicit arguments of h2odriver and use it to make sure the H2O cluster is started with the specified security settings.
Create the config file in /etc/h2o/h2odriver.args.
Specify the default command-line options that you want to enforce. Note that each argument must be on a separate line. For example:
h2o_ssl_jks_internal=keystore.jks
h2o_ssl_jks_password=password
h2o_ssl_jts_internal=truststore.jks
h2o_ssl_jts_password=password
Start H2O.
hadoop jar h2odriver.jar -mapperXmx 3g -nodes 1
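Because each argument must be on its own line, it can be useful to check the file before deploying it. The sketch below is a hypothetical pre-flight check, not something h2odriver provides. It assumes, as in the example above, that a single argument never contains whitespace.

```python
def check_args_lines(text: str) -> list:
    """Return the arguments of an h2odriver.args-style file, one per line.

    Raises ValueError if a line appears to hold more than one argument.
    """
    args = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if len(line.split()) != 1:
            raise ValueError(
                f"line {lineno}: expected one argument per line, got {line!r}"
            )
        args.append(line)
    return args


example = """\
h2o_ssl_jks_internal=keystore.jks
h2o_ssl_jks_password=password
"""
print(check_args_lines(example))
```

A file that places several arguments on one line is rejected with a message naming the offending line.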
File Security in H2O¶
H2O is a normal user program. Nothing specifically needs to be done by the user to get file security for H2O. Operating System and HDFS permissions “just work”.
Standalone H2O¶
Since H2O is a regular Java program, the files H2O can access are restricted by the user’s Operating System permissions (capabilities).
H2O on Hadoop¶
Since H2O is a regular Hadoop MapReduce program, the files H2O can access are restricted by the standard HDFS permissions of the user that starts H2O.
Since H2O is a regular Hadoop MapReduce program, Kerberos (kinit) works seamlessly. (No code was added to H2O to support Kerberos.)
Sparkling Water on YARN¶
Similar to H2O on Hadoop, this configuration is H2O on Spark on YARN. The YARN job inherits the HDFS permissions of the user.
Embedded Web Port (by default port 54321) Security¶
For the client side, connection options exist.
For the server side, startup options exist to facilitate security. These are detailed below.
HTTPS¶
HTTPS Client Side¶
Flow Web UI Client¶
When HTTPS is enabled on the server side, the user must provide the https URI scheme to the browser. No http access will exist.
R Client¶
The following code snippet demonstrates connecting to an H2O cluster with HTTPS:
h2o.init(ip = "a.b.c.d", port = 54321, https = TRUE, insecure = FALSE)
The underlying HTTPS implementation is provided by RCurl and by extension libcurl and OpenSSL.
Python Client¶
The following code snippet demonstrates connecting to an H2O cluster with HTTPS:
h2o.init(ip="a.b.c.d", port=54321, https=True, insecure=False)
The underlying HTTPS implementation is provided by the Python requests library.
HTTPS Server Side¶
A Java Keystore must be provided on the server side to enable HTTPS. Keystores can be manipulated on the command line with the keytool command.
The underlying HTTPS implementation is provided by Jetty 9 and the Java runtime.
Standalone H2O¶
The following options are available:
-jks <filename>
Java keystore file
-jks_pass <password>
(Default is 'h2oh2o')
-jks_alias <alias>
(Optional) Which certificate from the keystore to use
Example:
java -jar h2o.jar -jks h2o.jks
H2O on Hadoop¶
The following options are available:
-jks <filename>
Java keystore file
-jks_pass <password>
(Default is 'h2oh2o')
-jks_alias <alias>
(Optional) Which certificate from the keystore to use
Example:
hadoop jar h2odriver.jar -n 3 -mapperXmx 10g -jks h2o.jks -output hdfsOutputDirectory
Sparkling Water¶
The following Spark conf properties exist for Java Keystore configuration:
Spark conf property | Description |
---|---|
spark.ext.h2o.jks | Path to Java Keystore |
spark.ext.h2o.jks.pass | JKS password |
Example:
$SPARK_HOME/bin/spark-submit --class water.SparklingWaterDriver --conf spark.ext.h2o.jks=/path/to/h2o.jks sparkling-water-assembly-0.2.17-SNAPSHOT-all.jar
Creating your own self-signed Java Keystore¶
Here is an example of how to create your own self-signed Java Keystore (mykeystore.jks) with a custom keystore password (mypass) and how to run standalone H2O using your Keystore:
# Be paranoid and delete any previously existing keystore.
rm -f mykeystore.jks
# Generate a new keystore.
keytool -genkey -keyalg RSA -keystore mykeystore.jks -storepass mypass -keysize 2048
What is your first and last name?
[Unknown]:
What is the name of your organizational unit?
[Unknown]:
What is the name of your organization?
[Unknown]:
What is the name of your City or Locality?
[Unknown]:
What is the name of your State or Province?
[Unknown]:
What is the two-letter country code for this unit?
[Unknown]:
Is CN=Unknown, OU=Unknown, O=Unknown, L=Unknown, ST=Unknown, C=Unknown correct?
[no]: yes
Enter key password for <mykey>
(RETURN if same as keystore password):
# Run H2O using the newly generated self-signed keystore.
java -jar h2o.jar -jks mykeystore.jks -jks_pass mypass
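The interactive prompts above can be avoided by passing the distinguished name and key password on the command line with keytool's -dname and -keypass options. The following sketch only builds the argument list so that it can be inspected before it is run; the function name and the example distinguished name are hypothetical.

```python
import shlex


def keytool_genkey_command(keystore: str, storepass: str, dname: str,
                           keysize: int = 2048) -> list:
    """Build a non-interactive keytool command for a self-signed RSA key."""
    return [
        "keytool", "-genkey",
        "-keyalg", "RSA",
        "-keysize", str(keysize),
        "-keystore", keystore,
        "-storepass", storepass,
        "-keypass", storepass,  # same as the keystore password, as in the prompt above
        "-dname", dname,        # answers the name/organization questions up front
    ]


cmd = keytool_genkey_command(
    "mykeystore.jks", "mypass",
    "CN=h2o.example.com, OU=Unknown, O=Unknown, L=Unknown, ST=Unknown, C=Unknown",
)
print(" ".join(shlex.quote(part) for part in cmd))
# To execute it, pass the list to subprocess.run(cmd, check=True).
```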
Kerberos Authentication (via HTTP Basic)¶
Kerberos H2O Client Side¶
Flow Web UI Client¶
When authentication is enabled, the user will be presented with a username and password dialog box when attempting to reach Flow.
R Client¶
The following code snippet demonstrates connecting to an H2O cluster with authentication:
h2o.init(ip = "a.b.c.d", port = 54321, username = "myusername", password = "mypassword")
Python Client¶
For Python, connecting to H2O with authentication is similar:
h2o.init(ip="a.b.c.d", port=54321, username="myusername", password="mypassword")
Kerberos H2O Server Side¶
You must provide a simple configuration file that specifies the Kerberos login module.
Example kerb.conf:
krb5loginmodule {
com.sun.security.auth.module.Krb5LoginModule required
};
If the default realm and/or KDC cannot be automatically detected (e.g., by resolving the KDC using DNS), you might need to specify the additional system properties java.security.krb5.realm and/or java.security.krb5.kdc when starting H2O (see the example in the Standalone H2O section).
For more details about Kerberos configuration, see the Krb5LoginModule documentation and the JAAS note.
Standalone H2O¶
The following options are required for Kerberos authentication:
-kerberos_login
Use Jetty KerberosLoginService
-login_conf <filename>
LoginService configuration file
-user_name <username>
Override name of user for which access is allowed
Example:
java -jar h2o.jar -kerberos_login -login_conf kerb.conf -user_name kerb_principal
Example (with realm and KDC explicitly specified):
java -Djava.security.krb5.realm="0XDATA.LOC" -Djava.security.krb5.kdc="ldap.0xdata.loc" -jar h2o.jar -kerberos_login -login_conf kerb.conf -user_name kerb_principal
H2O on Hadoop¶
The following options are available:
-kerberos_login
Use Jetty KerberosLoginService
-login_conf <filename>
LoginService configuration file
-user_name <username>
Override name of user for which access is allowed
Example:
hadoop jar h2odriver.jar -n 3 -mapperXmx 10g -kerberos_login -login_conf kerb.conf -output hdfsOutputDirectory -user_name kerb_principal
Sparkling Water¶
The following Spark conf properties exist for Kerberos configuration:
Spark conf property | Description |
---|---|
spark.ext.h2o.kerberos.login | Use Jetty Krb5LoginModule |
spark.ext.h2o.login.conf | LoginService configuration file |
spark.ext.h2o.user.name | Name of user for which access is allowed |
Example:
$SPARK_HOME/bin/spark-submit --class water.SparklingWaterDriver --conf spark.ext.h2o.kerberos.login=true --conf spark.ext.h2o.user.name=kerb_principal --conf spark.ext.h2o.login.conf=kerb.conf sparkling-water-assembly-0.2.17-SNAPSHOT-all.jar
Kerberos Authentication (via kinit/SPNEGO)¶
Kerberos H2O Client Side¶
Flow Web UI Client¶
Modern browsers support Kerberos authentication out of the box. When a user attempts to reach Flow, the server responds with a 401 status and a Negotiate header, and the browser uses the most recent credentials acquired via kinit on the client machine.
R Client¶
The following code snippet demonstrates connecting to an H2O cluster with SPNEGO authentication:
h2o.init(ip = "a.b.c.d", port = 54321, use_spnego = TRUE)
Limitation: The R client uses the RCurl library, which does not allow you to specify the service principal and is limited to automatic service principal generation via the template http/HOSTNAME@DOMAIN.
Python Client¶
For Python, connecting to H2O with authentication is similar:
from h2o.auth import SpnegoAuth
h2o.connect(ip="a.b.c.d", port=54321, auth=SpnegoAuth(service_principal="HTTP/h2o_server@EXAMPLE.COM"))
Limitation: Connecting to a SPNEGO-configured H2O server is currently possible only via h2o.connect (h2o.init is not supported). The next section describes how to specify service_principal.
Kerberos H2O Server Side¶
On the machine running the H2O server, a keytab file must be created that contains the key for the service principal used by this server. The same service principal must be used in the client code when connecting to the server.
You must provide configuration files for the SPNEGO login module:
Example spnego.conf:
com.sun.security.jgss.initiate {
com.sun.security.auth.module.Krb5LoginModule required
principal="HTTP/h2o_server@EXAMPLE.COM"
keyTab="/srv/h2o.keytab"
useKeyTab=true
storeKey=true
isInitiator=false;
};
com.sun.security.jgss.accept {
com.sun.security.auth.module.Krb5LoginModule required
principal="HTTP/h2o_server@EXAMPLE.COM"
keyTab="/srv/h2o.keytab"
useKeyTab=true
storeKey=true
isInitiator=false;
};
Example spnego.properties:
targetName=HTTP/h2o_server@EXAMPLE.COM
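In the examples above, the same service principal appears three times: twice in spnego.conf and once in spnego.properties. A minimal sketch of generating both files from a single value is shown below. The function and template names are hypothetical, and the sketch assumes that you want all three occurrences to stay identical, as they are in the examples.

```python
SPNEGO_CONF_TEMPLATE = """\
com.sun.security.jgss.initiate {{
com.sun.security.auth.module.Krb5LoginModule required
principal="{principal}"
keyTab="{keytab}"
useKeyTab=true
storeKey=true
isInitiator=false;
}};
com.sun.security.jgss.accept {{
com.sun.security.auth.module.Krb5LoginModule required
principal="{principal}"
keyTab="{keytab}"
useKeyTab=true
storeKey=true
isInitiator=false;
}};
"""


def render_spnego_files(principal: str, keytab: str) -> tuple:
    """Return (spnego.conf text, spnego.properties text) for one principal."""
    if '"' in principal or '"' in keytab:
        raise ValueError("values must not contain double quotes")
    conf = SPNEGO_CONF_TEMPLATE.format(principal=principal, keytab=keytab)
    properties = f"targetName={principal}\n"
    return conf, properties


conf, properties = render_spnego_files("HTTP/h2o_server@EXAMPLE.COM", "/srv/h2o.keytab")
print(properties, end="")  # targetName=HTTP/h2o_server@EXAMPLE.COM
```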
Standalone H2O¶
The following options are required for SPNEGO authentication:
-spnego_login
Use Jetty SPNEGO Login Service
-user_name <username>
Principal for which access is allowed, must be full kerberos name name/path@DOMAIN
-login_conf <filename>
path to spnego.conf file, see example above
-spnego_properties <filename>
path to spnego.properties file, see example above
Example:
java -jar h2o.jar \
-spnego_login -user_name principal@DOMAIN \
-login_conf /path/to/spnego.conf \
-spnego_properties /path/to/spnego.properties
H2O on Hadoop¶
The following options are available:
-spnego_login
Use Jetty SPNEGO Login Service
-user_name <username>
Principal for which access is allowed, must be full kerberos name name/path@DOMAIN
-login_conf <filename>
path to spnego.conf file, see example above
-spnego_properties <filename>
path to spnego.properties file, see example above
Example:
hadoop jar h2odriver.jar -n 3 -mapperXmx 10g -output hdfsOutputDirectory \
-proxy -spnego_login -user_name principal@DOMAIN \
-login_conf /path/to/spnego.conf \
-spnego_properties /path/to/spnego.properties
Limitation: Because a Kerberos service principal is tied to a hostname, we recommend that you use SPNEGO authentication only with the -proxy option.
LDAP Authentication¶
H2O client and server side configuration for LDAP is discussed below. Authentication is implemented using Basic Auth.
LDAP H2O Client Side¶
Flow Web UI Client¶
When authentication is enabled, the user will be presented with a username and password dialog box when attempting to reach Flow.
R Client¶
The following code snippet demonstrates connecting to an H2O cluster with authentication:
h2o.init(ip = "a.b.c.d", port = 54321, username = "myusername", password = "mypassword")
Python Client¶
The following code snippet demonstrates connecting to an H2O cluster with authentication:
h2o.init(ip="a.b.c.d", port=54321, username="myusername", password="mypassword")
LDAP H2O Server Side¶
An ldap.conf configuration file must be provided by the user. As an example, this file works for H2O’s internal LDAP server. You will certainly need help from your IT security folks to adjust this configuration file for your environment.
Example ldap.conf:
ldaploginmodule {
ai.h2o.org.eclipse.jetty.plus.jaas.spi.LdapLoginModule required
debug="true"
useLdaps="false"
contextFactory="com.sun.jndi.ldap.LdapCtxFactory"
hostname="ldap.0xdata.loc"
port="389"
bindDn="cn=admin,dc=0xdata,dc=loc"
bindPassword="0xdata"
authenticationMethod="simple"
forceBindingLogin="true"
userBaseDn="ou=users,dc=0xdata,dc=loc";
};
Standalone H2O¶
The following options are available:
-ldap_login
Use Jetty LdapLoginService
-login_conf <filename>
LoginService configuration file
-user_name <username>
Override name of user for which access is allowed
Example:
java -jar h2o.jar -ldap_login -login_conf ldap.conf
java -jar h2o.jar -ldap_login -login_conf ldap.conf -user_name myLDAPusername
H2O on Hadoop¶
The following options are available:
-ldap_login
Use Jetty LdapLoginService
-login_conf <filename>
LoginService configuration file
-user_name <username>
Override name of user for which access is allowed
Example:
hadoop jar h2odriver.jar -n 3 -mapperXmx 10g -ldap_login -login_conf ldap.conf -output hdfsOutputDirectory
hadoop jar h2odriver.jar -n 3 -mapperXmx 10g -ldap_login -login_conf ldap.conf -user_name myLDAPusername -output hdfsOutputDirectory
Sparkling Water¶
The following Spark conf properties exist for LDAP configuration:
Spark conf property | Description |
---|---|
spark.ext.h2o.ldap.login | Use Jetty LdapLoginService |
spark.ext.h2o.login.conf | LoginService configuration file |
spark.ext.h2o.user.name | Override name of user for which access is allowed |
Example:
$SPARK_HOME/bin/spark-submit --class water.SparklingWaterDriver --conf spark.ext.h2o.ldap.login=true --conf spark.ext.h2o.login.conf=/path/to/ldap.conf sparkling-water-assembly-0.2.17-SNAPSHOT-all.jar
$SPARK_HOME/bin/spark-submit --class water.SparklingWaterDriver --conf spark.ext.h2o.ldap.login=true --conf spark.ext.h2o.user.name=myLDAPusername --conf spark.ext.h2o.login.conf=/path/to/ldap.conf sparkling-water-assembly-0.2.17-SNAPSHOT-all.jar
LDAP Authentication and MapR¶
The following information is for users who authenticate with LDAP on MapR, which uses a proprietary Hadoop configuration property that specifies the configuration file. Additional information is available here: http://doc.mapr.com/display/MapR/mapr.login.conf.
In order to make LDAP authentication work, add the ldap.conf definition to the MapR configuration file in /opt/mapr/conf/mapr.login.conf.
Debugging Server-side LDAP issues¶
To get detailed output from Jetty for LDAP debugging, you need to create the jetty-logging.properties file and add it to your classpath.
Example jetty-logging.properties:
org.eclipse.jetty.util.log.class=org.eclipse.jetty.util.log.StdErrLog
org.eclipse.jetty.LEVEL=DEBUG
Standalone H2O example (with jetty-logging.properties in the current directory):
java -cp h2o.jar:. water.H2OApp
H2O on Hadoop example (with jetty-logging.properties in the current directory):
hadoop jar h2odriver.jar -libjars jetty-logging.properties -n 1 -mapperXmx 5g -output hdfsOutputDirectory
Pluggable Authentication Module (PAM) Authentication¶
This section describes H2O client and server side configuration for PAM authentication.
PAM H2O Client Side¶
Flow UI Client¶
When PAM authentication is enabled, the user will be presented with a username and password dialog box when attempting to reach Flow.
R Client¶
The following code snippet demonstrates connecting to an H2O cluster with authentication:
h2o.init(ip = "a.b.c.d", port = 54321, username = "myusername", password = "mypassword")
Python Client¶
For Python, connecting to H2O with authentication is similar:
h2o.init(ip="a.b.c.d", port=54321, username="myusername", password="mypassword")
PAM H2O Server Side¶
You must provide a simple configuration file that specifies the PAM login module.
Example pam.conf:
pamloginmodule {
de.codedo.jaas.PamLoginModule required
service = h2o;
};
Note that the name of the service is user configurable, and this name must match the name of the PAM authentication module that you created for the “h2o service”.
Standalone H2O¶
The following options are required for PAM authentication:
-pam_login
Use PAM LoginService
-login_conf <filename>
LoginService configuration file
-user_name <username>
Override name of user for which access is allowed
-form_auth
Optionally enable form-based authentication for Flow
-session_timeout
If form_auth is enabled, optionally specify the number of minutes
that a session can remain idle before the server invalidates the
session and requests a new login
Example:
java -jar h2o.jar -pam_login -login_conf pam.conf -user_name <username>
H2O on Hadoop¶
The following options are available:
-pam_login
Use PAM LoginService
-login_conf <filename>
LoginService configuration file
-user_name <username>
Override name of user for which access is allowed
-form_auth
Optionally enable form-based authentication for Flow
-session_timeout
If form_auth is enabled, optionally specify the number of minutes
that a session can remain idle before the server invalidates the
session and requests a new login
Example:
hadoop jar h2odriver.jar -n 3 -mapperXmx 10g -pam_login -login_conf pam.conf -output hdfsOutputDirectory -user_name <username>
Hash File Authentication¶
H2O client and server side configuration for a hardcoded hash file is discussed below. Authentication is implemented using Basic Auth.
Hash File H2O Client Side¶
Flow Web UI Client¶
When authentication is enabled, the user will be presented with a username and password dialog box when attempting to reach Flow.
R Client¶
The following code snippet demonstrates connecting to an H2O cluster with authentication:
h2o.init(ip = "a.b.c.d", port = 54321, username = "myusername", password = "mypassword")
Python Client¶
The following code snippet demonstrates connecting to an H2O cluster with authentication:
h2o.init(ip="a.b.c.d", port=54321, username="myusername", password="mypassword")
Hash File H2O Server Side¶
A realm.properties configuration file must be provided by the user.
Example realm.properties:
# See https://wiki.eclipse.org/Jetty/Howto/Secure_Passwords
# java -cp h2o.jar org.eclipse.jetty.util.security.Password
username1: password1
username2: MD5:6cb75f652a9b52798eb6cf2201057c73
Generate secure passwords using the Jetty secure password generation tool:
java -cp h2o.jar org.eclipse.jetty.util.security.Password username password
See the Jetty 9 HashLoginService documentation and Jetty 9 Secure Password HOWTO for more information.
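For illustration, an MD5: entry like the one in the example realm.properties can be approximated in Python. This sketch assumes that the MD5: form is the hexadecimal MD5 digest of the password encoded as ISO-8859-1; that assumption and the function name are not taken from this document, so compare the result with the output of the Jetty tool before relying on it.

```python
import hashlib


def jetty_md5_entry(username: str, password: str) -> str:
    """Format a realm.properties line with an MD5-hashed password.

    Assumption: Jetty's 'MD5:' credential is the hex MD5 digest of the
    ISO-8859-1 encoded password. Verify against the Jetty Password tool.
    """
    digest = hashlib.md5(password.encode("iso-8859-1")).hexdigest()
    return f"{username}: MD5:{digest}"


print(jetty_md5_entry("username1", "password"))
# username1: MD5:5f4dcc3b5aa765d61d8327deb882cf99
```

Note that an unsalted MD5 digest hides the password from casual viewing only; the realm.properties file should still be protected with file permissions.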
Standalone H2O¶
The following options are available:
-hash_login
Use Jetty HashLoginService
-login_conf <filename>
LoginService configuration file
Example:
java -jar h2o.jar -hash_login -login_conf realm.properties
H2O on Hadoop¶
The following options are available:
-hash_login
Use Jetty HashLoginService
-login_conf <filename>
LoginService configuration file
Example:
hadoop jar h2odriver.jar -n 3 -mapperXmx 10g -hash_login -login_conf realm.properties -output hdfsOutputDirectory
Sparkling Water¶
The following Spark conf properties exist for hash login service configuration:
Spark conf property | Description |
---|---|
spark.ext.h2o.hash.login | Use Jetty HashLoginService |
spark.ext.h2o.login.conf | LoginService configuration file |
Example:
$SPARK_HOME/bin/spark-submit --class water.SparklingWaterDriver --conf spark.ext.h2o.hash.login=true --conf spark.ext.h2o.login.conf=/path/to/realm.properties sparkling-water-assembly-0.2.17-SNAPSHOT-all.jar
SSL Internode Security¶
By default, communication between H2O nodes is not encrypted for performance reasons. H2O currently supports SSL/TLS authentication (basic handshake authentication) and data encryption for internode communication.
Usage¶
Hadoop¶
The easiest way to enable SSL while running H2O via h2odriver is to pass the -internal_secure_connections
flag. This will tell h2odriver to automatically generate all the necessary files and distribute them to all mappers. This distribution may be secure depending on your YARN configuration.
hadoop jar h2odriver.jar -nodes 4 -mapperXmx 6g -output hdfsOutputDirName -internal_secure_connections
The user can also manually generate keystore/truststore and properties file as described in the Standalone/AWS section that follows and run the following command to use them instead. In this case, all the files (certificates and properties) have to be distributed to all the mapper nodes by the user.
hadoop jar h2odriver.jar -nodes 4 -mapperXmx 6g -output hdfsOutputDirName -internal_security_conf security.properties
Standalone/AWS¶
In this case, the user has to generate the keystores, truststores, and properties file manually.
Generate public/private keys and distribute them. (Refer to the Keystore/Truststore Generation section for more information.)
Create the security properties file. (Refer to the Configuration section for a full list of parameters.)
h2o_ssl_jks_internal=keystore.jks
h2o_ssl_jks_password=password
h2o_ssl_jts_internal=truststore.jks
h2o_ssl_jts_password=password
To start an SSL-enabled node, pass the location of the properties file using the -internal_security_conf flag:
java -jar h2o.jar -internal_security_conf security.properties
Configuration¶
To enable this feature, set the -internal_security_conf parameter when starting an H2O node, and point it to a configuration file (key=value format) that contains the following values:
h2o_ssl_jks_internal (required): The path (absolute or relative) to the key-store file used for internal SSL communication.
h2o_ssl_jks_password (required): The password for the internal key-store.
h2o_ssl_jts_internal (optional): The path (absolute or relative) to the trust-store file used for internal SSL communication. If not present, then h2o_ssl_jks_internal will be used.
h2o_ssl_jts_password (optional): The password to the internal trust-store. If not present, then h2o_ssl_jks_password will be used.
h2o_ssl_protocol (optional): The protocol name used during encrypted communication (supported by JVM). This defaults to TLSv1.2.
h2o_ssl_enabled_algorithms (optional): A comma-separated list of enabled cipher algorithms. Include only those that are supported by the JVM.
This must be set for every node in the cluster. Every node needs to have access to both the Java keystore and the Java truststore containing the appropriate keys and certificates.
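The required keys and fallback rules described above can be sketched as a small validator. This is a hypothetical pre-flight check written for illustration, not the parser that H2O itself uses. It assumes a plain key=value file in which blank lines and lines starting with # are ignored.

```python
REQUIRED_KEYS = ("h2o_ssl_jks_internal", "h2o_ssl_jks_password")


def load_security_conf(text: str) -> dict:
    """Parse key=value security settings and apply the documented fallbacks."""
    conf = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # assumption: blank lines and comments are ignored
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"not a key=value line: {line!r}")
        conf[key.strip()] = value.strip()

    missing = [key for key in REQUIRED_KEYS if key not in conf]
    if missing:
        raise ValueError("missing required keys: " + ", ".join(missing))

    # The truststore falls back to the keystore when it is not configured.
    conf.setdefault("h2o_ssl_jts_internal", conf["h2o_ssl_jks_internal"])
    conf.setdefault("h2o_ssl_jts_password", conf["h2o_ssl_jks_password"])
    # TLSv1.2 is the JVM protocol name for the documented default.
    conf.setdefault("h2o_ssl_protocol", "TLSv1.2")
    return conf


settings = load_security_conf(
    "h2o_ssl_jks_internal=keystore.jks\nh2o_ssl_jks_password=password\n"
)
print(settings["h2o_ssl_jts_internal"])  # keystore.jks
```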
Keystore/Truststore Generation¶
Keystore/truststore creation and distribution are deployment specific and have to be handled by the end user.
Basic keystore/truststore generation can be done using the keytool program, which ships with Java; see the keytool documentation for details. Each node should have a key pair generated, and all public keys should be imported into a single truststore, which should be distributed to all the nodes.
The simplest (though not recommended) way would be to call:
keytool -genkeypair -keystore h2o-internal.jks -alias h2o-internal
Then distribute the h2o-internal.jks file to all the nodes, and set it as both the keystore and truststore in ssl.config.
A more secure way would be to:
Run the same command on each node:
keytool -genkeypair -keystore h2o-internal.jks -alias h2o-internal
Extract the certificate on each node:
keytool -export -keystore h2o-internal.jks -alias h2o-internal -file node<number>.cer
Distribute all of the above certificates to each node, and on each node create a truststore containing all of them (or put all certificates on one node, import to truststore and distribute that truststore to each node):
keytool -importcert -file node<number>.cer -keystore truststore.jks -alias node<number>
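For clusters with more than a few nodes, the commands in the more secure procedure above can be generated instead of typed by hand. The sketch below only builds the argument lists, using the same file names and aliases as the commands above. The function names are hypothetical, and node numbering starting at 1 is an assumption.

```python
KEYSTORE = "h2o-internal.jks"
ALIAS = "h2o-internal"


def per_node_commands(node: int) -> list:
    """Commands to run on one node: create a key pair, then export its certificate."""
    cert = f"node{node}.cer"
    return [
        ["keytool", "-genkeypair", "-keystore", KEYSTORE, "-alias", ALIAS],
        ["keytool", "-export", "-keystore", KEYSTORE, "-alias", ALIAS, "-file", cert],
    ]


def truststore_commands(node_count: int) -> list:
    """Commands that import every node certificate into one truststore."""
    return [
        ["keytool", "-importcert", "-file", f"node{n}.cer",
         "-keystore", "truststore.jks", "-alias", f"node{n}"]
        for n in range(1, node_count + 1)
    ]


for command in truststore_commands(3):
    print(" ".join(command))
```

Each certificate is imported under its own alias, so the resulting truststore holds one entry per node.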
Performance¶
Turning on SSL may result in performance overhead for settings and algorithms that exchange data between nodes, due to encryption/decryption time. Some algorithms might also run slower because of this.
Example benchmark on a 5-node cluster (6GB memory per node) working with a 5.8 million-row dataset (580MB):
Task | Non SSL | SSL |
---|---|---|
Parsing: | 4.908s | 5.304s |
GLM model: | 01:39.446 | 01:49.634 |
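The relative overhead implied by the benchmark can be worked out directly from the timings. The following sketch uses hypothetical helper names and assumes that the GLM timings are in minutes:seconds format.

```python
def to_seconds(stamp: str) -> float:
    """Convert '4.908s' or '01:39.446' (assumed MM:SS.sss) into seconds."""
    stamp = stamp.rstrip("s")
    if ":" in stamp:
        minutes, seconds = stamp.split(":")
        return int(minutes) * 60 + float(seconds)
    return float(stamp)


def overhead_percent(non_ssl: str, ssl: str) -> float:
    """Extra time spent with SSL enabled, as a percentage of the non-SSL time."""
    return (to_seconds(ssl) / to_seconds(non_ssl) - 1) * 100


print(f"Parsing: {overhead_percent('4.908s', '5.304s'):.1f}%")          # Parsing: 8.1%
print(f"GLM model: {overhead_percent('01:39.446', '01:49.634'):.1f}%")  # GLM model: 10.2%
```

In this single benchmark, SSL adds roughly 8 to 10 percent to the measured times; other clusters and workloads may behave differently.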
Caveats and Missing Pieces¶
If you start a mixed cloud of SSL and non-SSL nodes, the SSL nodes will fail to bootstrap, and the non-SSL nodes will become unresponsive.
H2O does not provide in-memory data encryption. If the operating system swaps memory to disk, data might be written to disk in unencrypted form. As a workaround, an encrypted drive is advised.
H2O does not encrypt data that it saves to disk when the corresponding flags are enabled. Similar to the previous caveat, the user can use an encrypted drive to work around this issue.
H2O supports only SSL and does not support SASL.