Using SSL to secure H2O Flow UI
Sparkling Water allows users to enable https for communication with the H2O Flow user interface. The security settings for the Flow UI are also applied to communication and data exchange between Spark instances (driver and executors) and H2O nodes.
There are two ways to secure the Flow UI:
- Provide an existing SSL certificate in a Java keystore to Sparkling Water.
- Let Sparkling Water generate an SSL certificate automatically. This approach has limitations, which are described below.
Using existing Java Keystore
In order to use https correctly, the following two options need to be specified:
spark.ext.h2o.jks
- Path to the Java keystore file containing an SSL certificate.
spark.ext.h2o.jks.pass
- Password to the Java keystore file.
spark.ext.h2o.jks.alias
- (Optional) Alias of the SSL certificate if the Java keystore file contains more than one certificate.
If the certificate doesn’t cover the hostnames of all H2O nodes and contains only the hostname of the Spark driver, where the H2O Flow UI lives, hostname verification on Spark instances (driver and executors) for connections to H2O nodes must be disabled by setting the property spark.ext.h2o.internal.rest.verify_ssl_hostnames to false.
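The options above are all passed to the launcher as --conf key=value pairs. As a minimal sketch, assuming you want to assemble that command line from a dictionary, the helper below shows how the pieces fit together. The function name spark_conf_args and the alias value my-cert are hypothetical and used only for illustration; the option names are the ones listed above.

```python
import shlex


def spark_conf_args(options):
    """Turn a dict of Spark options into a flat list of --conf arguments.

    Hypothetical helper for illustration; not part of Sparkling Water.
    """
    args = []
    for key, value in options.items():
        if isinstance(value, bool):
            # Render Python booleans as lowercase true/false.
            value = str(value).lower()
        args += ["--conf", f"{key}={value}"]
    return args


ssl_options = {
    "spark.ext.h2o.jks": "/path/to/keystore",
    "spark.ext.h2o.jks.pass": "password",
    # Assumed alias name; only needed when the keystore holds several certificates.
    "spark.ext.h2o.jks.alias": "my-cert",
    # Only needed when the certificate does not cover all H2O node hostnames.
    "spark.ext.h2o.internal.rest.verify_ssl_hostnames": False,
}

command = ["bin/sparkling-shell"] + spark_conf_args(ssl_options)
print(shlex.join(command))
```

Keeping the settings in one dictionary makes it easy to drop the optional alias or the hostname-verification override when they are not needed.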
Scala
To enable https in Sparkling Water, you can start Sparkling Water as:
bin/sparkling-shell --conf "spark.ext.h2o.jks=/path/to/keystore" --conf "spark.ext.h2o.jks.pass=password"
and when you have the shell running, start H2OContext as:
import ai.h2o.sparkling._
val hc = H2OContext.getOrCreate()
You can also start the Sparkling shell without the configuration and specify it using the setters on H2OConf as:
import ai.h2o.sparkling._
val conf = new H2OConf().setJks("/path/to/keystore").setJksPass("password")
val hc = H2OContext.getOrCreate(conf)
Python
To enable https in PySparkling, you can start PySparkling as:
bin/pysparkling --conf "spark.ext.h2o.jks=/path/to/keystore" --conf "spark.ext.h2o.jks.pass=password"
and when you have the shell running, start H2OContext as:
from pysparkling import *
hc = H2OContext.getOrCreate()
You can also start the PySparkling shell without the configuration and specify it using the setters on H2OConf as:
from pysparkling import *
conf = H2OConf().setJks("/path/to/keystore").setJksPass("password")
hc = H2OContext.getOrCreate(conf)
R
To enable https in RSparkling, run in RStudio:
library(rsparkling)
sc <- spark_connect(master = "local")
conf <- H2OConf()$setJks("/path/to/keystore")$setJksPass("password")
hc <- H2OContext.getOrCreate(conf)
If your certificates are self-signed or signed by an untrusted CA, the connection to the H2O cluster will fail because the certificate cannot be validated. In this case, you can skip certificate verification as follows:
Scala
val conf = new H2OConf().setSslHostnameVerificationInInternalRestConnectionsDisabled()
val hc = H2OContext.getOrCreate(conf)
Python
conf = H2OConf()
conf.setSslHostnameVerificationInInternalRestConnectionsDisabled()
conf.setVerifySslCertificates(False)
hc = H2OContext.getOrCreate(conf)
R
conf <- H2OConf()
conf$setSslHostnameVerificationInInternalRestConnectionsDisabled()
conf$setVerifySslCertificates(FALSE)
hc <- H2OContext.getOrCreate(conf)
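The Python and R examples above switch off two separate checks: hostname verification and certificate verification. As a rough illustration of the difference, the sketch below uses Python's standard ssl module. It is an analogy only and not how Sparkling Water implements these settings; the assumption is that the two setters correspond to the two kinds of check shown here.

```python
import ssl

# Default client context: validates the certificate chain and the hostname.
strict = ssl.create_default_context()

# Hostname check off, chain validation still on. This fits a certificate
# from a trusted CA that does not list every node's hostname.
no_hostname = ssl.create_default_context()
no_hostname.check_hostname = False

# Both checks off. This is what accepting a self-signed certificate
# without importing it into a trust store amounts to.
no_verify = ssl.create_default_context()
no_verify.check_hostname = False  # must be disabled before CERT_NONE
no_verify.verify_mode = ssl.CERT_NONE

for name, ctx in [("strict", strict),
                  ("no_hostname", no_hostname),
                  ("no_verify", no_verify)]:
    print(name, ctx.check_hostname, ctx.verify_mode.name)
```

Disabling only the hostname check keeps protection against untrusted certificates, while disabling certificate verification removes both protections, so it is best limited to test environments.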
Generate the files automatically
Sparkling Water can generate the necessary keystore and password automatically. To enable the automatic generation, the spark.ext.h2o.auto.flow.ssl option needs to be set to true. In this mode, only self-signed certificates are created.
Scala
To enable security using this mode in Sparkling Water, start the Sparkling shell as:
bin/sparkling-shell --conf "spark.ext.h2o.auto.flow.ssl=true"
and when you have the shell running, start H2OContext as:
import ai.h2o.sparkling._
val hc = H2OContext.getOrCreate()
You can also start the Sparkling shell without the configuration and specify it using the setters on H2OConf as:
import ai.h2o.sparkling._
val conf = new H2OConf().setAutoFlowSslEnabled()
val hc = H2OContext.getOrCreate(conf)
Python
To enable https in PySparkling using this mode, you can start PySparkling as:
bin/pysparkling --conf "spark.ext.h2o.auto.flow.ssl=true" --conf "spark.ext.h2o.verify_ssl_certificates=false"
and when you have the shell running, start H2OContext as:
from pysparkling import *
hc = H2OContext.getOrCreate()
You can also start the PySparkling shell without the configuration and specify it using the setters on H2OConf as:
from pysparkling import *
conf = H2OConf().setAutoFlowSslEnabled().setVerifySslCertificates(False)
hc = H2OContext.getOrCreate(conf)
R
To enable https in RSparkling using this mode, run in RStudio:
library(rsparkling)
sc <- spark_connect(master = "local")
conf <- H2OConf()$setAutoFlowSslEnabled()$setVerifySslCertificates(FALSE)
hc <- H2OContext.getOrCreate(conf)