Enabling SSL
Both Spark and H2O support basic node authentication and data encryption. In H2O’s case, we encrypt all the data sent between server nodes and between client and server nodes.
Currently, only encryption based on a Java key pair is supported (a more in-depth explanation can be found in the H2O documentation referenced below).
To enable security for Spark's own communication, please review the Spark Security documentation.
Security for data exchanged between H2O instances can be enabled by generating all the necessary files and distributing them to all worker nodes (as described in the H2O-3 documentation). Sparkling Water can either use manually created security files or generate them automatically.
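If you create the security files by hand, the Java key pair is usually generated with the JDK's `keytool`. The Python sketch below only assembles a `keytool -genkeypair` command line. The helper name, keystore file name, alias, password, key size, validity, and the explicit `JKS` store type are illustrative assumptions, not values required by H2O, so check them against the H2O-3 documentation for your version.

```python
import shlex

# Hypothetical helper (not part of Sparkling Water or H2O): builds the argv
# for a self-signed key pair. All defaults below are placeholders.
def keytool_genkeypair_command(keystore="h2o-internal.jks",
                               alias="h2o-internal",
                               password="changeit",
                               dname="CN=h2o-internal"):
    return [
        "keytool", "-genkeypair",
        "-keystore", keystore,
        "-storetype", "JKS",   # assumption: matches the .jks naming used here
        "-alias", alias,
        "-keyalg", "RSA",
        "-keysize", "2048",
        "-validity", "365",    # certificate validity in days
        "-dname", dname,       # subject name; avoids the interactive prompt
        "-storepass", password,
        "-keypass", password,
    ]

cmd = keytool_genkeypair_command()
# Display a shell-safe version; subprocess.run(cmd, check=True) would
# actually create the keystore (requires a JDK on PATH).
print(shlex.join(cmd))
```

Passing real passwords on the command line exposes them in the process list; if `-storepass` is omitted, `keytool` prompts for the password interactively instead.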
Using Automatically Generated Security Files
To automatically generate and apply the security configuration, pass the spark.ext.h2o.internal_secure_connections=true option to Spark submit:
bin/sparkling-shell --conf "spark.ext.h2o.internal_secure_connections=true"
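When Sparkling Water is launched from a script, this option is simply an ordinary Spark configuration entry passed with `--conf`. The sketch below uses a hypothetical `launch_command` helper to assemble that argument list in Python; it assumes you run it from the Sparkling Water distribution directory and does not start the shell itself.

```python
import shlex

# Hypothetical helper: turns a dict of Spark options into repeated --conf flags.
def launch_command(launcher, options):
    cmd = [launcher]
    for key, value in options.items():
        cmd += ["--conf", f"{key}={value}"]
    return cmd

cmd = launch_command(
    "bin/sparkling-shell",
    {"spark.ext.h2o.internal_secure_connections": "true"},
)
print(shlex.join(cmd))
# bin/sparkling-shell --conf spark.ext.h2o.internal_secure_connections=true
```

Passing the list directly to `subprocess.run(cmd, check=True)` sidesteps shell quoting entirely; `shlex.join` is used here only to display the equivalent command line.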
This can also be achieved programmatically on the H2OConf:
Scala
import org.apache.spark.h2o._
val conf = new H2OConf().setInternalSecureConnectionsEnabled()
val hc = H2OContext.getOrCreate(conf)
Python
from pysparkling import *
conf = H2OConf().setInternalSecureConnectionsEnabled()
hc = H2OContext.getOrCreate(conf)
R
library(sparklyr)
library(rsparkling)
sc <- spark_connect(master = "local")
conf <- H2OConf()$setInternalSecureConnectionsEnabled()
hc <- H2OContext.getOrCreate(conf)
This method generates all the required files and distributes them to all worker nodes through YARN or Spark mechanisms. The distribution itself is only secure if YARN or Spark security is configured.
Using Manually Generated Security Files
To use manually generated security files, pass the path to your security configuration file (ssl.properties in this example) to Spark submit:
bin/sparkling-shell --conf "spark.ext.h2o.internal_security_conf=ssl.properties"
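A missing or misspelled key in the configuration file may only become apparent once H2O starts on the workers, so a quick local check before launching can save a round trip. The sketch below uses a deliberately simplified parser (plain `key=value` lines and `#`/`!` comments, not the full `java.util.Properties` syntax), and the required key names are our reading of the H2O-3 documentation, so treat both as assumptions.

```python
# Assumed required keys (per our reading of the H2O-3 security docs);
# h2o_ssl_protocol is treated as optional here.
REQUIRED_KEYS = (
    "h2o_ssl_jks_internal",
    "h2o_ssl_jks_password",
    "h2o_ssl_jts",
    "h2o_ssl_jts_password",
)

def parse_properties(text):
    """Parse simple key=value lines; not a full java.util.Properties parser."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#!":
            continue  # skip blank lines and comments
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props

def missing_keys(text):
    """Return required keys that are absent or empty, in REQUIRED_KEYS order."""
    props = parse_properties(text)
    return [k for k in REQUIRED_KEYS if not props.get(k)]

sample = """
# incomplete example configuration
h2o_ssl_protocol=TLSv1.2
h2o_ssl_jks_internal=h2o-internal.jks
h2o_ssl_jks_password=changeit
"""
print(missing_keys(sample))  # ['h2o_ssl_jts', 'h2o_ssl_jts_password']
```

In practice you would read the real file (for example with `pathlib.Path("ssl.properties").read_text()`) and abort the launch if the returned list is non-empty.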
This can also be achieved programmatically on the H2OConf:
Scala
import org.apache.spark.h2o._
val conf = new H2OConf().setSslConf("/path/to/ssl/configuration")
val hc = H2OContext.getOrCreate(conf)
Python
from pysparkling import *
conf = H2OConf().setSslConf("/path/to/ssl/configuration")
hc = H2OContext.getOrCreate(conf)
R
library(sparklyr)
library(rsparkling)
sc <- spark_connect(master = "local")
conf <- H2OConf()$setSslConf("/path/to/ssl/configuration")
hc <- H2OContext.getOrCreate(conf)
The format of the security configuration file is explained in the H2O-3 documentation.
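As a rough illustration of that format, the sketch below renders a properties file using the key names we understand the H2O-3 documentation to use (`h2o_ssl_protocol`, `h2o_ssl_jks_internal`, `h2o_ssl_jks_password`, `h2o_ssl_jts`, `h2o_ssl_jts_password`); verify them against the documentation for your H2O version. The helper name, file names, and password are placeholders.

```python
# Hypothetical helper: renders an ssl.properties body. Key names are assumed
# from the H2O-3 security docs; values are written verbatim, so escape
# backslashes (e.g. in Windows paths) yourself, as Java properties require.
def render_ssl_properties(keystore, keystore_password,
                          truststore=None, truststore_password=None,
                          protocol="TLSv1.2"):
    # A self-signed keystore can typically double as the truststore.
    truststore = truststore or keystore
    truststore_password = truststore_password or keystore_password
    entries = {
        "h2o_ssl_protocol": protocol,
        "h2o_ssl_jks_internal": keystore,
        "h2o_ssl_jks_password": keystore_password,
        "h2o_ssl_jts": truststore,
        "h2o_ssl_jts_password": truststore_password,
    }
    return "".join(f"{key}={value}\n" for key, value in entries.items())

content = render_ssl_properties("h2o-internal.jks", "changeit")
print(content, end="")
# h2o_ssl_protocol=TLSv1.2
# h2o_ssl_jks_internal=h2o-internal.jks
# h2o_ssl_jks_password=changeit
# h2o_ssl_jts=h2o-internal.jks
# h2o_ssl_jts_password=changeit
```

Saving the result, for example with `pathlib.Path("ssl.properties").write_text(content)`, produces a file that can then be passed via `spark.ext.h2o.internal_security_conf` or `setSslConf`.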