Both Spark and H2O support basic node authentication and data encryption. In H2O's case, we encrypt all data sent between H2O server nodes as well as between client and server nodes.
Currently, only encryption based on Java key pairs is supported (see the H2O-3 documentation for a more in-depth explanation).
To secure Spark's own communication, please refer to the Spark Security documentation.
Security for data exchanged between H2O instances can be enabled manually: generate all the necessary files, distribute them to all worker nodes (as described in the H2O-3 documentation), and then pass the spark.ext.h2o.internal_security_conf option via --conf:
bin/sparkling-shell --conf "spark.ext.h2o.internal_security_conf=ssl.properties"
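The manual route comes down to creating a keystore, a truststore, and an ssl.properties file that points H2O at them. The following Scala sketch only builds the keytool commands and the properties contents; it does not execute anything. It assumes a single self-signed key pair shared by all nodes, and the file names, alias and passwords are placeholders. The h2o_ssl_* property names are taken from the H2O-3 internode security documentation as we understand it and should be checked against your H2O version.

```scala
import java.io.StringWriter
import java.util.Properties

// Sketch of the manual setup. File names, alias and passwords are placeholders;
// the h2o_ssl_* keys are assumed from the H2O-3 docs (verify for your version).
object ManualH2OSsl {
  val Alias = "h2o-internal" // hypothetical alias for the shared key pair

  // keytool command: generate an RSA key pair with a self-signed certificate
  def genKeyPair(keystore: String, pass: String): Seq[String] = Seq(
    "keytool", "-genkeypair", "-alias", Alias,
    "-keyalg", "RSA", "-keysize", "2048",
    "-dname", "CN=h2o", "-validity", "365",
    "-keystore", keystore, "-storepass", pass, "-keypass", pass)

  // keytool command: export the certificate so other nodes can trust it
  def exportCert(keystore: String, pass: String, certFile: String): Seq[String] = Seq(
    "keytool", "-exportcert", "-alias", Alias,
    "-keystore", keystore, "-storepass", pass, "-file", certFile)

  // keytool command: import the certificate into a truststore, without prompting
  def importCert(truststore: String, pass: String, certFile: String): Seq[String] = Seq(
    "keytool", "-importcert", "-noprompt", "-alias", Alias, "-file", certFile,
    "-keystore", truststore, "-storepass", pass)

  // Contents of ssl.properties (assumed key names)
  def sslProperties(keystore: String, keystorePass: String,
                    truststore: String, truststorePass: String): Properties = {
    val p = new Properties()
    p.setProperty("h2o_ssl_protocol", "TLSv1.2")
    p.setProperty("h2o_ssl_jks_internal", keystore)
    p.setProperty("h2o_ssl_jks_password", keystorePass)
    p.setProperty("h2o_ssl_jts", truststore)
    p.setProperty("h2o_ssl_jts_password", truststorePass)
    p
  }

  // Render the properties as text, ready to be saved as ssl.properties
  def render(p: Properties): String = {
    val w = new StringWriter()
    p.store(w, "H2O internal SSL configuration")
    w.toString
  }
}
```

To run a command, import scala.sys.process._ and call .! on it, for example ManualH2OSsl.genKeyPair("keystore.jks", pw).!; then save render(...) as ssl.properties and copy all three files to every worker node. Sharing one key pair keeps the truststore down to a single certificate; per-node key pairs would require importing every node's certificate into the truststore.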
We also provide utility methods that automatically generate all the necessary files and enable security on all H2O nodes. To use them, pass the spark.ext.h2o.internal_secure_connections=true option via --conf:
bin/sparkling-shell --conf "spark.ext.h2o.internal_secure_connections=true"
The same can be achieved programmatically in Scala using the Security utility class:
import org.apache.spark.network.Security
import org.apache.spark.h2o._

Security.enableSSL(spark) // generate properties file, key pairs and set appropriate H2O parameters
val hc = H2OContext.getOrCreate(spark) // start the H2O cluster
If you plan to pass your own H2OConf instead, use:
import org.apache.spark.network.Security
import org.apache.spark.h2o._

val conf: H2OConf = new H2OConf(spark)
Security.enableSSL(spark, conf) // generate properties file, key pairs and set appropriate H2O parameters
val hc = H2OContext.getOrCreate(spark, conf) // start the H2O cluster
Security.enableSSL generates all the necessary files and distributes them to all worker nodes using YARN or Spark mechanisms. This distribution step is itself secure only if YARN or Spark security is configured.
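Whichever way the files reach the worker nodes, a quick pre-flight check can catch a wrong path or password before H2O is started. The sketch below is a hypothetical helper, not part of Sparkling Water: it uses only the standard java.security.KeyStore API to open a keystore or truststore with its password and list its aliases. The default store type "JKS" is an assumption; pass "PKCS12" if your files use that format.

```scala
import java.io.{File, FileInputStream}
import java.security.KeyStore

// Hypothetical helper (not part of Sparkling Water): opens a keystore or
// truststore file with the given password and returns the aliases it contains.
// KeyStore.load throws an IOException if the password is wrong or the file is corrupt.
object KeystoreCheck {
  def aliases(file: File, password: String, storeType: String = "JKS"): List[String] = {
    val ks = KeyStore.getInstance(storeType)
    val in = new FileInputStream(file)
    try ks.load(in, password.toCharArray)
    finally in.close()

    // Collect the java.util.Enumeration of aliases into a Scala List
    val e = ks.aliases()
    val buf = List.newBuilder[String]
    while (e.hasMoreElements) buf += e.nextElement()
    buf.result()
  }
}
```

For example, KeystoreCheck.aliases(new File("keystore.jks"), pw) should list the alias of the generated key pair, while a wrong password makes the call throw instead of returning.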