Running Sparkling Water on a Kerberized Hadoop Cluster
Sparkling Water can run on a kerberized Hadoop cluster and also supports Kerberos authentication for clients and Flow access. This tutorial shows how to configure Sparkling Water to run on a kerberized Hadoop cluster. If you are also interested in using Kerberos authentication, please read Enabling Kerberos Authentication.
Sparkling Water supports kerberized clusters in both the internal and external backends.
Internal Backend
To make Sparkling Water aware of the kerberized cluster, you can call:
bin/sparkling-shell --conf "spark.yarn.principal=PRINCIPAL" --conf "spark.yarn.keytab=/path/to/keytab"
or you can create the Kerberos ticket beforehand using kinit
and simply call
./bin/sparkling-shell
In this case, Sparkling Water uses the existing ticket, and you don't need to pass the configuration details.
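For example, assuming the principal is user@EXAMPLE.COM (a placeholder) and the keytab is stored at /path/to/keytab, the ticket could be created like this before launching the shell:
kinit -kt /path/to/keytab user@EXAMPLE.COM
klist
./bin/sparkling-shell
The klist call is optional; it only verifies that the ticket is present in the credential cache.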
External Backend
In the external backend, the H2O cluster is also started on YARN, and it needs to be secured as well.
You can start Sparkling Water as:
bin/sparkling-shell --conf "spark.yarn.principal=PRINCIPAL" --conf "spark.yarn.keytab=/path/to/keytab"
In this case, the values of the spark.yarn.principal
and spark.yarn.keytab
properties are also used to set
spark.ext.h2o.external.kerberos.principal
and spark.ext.h2o.external.kerberos.keytab,
respectively. These options
are used by Sparkling Water to set up Kerberos on the external H2O cluster.
You can also set the spark.ext.h2o.external.kerberos.principal
and spark.ext.h2o.external.kerberos.keytab
options directly.
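As a sketch, passing both the Spark and the external H2O cluster options explicitly could look like this (PRINCIPAL and the keytab path are placeholders):
bin/sparkling-shell --conf "spark.yarn.principal=PRINCIPAL" --conf "spark.yarn.keytab=/path/to/keytab" --conf "spark.ext.h2o.external.kerberos.principal=PRINCIPAL" --conf "spark.ext.h2o.external.kerberos.keytab=/path/to/keytab"
Setting the spark.ext.h2o.* options explicitly is only needed when you do not want to reuse the values from spark.yarn.principal and spark.yarn.keytab.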
The simplest way to start Sparkling Water is:
./bin/sparkling-shell
In this case, we assume that the ticket has already been created using kinit,
and it will be used for both Spark and the external
H2O cluster.
The same configuration is also valid for PySparkling and RSparkling.
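For instance, with PySparkling the equivalent command (assuming the bin/pysparkling launcher shipped with the Sparkling Water distribution; PRINCIPAL and the keytab path are placeholders) would be:
bin/pysparkling --conf "spark.yarn.principal=PRINCIPAL" --conf "spark.yarn.keytab=/path/to/keytab"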