Use Sparkling Water in Windows Environments
Windows environments require several additional steps to run Spark and Sparkling Water. A great summary of the configuration steps is available here.
To use Sparkling Water in Windows environments:
1. Download the appropriate Spark distribution from the Spark Downloads page.

2. Point the SPARK_HOME variable to the location of your Spark distribution:

       SET SPARK_HOME=<location of your downloaded Spark distribution>

3. From https://github.com/steveloughran/winutils, download winutils.exe for the Hadoop version that your Spark distribution references. (For example, for spark-2.4.0-bin-hadoop2.7.tgz, you need winutils.exe for Hadoop 2.7.) Move winutils.exe into a new directory, %SPARK_HOME%\hadoop\bin, and set:

       SET HADOOP_HOME=%SPARK_HOME%\hadoop

4. Create a new file, %SPARK_HOME%\hadoop\conf\hive-site.xml, which sets up a default Hive scratch directory. The best location is a writable temporary directory, for example %TEMP%\hive:

       <configuration>
         <property>
           <name>hive.exec.scratchdir</name>
           <value>PUT HERE LOCATION OF TEMP FOLDER</value>
           <description>Scratch space for Hive jobs</description>
         </property>
       </configuration>

   Note: You can also use the Hive default scratch directory, which is c:\tmp\hive. In this case, you need to create the directory manually and run the following command to set up the correct permissions:

       winutils.exe chmod -R 777 c:\tmp\hive

5. Set the HADOOP_CONF_DIR variable:

       SET HADOOP_CONF_DIR=%SPARK_HOME%\hadoop\conf

6. Go to the Sparkling Water directory and run the Sparkling Water shell:

       bin\sparkling-shell.cmd
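Choosing the right winutils.exe comes down to reading the Hadoop version out of the Spark distribution name. The following Python sketch does that, assuming the distribution follows the `spark-<version>-bin-hadoop<version>` naming used in the spark-2.4.0-bin-hadoop2.7.tgz example above. The function name `hadoop_version_for` is hypothetical and is not part of Spark or Sparkling Water.

```python
import re

# Hypothetical helper. Assumes Spark binary distribution names of the form
# "spark-<spark version>-bin-hadoop<hadoop version>", optionally ending in ".tgz",
# for example "spark-2.4.0-bin-hadoop2.7.tgz".
_DIST_NAME = re.compile(
    r"^spark-(?P<spark>\d+(?:\.\d+)*)-bin-hadoop(?P<hadoop>\d+(?:\.\d+)*)(?:\.tgz)?$"
)


def hadoop_version_for(dist_name: str) -> str:
    """Return the Hadoop version referenced by a Spark distribution name."""
    match = _DIST_NAME.match(dist_name)
    if match is None:
        raise ValueError(f"unrecognized Spark distribution name: {dist_name!r}")
    return match.group("hadoop")


if __name__ == "__main__":
    # Download the winutils.exe build that matches this Hadoop version.
    print(hadoop_version_for("spark-2.4.0-bin-hadoop2.7.tgz"))  # prints 2.7
```

The regular expression is deliberately strict: a name that does not follow the assumed pattern raises an error instead of silently returning a wrong version.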
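If you prefer to generate hive-site.xml rather than edit it by hand, the following Python sketch writes the same configuration shown above into `<SPARK_HOME>\hadoop\conf`. The function names `hive_site_xml` and `write_hive_site` are hypothetical; the default scratch directory assumes that `tempfile.gettempdir()` resolves to %TEMP%, which is the usual case on Windows but is not guaranteed.

```python
import os
import tempfile
import xml.etree.ElementTree as ET
from typing import Optional


def hive_site_xml(scratch_dir: str) -> str:
    """Render a hive-site.xml that sets hive.exec.scratchdir to scratch_dir."""
    configuration = ET.Element("configuration")
    prop = ET.SubElement(configuration, "property")
    ET.SubElement(prop, "name").text = "hive.exec.scratchdir"
    ET.SubElement(prop, "value").text = scratch_dir
    ET.SubElement(prop, "description").text = "Scratch space for Hive jobs"
    return ET.tostring(configuration, encoding="unicode")


def write_hive_site(spark_home: str, scratch_dir: Optional[str] = None) -> str:
    """Write <spark_home>/hadoop/conf/hive-site.xml and return its path.

    If scratch_dir is omitted, "<temp dir>/hive" is used; on Windows,
    tempfile.gettempdir() normally resolves to %TEMP%.
    """
    if scratch_dir is None:
        scratch_dir = os.path.join(tempfile.gettempdir(), "hive")
    conf_dir = os.path.join(spark_home, "hadoop", "conf")
    os.makedirs(conf_dir, exist_ok=True)  # create hadoop\conf if it is missing
    path = os.path.join(conf_dir, "hive-site.xml")
    with open(path, "w", encoding="utf-8") as f:
        f.write(hive_site_xml(scratch_dir))
    return path
```

For example, `write_hive_site(os.environ["SPARK_HOME"])` would write the file with a scratch directory under the system temporary folder. Building the document with ElementTree rather than string formatting ensures that characters such as `&` in a path are escaped correctly.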
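A missing variable or misplaced file often only shows up as an error once Spark starts, so it can help to check the setup beforehand. The following Python sketch, built around a hypothetical `check_windows_setup` function that is not part of Sparkling Water, verifies that SPARK_HOME, HADOOP_HOME, and HADOOP_CONF_DIR are set and that winutils.exe and hive-site.xml are where the steps above place them. Run it from the same Command Prompt session in which you ran the SET commands, because SET only changes the environment of that session.

```python
import os
from typing import List, Mapping, Optional


def check_windows_setup(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return a list of setup problems; an empty list means none were found."""
    env = os.environ if env is None else env
    problems = []
    # The three variables set in the steps above.
    for var in ("SPARK_HOME", "HADOOP_HOME", "HADOOP_CONF_DIR"):
        if not env.get(var):
            problems.append(f"{var} is not set")
    # Hadoop looks for winutils.exe under %HADOOP_HOME%\bin.
    hadoop_home = env.get("HADOOP_HOME")
    if hadoop_home and not os.path.isfile(os.path.join(hadoop_home, "bin", "winutils.exe")):
        problems.append(r"winutils.exe not found in %HADOOP_HOME%\bin")
    # hive-site.xml is expected in the directory that HADOOP_CONF_DIR points to.
    conf_dir = env.get("HADOOP_CONF_DIR")
    if conf_dir and not os.path.isfile(os.path.join(conf_dir, "hive-site.xml")):
        problems.append(r"hive-site.xml not found in %HADOOP_CONF_DIR%")
    return problems


if __name__ == "__main__":
    for problem in check_windows_setup():
        print("Problem:", problem)
```

Passing the environment as a parameter keeps the check easy to exercise with a plain dictionary instead of the real process environment.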