Use RSparkling in Windows Environments

Prepare Spark Environment

Initially, please follow the tutorial of running Use Sparkling Water in Windows Environments. The configurations applies to RSparkling as well.

Prepare R Environment

Please follow the RSparkling Documentation to properly set up R packages and environment.

Test the Functionality

Use the following script below to test if you have any RSparkling issues.

The script will check that you can:

  1. Connect to Spark

  2. Start H2O

  3. Copy a R dataframe from R to a Spark DataFrame.


# Set spark connection
sc <- spark_connect(master = "local", version = "3.2.4")

# Create H2O Context

# Copy R dataset to Spark
mtcars_tbl <- copy_to(sc, mtcars, overwrite = TRUE)


  • Error from running h2o_context

    Error: org.apache.spark.SparkException: Job aborted due to stage failure: Task 3 in stage 2.0 failed 1 times, most recent failure: Lost task 3.0 in stage 2.0 (TID 13, localhost): java.lang.NullPointerException
           at java.lang.ProcessBuilder.start(
           at org.apache.hadoop.util.Shell.runCommand(
           at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(
           at org.apache.hadoop.fs.FileUtil.chmod(
           at org.apache.hadoop.fs.FileUtil.chmod(
           at org.apache.spark.util.Utils$.fetchFile(Utils.scala:471)

    This is caused because HADOOP_HOME environment variable is not explicitly set. Set the HADOOP_HOME environment to %SPARK_HOME%/tmp/hadoop or location where bin\winutils.exe is located.

    Download winutils.exe binary from repository.

    NOTE: You need to select the correct Hadoop version which is compatible with your Spark distribution. Hadoop version is often encoded in spark download name, for example, spark-3.2.4-bin-hadoop2.7.tgz.

  • Error from running copy_to

    Error: java.lang.reflect.InvocationTargetException
            at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
            at sun.reflect.NativeConstructorAccessorImpl.newInstance(
            at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(
            at java.lang.reflect.Constructor.newInstance(
            at org.apache.spark.sql.hive.client.IsolatedClientLoader.createClient(IsolatedClientLoader.scala:258)
            at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:359)
            at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:263)
            at org.apache.spark.sql.hive.HiveSharedState.metadataHive$lzycompute(HiveSharedState.scala:39)

    This is caused because there are no permissions to the folder: file:///tmp/hive. You can run a command in the command prompt which will change the permissions of the /tmp/hive directory. It will change the permissions of the /tmp/hive directory so that all three users (Owner, Group, and Public) can Read, Write, and Execute.

    To change the permissions, go to the command prompt and write: \path\to\winutils\Winutils.exe chmod 777 \tmp\hive

    You can also create a file hive-site.xml in %HADOOP_HOME%\conf and modify the location of default Hive scratch dir (which is /tmp/hive):

        <description>Scratch space for Hive jobs</description>

    In this case, do not forget to set the variable HADOOP_CONF_DIR:


    If the previous does not work, you can delete the metastore_db folder in your R working directory.