.. _rsparkling_azure:

Running RSparkling on Databricks Azure Cluster
----------------------------------------------

Sparkling Water, PySparkling and RSparkling can be used on top of Databricks Azure Cluster. This tutorial is
the **RSparkling**.

For Scala Sparkling Water, please visit :ref:`sw_azure` and
for PySparkling, please visit :ref:`pysparkling_azure`.

To start Sparkling Water ``H2OContext`` on Databricks Azure, the steps are:

1.  Login into Microsoft Azure Portal

2.  Create Databricks Azure Environment

    In order to connect to Databricks from Azure, please make sure you have created user inside Azure Active Directory and using that user for the Databricks Login.

3.  Create the cluster

    - For Sparkling Water SUBST_SW_VERSION select Spark SUBST_SPARK_VERSION

    It is advised to always use the latest Sparkling Water and Spark version for the given Spark major version.

    .. figure:: ../images/databricks_cluster_creation.png
        :alt: Configured cluster ready to be started

4.  Create R notebook and attach it to the created cluster. To start ``H2OContext``, the init part of the notebook should be:

    .. code:: R

        # Install Sparklyr
        install.packages("sparklyr")

        # Install RSparkling SUBST_SPARK_MAJOR_VERSION.SUBST_SW_MINOR_VERSION
        install.packages("rsparkling", type = "source", repos = "http://h2o-release.s3.amazonaws.com/sparkling-water/rel-SUBST_SPARK_MAJOR_VERSION/SUBST_SW_MINOR_VERSION/R")

        # Install H2O SUBST_H2O_VERSION (SUBST_H2O_RELEASE_NAME)
        install.packages("h2o", type = "source", repos = "http://h2o-release.s3.amazonaws.com/h2o/rel-SUBST_H2O_RELEASE_NAME/SUBST_H2O_BUILD_NUMBER/R")

        # Connect to Spark on Databricks
        library(rsparkling)
        library(sparklyr)
        sc <- spark_connect(method = "databricks")

        # Start H2O context
        h2o_context(sc)

6.  And voila, we should have ``H2OContext`` running

    .. figure:: ../images/databricks_rsparkling_h2o_context_running.png
        :alt: Running H2O Context

    Please note that accessing H2O Flow is not currently supported on Azure Databricks.