Sparkling Water enables conversion between the different types of Spark’s RDD and H2O’s H2OFrame, in both directions.
When converting from an H2OFrame to an RDD, a wrapper is created around the H2OFrame to provide an RDD-like API. In this case, no data is duplicated; instead, the data is served directly from the underlying H2OFrame.
Conversion in the opposite direction (i.e., from a Spark RDD to an H2OFrame) requires evaluating the data stored in the Spark RDD and then transferring it from RDD storage into the H2OFrame. Once stored in the H2OFrame, however, the data is heavily compressed.
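The difference between the two conversion directions can be illustrated with a toy model in plain Python. This is only a sketch and not the Sparkling Water API: `ToyFrame`, `FrameBackedView`, and `to_toy_frame` are hypothetical names, a generator stands in for a lazily evaluated RDD, and `zlib` stands in for H2O's compression purely to show the idea.

```python
import json
import zlib
from typing import Callable, Iterable, Iterator, List


class ToyFrame:
    """Hypothetical stand-in for an H2OFrame: rows are kept compressed."""

    def __init__(self, rows: Iterable[List[float]]) -> None:
        # Materialize the input once and store it in compressed form.
        self._blob = zlib.compress(json.dumps(list(rows)).encode("utf-8"))

    def rows(self) -> Iterator[List[float]]:
        # Decompress on demand; the frame stays the single owner of the data.
        yield from json.loads(zlib.decompress(self._blob).decode("utf-8"))


class FrameBackedView:
    """Hypothetical RDD-like wrapper: holds a reference to the frame, not a copy."""

    def __init__(self, frame: ToyFrame) -> None:
        self.frame = frame

    def map(self, fn: Callable[[List[float]], float]) -> Iterator[float]:
        # Rows are read lazily from the underlying frame.
        return (fn(row) for row in self.frame.rows())


def to_toy_frame(lazy_rows: Iterable[List[float]]) -> ToyFrame:
    # Opposite direction: the lazy source has to be evaluated, and its
    # rows are transferred into the frame's own (compressed) storage.
    return ToyFrame(lazy_rows)


# A generator plays the role of a lazily evaluated RDD (assumption for the sketch).
lazy_source = ([float(i), float(i * 2)] for i in range(1000))

frame = to_toy_frame(lazy_source)   # evaluates the source and copies the data
view = FrameBackedView(frame)       # wraps the frame, copies nothing
row_sums = list(view.map(sum))
```

In this model, building the view is cheap because it only stores a reference, while building the frame consumes the whole lazy source. That mirrors the asymmetry described above, although the real mechanics inside Sparkling Water are more involved.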
Exchanging the Data
The way data is transferred between Spark and H2O depends on which Sparkling Water backend is used. (Refer to Sparkling Water Backends for more information about the Internal and External backends.)
In the Internal Sparkling Water Backend, Spark and H2O share the same JVM, as depicted in the following figure.
In the External Sparkling Water Backend, Spark and H2O run as separate clusters, and data has to be sent between those clusters over the network.