Spark Frame <–> H2O Frame Conversions

Converting an H2OFrame into an RDD[T]

The H2OContext class provides the explicit conversion asRDD, which creates an RDD-like wrapper around the provided H2OFrame:

def asRDD[A <: Product: TypeTag: ClassTag](fr : H2OFrame) : RDD[A]

The call requires the type parameter A so that it can create a correctly typed RDD. Type A must be a subtype of Product (for example, a case class). The columns of the H2OFrame are matched to the attributes of class A by name.

Example

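// Assumes the members of an H2OContext instance are in scope (import h2oContext._)
val df: H2OFrame = ...
// Weather must be a Product whose attribute names match the frame's column names
val rdd = asRDD[Weather](df)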
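
The name matching described above can be illustrated without a running cluster. The following is a minimal sketch in plain Scala, not the Sparkling Water implementation: the fields of Weather, the Row alias, and the toWeather helper are hypothetical stand-ins chosen for illustration. It shows a case class satisfying the Product bound and each attribute being filled from the column with the same name.

```scala
// Hypothetical target type; the real Weather class is not shown in this document.
case class Weather(date: String, tempC: Double, humidity: Double)

object NameMatchingSketch {
  // Stand-in for one row of a frame, keyed by column name (not the H2O API).
  type Row = Map[String, Any]

  // Fill each attribute from the column that has the same name.
  // Returns None when a column is missing or has an unexpected type.
  def toWeather(row: Row): Option[Weather] =
    for {
      date <- row.get("date").collect { case s: String => s }
      temp <- row.get("tempC").collect { case d: Double => d }
      hum  <- row.get("humidity").collect { case d: Double => d }
    } yield Weather(date, temp, hum)

  def main(args: Array[String]): Unit = {
    // The "extra" column has no matching attribute and is simply ignored here.
    val row: Row = Map("date" -> "2015-01-01", "tempC" -> 3.5, "humidity" -> 0.8, "extra" -> 1)
    // Every case class is a Product, which is what the bound A <: Product asks for.
    val asProduct: Option[Product] = toWeather(row)
    println(asProduct)
  }
}
```

How the real conversion treats missing, extra, or mistyped columns is not specified in this section; returning None is only a choice made for this sketch.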

Converting an H2OFrame into a DataFrame

The H2OContext class provides the explicit conversion asDataFrame, which creates a DataFrame-like wrapper around the provided H2OFrame. Technically, it exposes the RDD[sql.Row] API:

def asDataFrame(fr : H2OFrame)(implicit sqlContext: SQLContext) : DataFrame

This call does not require any type parameters, but because it creates DataFrame instances, it requires access to an instance of SQLContext. The instance is taken as an implicit parameter of the call, so it can be supplied in two ways: explicitly, as an argument, or implicitly, by bringing an implicit value into the current scope.

The schema of the created DataFrame is derived from the column names and column types of the specified H2OFrame.

Example

Passing sqlContext as an explicit argument of the call:

val sqlContext = new SQLContext(sc)
val schemaRDD = asDataFrame(h2oFrame)(sqlContext)

or as an implicit value available in the current scope:

implicit val sqlContext: SQLContext = new SQLContext(sc)
val schemaRDD = asDataFrame(h2oFrame)
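
The two ways of supplying the context rely on Scala's implicit parameter lists. The following is a minimal sketch of that mechanism in plain Scala; FakeSqlContext, FakeFrame, and describe are hypothetical stand-ins, not Spark or H2O classes, and only the shape of the call mirrors the conversion described here.

```scala
object ImplicitParamSketch {
  // Hypothetical stand-ins for SQLContext and H2OFrame; not the real classes.
  final class FakeSqlContext(val name: String)
  final case class FakeFrame(columns: Seq[String])

  // The context is declared in a second, implicit parameter list.
  def describe(fr: FakeFrame)(implicit ctx: FakeSqlContext): String =
    fr.columns.mkString(",") + " via " + ctx.name

  def main(args: Array[String]): Unit = {
    val frame = FakeFrame(Seq("date", "tempC"))

    // 1) Explicit: pass the context in the second parameter list.
    val explicitCtx = new FakeSqlContext("explicit")
    println(describe(frame)(explicitCtx))

    // 2) Implicit: the compiler picks up an implicit value of the right type in scope.
    implicit val implicitCtx: FakeSqlContext = new FakeSqlContext("implicit")
    println(describe(frame))
  }
}
```

Without an implicit value of the required type in scope, the second form does not compile, which is why the explicit form needs the extra argument list.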

Converting an RDD[T] into an H2OFrame

The H2OContext provides an implicit conversion from the specified RDD[A] to an H2OFrame. As with the conversion in the opposite direction, type A must satisfy the upper bound Product. The conversion creates a new H2OFrame, transfers the data from the specified RDD, and saves it to the H2O K/V data store.

implicit def asH2OFrame[A <: Product : TypeTag](rdd : RDD[A]) : H2OFrame

The API also provides an explicit version, which allows for specifying the name of the resulting H2OFrame.

def asH2OFrame[A <: Product : TypeTag](rdd : RDD[A], frameName: Option[String]) : H2OFrame

Example

val rdd: RDD[Weather] = ...
import h2oContext.implicits._
// Implicit call of H2OContext.asH2OFrame[Weather](rdd) is used
val hf: H2OFrame = rdd
// Explicit call of the H2OContext API with a name for the resulting H2OFrame
val hfNamed: H2OFrame = h2oContext.asH2OFrame(rdd, Some("h2oframe"))
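
The assignment of an RDD to an H2OFrame-typed value works because the import brings an implicit conversion into scope. The following is a minimal sketch of that pattern in plain Scala; FakeRdd, FakeFrame, FakeContext, and rddToFrame are hypothetical stand-ins, not the Sparkling Water classes, and delegating to the explicit version with None is an assumption made for this sketch.

```scala
import scala.language.implicitConversions

object ImplicitConversionSketch {
  // Hypothetical stand-ins for RDD[A] and H2OFrame; not the real classes.
  final case class FakeRdd[A](items: Seq[A])
  final case class FakeFrame(name: Option[String], rowCount: Int)

  // Stand-in for a context that exposes its conversions under `implicits`.
  final class FakeContext {
    // Explicit version: the caller may supply a frame name.
    def asFrame[A <: Product](rdd: FakeRdd[A], frameName: Option[String]): FakeFrame =
      FakeFrame(frameName, rdd.items.size)

    object implicits {
      // Implicit version: delegates to the explicit one without a name (assumption).
      implicit def rddToFrame[A <: Product](rdd: FakeRdd[A]): FakeFrame =
        asFrame(rdd, None)
    }
  }

  def main(args: Array[String]): Unit = {
    val ctx = new FakeContext
    val rdd = FakeRdd(Seq(("a", 1), ("b", 2))) // tuples are Products too

    import ctx.implicits._
    val hf: FakeFrame = rdd                           // conversion inserted by the compiler
    val hfNamed = ctx.asFrame(rdd, Some("h2oframe"))  // explicit call with a name
    println(hf)
    println(hfNamed)
  }
}
```

If the import is removed, the line that assigns rdd to a FakeFrame no longer compiles, while the explicit call is unaffected. The same pattern applies to the DataFrame conversion.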

Converting a DataFrame into an H2OFrame

The H2OContext provides an implicit conversion from the specified DataFrame to an H2OFrame. The conversion creates a new H2OFrame, transfers the data from the specified DataFrame, and saves it to the H2O K/V data store.

implicit def asH2OFrame(rdd : DataFrame) : H2OFrame

The API also provides an explicit version, which allows for specifying the name of the resulting H2OFrame.

def asH2OFrame(rdd : DataFrame, frameName: Option[String]) : H2OFrame

Example

val df: DataFrame = ...
import h2oContext.implicits._
// Implicit call of H2OContext.asH2OFrame(df) is used
val hf: H2OFrame = df
// Explicit call of the H2OContext API with a name for the resulting H2OFrame
val hfNamed: H2OFrame = h2oContext.asH2OFrame(df, Some("h2oframe"))