Spark Frame <–> H2O Frame Conversions
Quick links:
- Converting an H2OFrame into an RDD[T]
- Converting an H2OFrame into a DataFrame
- Converting an RDD[T] into an H2OFrame
- Converting a DataFrame into an H2OFrame
Converting an H2OFrame into an RDD[T]
The H2OContext class provides the explicit conversion asRDD, which creates an RDD-like wrapper around the provided H2OFrame:
def asRDD[A <: Product: TypeTag: ClassTag](fr : H2OFrame) : RDD[A]
The call expects the type parameter A in order to create a correctly typed RDD. The conversion requires A to be bounded by the Product interface, which every Scala case class satisfies. The columns of the H2OFrame are mapped to the attributes of class A by name matching.
Example
val df: H2OFrame = ...
val rdd = asRDD[Weather](df)
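The name-matching rule can be illustrated without Spark or H2O. The following is a minimal sketch under stated assumptions: `ToyFrame`, `toWeather`, and the fields of `Weather` are hypothetical stand-ins invented for illustration and are not part of the Sparkling Water API. It only shows that values are assigned to case class attributes by column name rather than by column position.

```scala
object NameMatchingSketch {
  // Hypothetical row type: the field names are assumed to mirror the column names.
  // Case classes extend Product, so they satisfy a bound of the form A <: Product.
  case class Weather(year: Int, month: Int, tempMax: Double)

  // Toy stand-in for a frame (NOT the real H2OFrame):
  // column names plus rows whose values follow the column order.
  final case class ToyFrame(names: Seq[String], rows: Seq[Seq[Any]])

  // Build Weather instances by looking each attribute up by column name.
  def toWeather(frame: ToyFrame): Seq[Weather] = {
    val index: Map[String, Int] = frame.names.zipWithIndex.toMap
    frame.rows.map { row =>
      Weather(
        year = row(index("year")).asInstanceOf[Int],
        month = row(index("month")).asInstanceOf[Int],
        tempMax = row(index("tempMax")).asInstanceOf[Double]
      )
    }
  }

  // The column order differs from the field order; matching is by name only.
  val frame: ToyFrame =
    ToyFrame(Seq("tempMax", "year", "month"), Seq(Seq[Any](71.5, 2013, 7)))
  val result: Seq[Weather] = toWeather(frame)
}
```

In this toy model a missing column fails with a lookup error at runtime; how the real conversion treats missing or extra columns is not covered by this sketch.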
Converting an H2OFrame into a DataFrame
The H2OContext class provides the explicit conversion asDataFrame, which creates a DataFrame-like wrapper around the provided H2OFrame. Technically, it exposes the RDD[sql.Row] API:
def asDataFrame(fr : H2OFrame)(implicit sqlContext: SQLContext) : DataFrame
This call does not require any type parameters, but because it creates DataFrame instances, it requires access to an instance of SQLContext. The instance is provided as an implicit parameter of the call, so it can be passed in two ways: as an explicit argument or by introducing an implicit value into the current scope.
The schema of the created DataFrame is derived from the column names and types of the specified H2OFrame.
Example
Using an explicit parameter in the call to pass sqlContext:
val sqlContext = new SQLContext(sc)
val schemaRDD = asDataFrame(h2oFrame)(sqlContext)
Or using an implicit value available in the enclosing scope:
implicit val sqlContext = new SQLContext(sc)
val schemaRDD = asDataFrame(h2oFrame)
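The two ways of supplying the context rely on ordinary Scala implicit-parameter resolution. A minimal sketch of that mechanism follows, assuming a conversion whose context arrives through a second, implicit parameter list as the text above describes. All `Toy*` types and `asToyDataFrame` are hypothetical stand-ins, not the Spark or H2O classes.

```scala
object ImplicitParamSketch {
  // Hypothetical stand-ins; these are NOT SQLContext, H2OFrame, or DataFrame.
  final case class ToySqlContext(name: String)
  final case class ToyFrame(columns: Seq[String])
  final case class ToyDataFrame(columns: Seq[String], createdBy: String)

  // A conversion that takes its context from an implicit parameter list.
  def asToyDataFrame(fr: ToyFrame)(implicit ctx: ToySqlContext): ToyDataFrame =
    ToyDataFrame(fr.columns, ctx.name)

  val frame: ToyFrame = ToyFrame(Seq("year", "tempMax"))

  // 1. Explicit: pass the context in the second argument list.
  val explicitResult: ToyDataFrame =
    asToyDataFrame(frame)(ToySqlContext("explicit"))

  // 2. Implicit: the compiler picks up the implicit value that is in scope.
  val implicitResult: ToyDataFrame = {
    implicit val ctx: ToySqlContext = ToySqlContext("implicit")
    asToyDataFrame(frame)
  }
}
```

If no implicit value of the required type is in scope and none is passed explicitly, the call fails at compile time rather than at runtime.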
Converting an RDD[T] into an H2OFrame
The H2OContext provides an implicit conversion from the specified RDD[A] to H2OFrame. As with the conversion in the opposite direction, the type A has to satisfy the upper bound expressed by the type Product. The conversion creates a new H2OFrame, transfers the data from the specified RDD, and saves it to the H2O K/V data store.
implicit def asH2OFrame[A <: Product : TypeTag](rdd : RDD[A]) : H2OFrame
The API also provides an explicit version, which allows specifying the name of the resulting H2OFrame.
def asH2OFrame[A <: Product : TypeTag](rdd : RDD[A], frameName: Option[String]) : H2OFrame
Example
val rdd: RDD[Weather] = ...
import h2oContext.implicits._
// Implicit call of H2OContext.asH2OFrame[Weather](rdd) is used
val hf: H2OFrame = rdd
// Explicit call of the H2OContext API with a name for the resulting H2O frame
val hfNamed: H2OFrame = h2oContext.asH2OFrame(rdd, Some("h2oframe"))
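The reason the import of `implicits._` is needed before the plain assignment compiles can be sketched with stand-in types. This is an illustrative model only: `ToyContext`, `ToyFrame`, `asToyFrame`, and the `Weather` fields are hypothetical, a `Seq` plays the role of the RDD, and treating a `None` name as "unnamed" is an assumption of the toy, not a statement about H2O's behavior.

```scala
import scala.language.implicitConversions

object ImplicitConversionSketch {
  // Toy stand-in for a frame (NOT the real H2OFrame).
  final case class ToyFrame(name: Option[String], rows: Seq[Product])

  // Toy stand-in for a context object that owns the conversions.
  class ToyContext {
    // Explicit version: the caller may supply a frame name.
    def asToyFrame[A <: Product](data: Seq[A], frameName: Option[String]): ToyFrame =
      ToyFrame(frameName, data)

    // Implicit versions live in a nested object so callers opt in via an import.
    object implicits {
      implicit def seqToToyFrame[A <: Product](data: Seq[A]): ToyFrame =
        asToyFrame(data, None)
    }
  }

  // Hypothetical row type.
  case class Weather(year: Int, tempMax: Double)

  val ctx = new ToyContext
  val data: Seq[Weather] = Seq(Weather(2013, 71.5))

  import ctx.implicits._

  // The expected type is ToyFrame, so the compiler inserts seqToToyFrame(data).
  val implicitFrame: ToyFrame = data

  // Explicit call with a name for the resulting frame.
  val namedFrame: ToyFrame = ctx.asToyFrame(data, Some("h2oframe"))
}
```

Keeping the conversions in a nested object scopes them to code that imports them, so an unrelated assignment elsewhere cannot trigger a data transfer by accident.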
Converting a DataFrame into an H2OFrame
The H2OContext provides an implicit conversion from the specified DataFrame to H2OFrame. The conversion creates a new H2OFrame, transfers the data from the specified DataFrame, and saves it to the H2O K/V data store.
implicit def asH2OFrame(rdd : DataFrame) : H2OFrame
The API also provides an explicit version, which allows for specifying the name for the resulting H2OFrame.
def asH2OFrame(rdd : DataFrame, frameName: Option[String]) : H2OFrame
Example
val df: DataFrame = ...
import h2oContext.implicits._
// Implicit call of H2OContext.asH2OFrame(df) is used
val hf: H2OFrame = df
// Explicit call of the H2OContext API with a name for the resulting H2O frame
val hfNamed: H2OFrame = h2oContext.asH2OFrame(df, Some("h2oframe"))