Spark Frame <–> H2O Frame Conversions

Converting an H2OFrame into an RDD[T]

Scala

The H2OContext class provides the method asRDD, which creates an RDD-like wrapper around the provided H2OFrame:

def asRDD[A <: Product: TypeTag: ClassTag](fr: H2OFrame): RDD[A]

The call requires the type parameter A so that it can create a correctly typed RDD. Type A must be a subtype of Product (for example, a case class). The columns of the H2OFrame are matched to the attributes of class A by name.

Example

case class Person(name: String, age: Int)
val rdd = h2oContext.asRDD[Person](h2oFrame)
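The name-based matching described above can be illustrated without a cluster. The following is a minimal Python sketch of the idea only, not Sparkling Water's implementation: `rows_as_records` is a hypothetical helper, the namedtuple `Person` stands in for the Scala case class, and the handling of extra or missing columns (ignore and raise, respectively) is an assumption made for the illustration.

```python
from collections import namedtuple

# Hypothetical record type standing in for the Scala case class Person.
Person = namedtuple("Person", ["name", "age"])


def rows_as_records(column_names, rows, record_type):
    """Build one record per row, matching frame columns to record fields by name."""
    # Position of every column in the frame, keyed by column name.
    index_of = {name: i for i, name in enumerate(column_names)}

    # Assumption for this sketch: a field without a matching column is an error.
    missing = [f for f in record_type._fields if f not in index_of]
    if missing:
        raise ValueError(f"no column found for fields: {missing}")

    return [
        record_type(**{f: row[index_of[f]] for f in record_type._fields})
        for row in rows
    ]


# Column order differs from field order, and the extra "city" column is ignored.
columns = ["age", "city", "name"]
rows = [(31, "Prague", "Ada"), (45, "Brno", "Bob")]
people = rows_as_records(columns, rows, Person)
```

Because the lookup is by name, reordering the columns of the frame does not change the resulting records.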

Converting an H2OFrame into a DataFrame

Scala

The H2OContext class provides the method asSparkFrame, which creates a DataFrame-like wrapper around the provided H2OFrame:

def asSparkFrame(fr: H2OFrame): DataFrame

The schema of the created instance of the DataFrame is derived from the column names and types of the specified H2OFrame.

Example

val dataFrame = h2oContext.asSparkFrame(h2oFrame)

Python

The H2OContext class provides the method asSparkFrame, which creates a DataFrame-like wrapper around the provided H2OFrame:

def asSparkFrame(self, h2oFrame)

The schema of the created instance of the DataFrame is derived from the column names and types of the specified H2OFrame.

Example

dataFrame = h2oContext.asSparkFrame(h2oFrame)
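To make the schema derivation concrete, here is a minimal sketch that builds a list of (column name, Spark type name) pairs from the names and types of a frame. The type names on both sides and the mapping itself are assumptions chosen for illustration; `ASSUMED_TYPE_MAP` and `derive_schema` are hypothetical and do not reproduce the exact rules that asSparkFrame applies.

```python
# Assumed, illustrative mapping from H2O column types to Spark SQL type names.
ASSUMED_TYPE_MAP = {
    "int": "IntegerType",
    "real": "DoubleType",
    "enum": "StringType",
    "string": "StringType",
    "time": "TimestampType",
}


def derive_schema(column_names, column_types):
    """Pair each column name with the Spark type assumed for its H2O type."""
    if len(column_names) != len(column_types):
        raise ValueError("column_names and column_types must have the same length")
    return [
        (name, ASSUMED_TYPE_MAP[h2o_type])
        for name, h2o_type in zip(column_names, column_types)
    ]


schema = derive_schema(["name", "age", "income"], ["string", "int", "real"])
```

The point of the sketch is that the schema needs no user input: both the field names and the field types come from the H2OFrame itself.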

R

The H2OContext class provides the method asSparkFrame, which creates a DataFrame-like wrapper around the provided H2OFrame:

asSparkFrame = function(h2oFrame)

The schema of the created instance of the DataFrame is derived from the column names and types of the specified H2OFrame.

Example

dataFrame <- h2oContext$asSparkFrame(h2oFrame)

Converting an RDD[T] into an H2OFrame

Scala

The H2OContext provides a conversion method from the specified RDD[A] to H2OFrame. As with conversion in the opposite direction, the type A has to satisfy the upper bound expressed by the type Product. The conversion creates a new H2OFrame, transfers data from the specified RDD, and saves it to the DKV store on the H2O backend.

def asH2OFrame[A <: Product : TypeTag](rdd : RDD[A]): H2OFrame

The API also provides a version that lets you specify the name of the resulting H2OFrame.

def asH2OFrame[A <: Product : TypeTag](rdd : RDD[A], frameName: String): H2OFrame

Example

val h2oFrame = h2oContext.asH2OFrame(rdd)
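The Product bound matters because it guarantees that every element of the RDD has a fixed set of fields, which can then become the columns of the frame. The sketch below shows that row-to-column step in plain Python. It is an illustration under stated assumptions, not the actual transfer logic: `records_to_columns` is a hypothetical helper, and the namedtuple `Person` stands in for a Scala case class.

```python
from collections import namedtuple

# Hypothetical record type standing in for a Scala case class.
Person = namedtuple("Person", ["name", "age"])


def records_to_columns(records):
    """Turn a list of fixed-shape records into a dict of named columns."""
    if not records:
        return {}
    # Every record has the same fields, so the first one defines the columns.
    fields = records[0]._fields
    return {f: [getattr(r, f) for r in records] for f in fields}


frame_columns = records_to_columns([Person("Ada", 31), Person("Bob", 45)])
```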

Python

The H2OContext provides a conversion method from the specified PySpark RDD to H2OFrame. The conversion creates a new H2OFrame, transfers data from the specified RDD, and saves it to the DKV store on the H2O backend.

def asH2OFrame(self, rdd, h2oFrameName=None, fullCols=-1)

Parameters

  • rdd : PySpark RDD

  • h2oFrameName : Optional name for resulting H2OFrame

  • fullCols : Number of leading columns considered for conversion. -1 means ‘no limit’.

Example

h2oFrame = h2oContext.asH2OFrame(rdd)
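One reading of the fullCols description is "take the first n columns, or all of them when the value is -1". The sketch below encodes just that rule. `columns_for_conversion` is a hypothetical helper written for illustration, and rejecting values below -1 is an assumption rather than documented behavior.

```python
def columns_for_conversion(column_names, full_cols=-1):
    """Return the leading columns selected by full_cols; -1 selects all of them."""
    if full_cols < -1:
        # Assumption for this sketch: only -1 is a valid negative value.
        raise ValueError("full_cols must be -1 or a non-negative integer")
    if full_cols == -1:
        return list(column_names)
    return list(column_names[:full_cols])


all_columns = columns_for_conversion(["name", "age", "city"])
first_two = columns_for_conversion(["name", "age", "city"], full_cols=2)
```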

Converting a DataFrame into an H2OFrame

Scala

The H2OContext provides a conversion method from the specified DataFrame to H2OFrame. The conversion creates a new H2OFrame, transfers data from the specified DataFrame, and saves it to the DKV store on the H2O backend.

def asH2OFrame(df: DataFrame): H2OFrame

The API also provides a version that lets you specify the name of the resulting H2OFrame.

def asH2OFrame(df: DataFrame, frameName: String): H2OFrame

Example

val h2oFrame = h2oContext.asH2OFrame(df)

Python

The H2OContext provides a conversion method from the specified DataFrame to H2OFrame. The conversion creates a new H2OFrame, transfers data from the specified DataFrame, and saves it to the DKV store on the H2O backend.

def asH2OFrame(self, sparkFrame, h2oFrameName=None, fullCols=-1)

Parameters

  • sparkFrame : PySpark data frame

  • h2oFrameName : Optional name for resulting H2OFrame

  • fullCols : Number of leading columns considered for conversion. -1 means ‘no limit’.

Example

h2oFrame = h2oContext.asH2OFrame(df)

R

The H2OContext provides a conversion method from the specified DataFrame to H2OFrame. The conversion creates a new H2OFrame, transfers data from the specified DataFrame, and saves it to the DKV store on the H2O backend.

asH2OFrame = function(sparkFrame, h2oFrameName = NULL)

Parameters

  • sparkFrame : Spark data frame

  • h2oFrameName : Optional name for resulting H2OFrame

Example

h2oFrame <- h2oContext$asH2OFrame(df)