Spark - H2O Frame Mapping

Type Mapping Between H2O H2OFrame Types and Spark DataFrame Types

For every primitive Scala type or Spark SQL type (see org.apache.spark.sql.types) that can appear in a Spark RDD or DataFrame, Sparkling Water provides a mapping to an H2O vector type (numeric, categorical, string, time, or UUID; see water.fvec.Vec):

Scala type            SQL type         H2O type
--------------------  ---------------  --------
NA                    BinaryType       Numeric
Byte                  ByteType         Numeric
Short                 ShortType        Numeric
Integer               IntegerType      Numeric
Long                  LongType         Numeric
Float                 FloatType        Numeric
Double                DoubleType       Numeric
String                StringType       String
Boolean               BooleanType      Numeric
java.sql.Timestamp    TimestampType    Time
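
The table above can be transcribed into a small lookup, which may be handy when checking a schema before conversion. The following is only a sketch and not part of the Sparkling Water API: the names TypeMappingSketch, H2OVecKind, NumericVec, StringVec, TimeVec and h2oKindFor are hypothetical, and the lookup is keyed by the SQL type name as a plain string so that the example needs no Spark dependency. Categorical and UUID vectors are left out because no row of the table maps to them.

```scala
object TypeMappingSketch {
  // Hypothetical stand-ins for the H2O vector types that appear in the table.
  sealed trait H2OVecKind
  case object NumericVec extends H2OVecKind
  case object StringVec extends H2OVecKind
  case object TimeVec extends H2OVecKind

  // Spark SQL type name -> H2O vector type, copied row by row from the table.
  val sqlToH2O: Map[String, H2OVecKind] = Map(
    "BinaryType"    -> NumericVec,
    "ByteType"      -> NumericVec,
    "ShortType"     -> NumericVec,
    "IntegerType"   -> NumericVec,
    "LongType"      -> NumericVec,
    "FloatType"     -> NumericVec,
    "DoubleType"    -> NumericVec,
    "StringType"    -> StringVec,
    "BooleanType"   -> NumericVec,
    "TimestampType" -> TimeVec
  )

  // Returns None for any SQL type name that the table does not list.
  def h2oKindFor(sqlTypeName: String): Option[H2OVecKind] =
    sqlToH2O.get(sqlTypeName)
}
```

Returning an Option keeps unlisted types (for example, array or struct types) explicit instead of guessing a mapping for them.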


Type Mapping Between H2O H2OFrame Types and RDD[T] Types

The following types are supported as type T:

T
----------------------------------------------
NA
Byte
Short
Integer
Long
Float
Double
String
Boolean
java.sql.Timestamp
Any Scala class extending scala.Product
org.apache.spark.mllib.regression.LabeledPoint

As the table specifies, Sparkling Water supports transforming any Scala class that extends Product, which includes, for example, all case classes.
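
To illustrate why extending Product is a useful requirement, the sketch below shows that any case class exposes its fields, in declaration order, through the standard scala.Product interface. This uses plain Scala only and makes no claim about how Sparkling Water implements the conversion internally; the names ProductSketch, Reading and fieldTypes are hypothetical.

```scala
object ProductSketch {
  // Hypothetical record type; every case class automatically extends scala.Product.
  case class Reading(sensorId: Int, label: String, value: Double, valid: Boolean)

  // Runtime class name of each field, in declaration order.
  // Primitive fields are boxed by productIterator (Int is reported as Integer).
  // Assumes no field is null.
  def fieldTypes(p: Product): List[String] =
    p.productIterator.map(_.getClass.getSimpleName).toList
}
```

For a value such as ProductSketch.Reading(1, "a", 2.0, true), productArity gives the number of fields and fieldTypes gives one type name per field, so each field can be lined up with a Scala type from the first table.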