Spark - H2O Frame Mapping

Type Mapping between H2O H2OFrame Types and Spark DataFrame Types

For all primitive Scala types or Spark SQL (see org.apache.spark.sql.types) types that can be part of Spark RDD/DataFrame, we provide mapping into H2O vector types (numeric, categorical, string, time, UUID - see water.fvec.Vec):

Scala type SQL type H2O type
NA BinaryType Numeric
Byte ByteType Numeric
Short ShortType Numeric
Integer IntegerType Numeric
Long LongType Numeric
Float FloatType Numeric
Double DoubleType Numeric
String StringType String
Boolean BooleanType Numeric
java.sql.Timestamp TimestampType Time

Type Mapping Between H2O H2OFrame Types and RDD[T] Types

As type T, we support following types:

T
NA
Byte
Short
Integer
Long
Float
Double
String
Boolean
java.sql.Timestamp
Any scala class extending scala Product
org.apache.spark.mllib.regression.LabeledPoint

As is specified in the table, Sparkling Water provides support for transforming arbitrary scala class extending Product, which are, for example, all case classes.