Spark - H2O Frame Mapping
Type Mapping Between H2OFrame Types and Spark DataFrame Types
For every primitive Scala type and every Spark SQL type (see org.apache.spark.sql.types) that can appear in a Spark RDD or DataFrame, Sparkling Water provides a mapping to an H2O vector type (numeric, categorical, string, time, or UUID; see water.fvec.Vec):
| Scala type | SQL type | H2O type |
|---|---|---|
| NA | BinaryType | Numeric |
| Byte | ByteType | Numeric |
| Short | ShortType | Numeric |
| Integer | IntegerType | Numeric |
| Long | LongType | Numeric |
| Float | FloatType | Numeric |
| Double | DoubleType | Numeric |
| String | StringType | String/Categorical [1] |
| Boolean | BooleanType | Categorical [2] |
| java.sql.Timestamp | TimestampType | Time |
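The table can be read as a plain lookup from a Spark SQL type to an H2O vector type. The sketch below encodes it that way so the mapping can be queried programmatically. `SqlToH2OTypes`, `H2OType`, and `h2oTypeFor` are hypothetical names chosen for illustration; they are not part of the Sparkling Water API, and the keys are the simple class names from org.apache.spark.sql.types so that the example runs without a Spark dependency.

```scala
// Hypothetical lookup mirroring the Spark SQL -> H2O type table.
// Not the Sparkling Water API; names are illustrative only.
object SqlToH2OTypes {
  sealed trait H2OType
  case object Numeric extends H2OType
  case object StringOrCategorical extends H2OType // marked with footnote [1] in the table
  case object Categorical extends H2OType         // marked with footnote [2] in the table
  case object Time extends H2OType

  // Keys are simple names of org.apache.spark.sql.types classes,
  // used as strings so this sketch needs no Spark on the classpath.
  private val mapping: Map[String, H2OType] = Map(
    "BinaryType"    -> Numeric,
    "ByteType"      -> Numeric,
    "ShortType"     -> Numeric,
    "IntegerType"   -> Numeric,
    "LongType"      -> Numeric,
    "FloatType"     -> Numeric,
    "DoubleType"    -> Numeric,
    "StringType"    -> StringOrCategorical,
    "BooleanType"   -> Categorical,
    "TimestampType" -> Time
  )

  // Returns None for SQL types that the table does not list.
  def h2oTypeFor(sqlTypeName: String): Option[H2OType] = mapping.get(sqlTypeName)
}
```

Returning an `Option` keeps types that the table does not cover (for example, nested types such as MapType) distinguishable from supported ones.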
Type Mapping Between H2O H2OFrame Types and RDD[T] Types
The following types are supported as T:

| T |
|---|
| NA |
| Byte |
| Short |
| Integer |
| Long |
| Float |
| Double |
| String |
| Boolean |
| java.sql.Timestamp |
| Any Scala class extending scala.Product |
| org.apache.spark.mllib.regression.LabeledPoint |
As the table shows, Sparkling Water supports transforming an RDD of any Scala class that extends scala.Product, which includes, for example, every case class.
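Extending scala.Product is a sufficient requirement because every case class exposes its fields both by position (`productArity`, `productElement`) and by name (`productElementNames`, available since Scala 2.13). The sketch below uses only those standard-library methods to turn a sequence of case-class rows into named columns, which is the shape a frame needs. `ProductColumns`, `toColumns`, and `Weather` are hypothetical, and this is an illustration of the idea, not how Sparkling Water implements the conversion internally.

```scala
// Hypothetical illustration: flatten case-class rows into named columns
// using only scala.Product methods. Requires Scala 2.13+ for productElementNames.
object ProductColumns {
  // Example row type; any case class would work the same way.
  final case class Weather(station: String, temperature: Double, rainy: Boolean)

  // Assumes a non-empty input in which every row has the same Product type,
  // so the first row's field names apply to all rows.
  def toColumns[T <: Product](rows: Seq[T]): Seq[(String, Vector[Any])] = {
    require(rows.nonEmpty, "need at least one row to read the field names")
    val names = rows.head.productElementNames.toVector
    names.indices.map { i =>
      // Column i collects the i-th field of every row.
      names(i) -> rows.map(_.productElement(i)).toVector
    }
  }
}
```

For instance, two `Weather` rows yield three columns named `station`, `temperature`, and `rainy`, each holding one value per row. A real converter would additionally inspect the static field types (for example, through type tags) to pick an H2O vector type per column.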
Footnotes