Spark - H2O Frame Mapping
-------------------------

Type Mapping between H2O H2OFrame Types and Spark DataFrame Types
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

For all primitive Scala types or Spark SQL (see ``org.apache.spark.sql.types``) types that can be part of Spark RDD/DataFrame, we provide the mapping into H2O vector types (numeric, categorical, string, time, UUID - see ``water.fvec.Vec``):

+----------------------+-----------------+-------------------------+
| Scala type           | SQL type        | H2O type                |
+======================+=================+=========================+
| *NA*                 | BinaryType      | Numeric                 |
+----------------------+-----------------+-------------------------+
| Byte                 | ByteType        | Numeric                 |
+----------------------+-----------------+-------------------------+
| Short                | ShortType       | Numeric                 |
+----------------------+-----------------+-------------------------+
| Integer              | IntegerType     | Numeric                 |
+----------------------+-----------------+-------------------------+
| Long                 | LongType        | Numeric                 |
+----------------------+-----------------+-------------------------+
| Float                | FloatType       | Numeric                 |
+----------------------+-----------------+-------------------------+
| Double               | DoubleType      | Numeric                 |
+----------------------+-----------------+-------------------------+
| String               | StringType      | String/Categorical [1]_ |
+----------------------+-----------------+-------------------------+
| Boolean              | BooleanType     | Categorical [2]_        |
+----------------------+-----------------+-------------------------+
| java.sql.Timestamp   | TimestampType   | Time                    |
+----------------------+-----------------+-------------------------+

--------------

Type Mapping Between H2O H2OFrame Types and RDD[T] Types
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


As type ``T``, we support the following types:

+--------------------------------------------------+
| T                                                |
+==================================================+
| *NA*                                             |
+--------------------------------------------------+
| Byte                                             |
+--------------------------------------------------+
| Short                                            |
+--------------------------------------------------+
| Integer                                          |
+--------------------------------------------------+
| Long                                             |
+--------------------------------------------------+
| Float                                            |
+--------------------------------------------------+
| Double                                           |
+--------------------------------------------------+
| String                                           |
+--------------------------------------------------+
| Boolean                                          |
+--------------------------------------------------+
| java.sql.Timestamp                               |
+--------------------------------------------------+
| Any scala class extending scala ``Product``      |
+--------------------------------------------------+
| org.apache.spark.mllib.regression.LabeledPoint   |
+--------------------------------------------------+

As is specified in the table, Sparkling Water provides support for transforming arbitrary scala class extending ``Product``, which are, for example, all case classes.

.. rubric:: Footnotes
.. [1] The H2O type is String if cardinality is greater than 10 000 0000 or ratio of unique values to all values is 95% or higher.
.. [2] The H2O categorical values are "True" and "False" for true and false respectively.