Importing H2O MOJO
------------------

An H2O MOJO can be imported into Sparkling Water from any data source supported by Apache Spark, such as a local file, S3, or HDFS, and the semantics of the import are the same as in the Spark API.

If HDFS is not available to Spark, then the following call, in Scala:

.. code:: scala

    import ai.h2o.sparkling.ml.models._
    val model = H2OMOJOModel.createFromMojo("prostate.mojo")

or in Python:

.. code:: python

    from pysparkling.ml import *
    model = H2OMOJOModel.createFromMojo("prostate.mojo")

attempts to load the MOJO file with the specified name from the current working directory. You can also specify the full path, in Scala:

.. code:: scala

    import ai.h2o.sparkling.ml.models._
    val model = H2OMOJOModel.createFromMojo("/Users/peter/prostate.mojo")

or in Python:

.. code:: python

    from pysparkling.ml import *
    model = H2OMOJOModel.createFromMojo("/Users/peter/prostate.mojo")

If Spark is running on Hadoop and HDFS is available, then the following call, in Scala:

.. code:: scala

    import ai.h2o.sparkling.ml.models._
    val model = H2OMOJOModel.createFromMojo("prostate.mojo")

or in Python:

.. code:: python

    from pysparkling.ml import *
    model = H2OMOJOModel.createFromMojo("prostate.mojo")

attempts to load the MOJO from the HDFS home directory of the current user. You can also specify an absolute path in this case, in Scala:

.. code:: scala

    import ai.h2o.sparkling.ml.models._
    val model = H2OMOJOModel.createFromMojo("/user/peter/prostate.mojo")

or in Python:

.. code:: python

    from pysparkling.ml import *
    model = H2OMOJOModel.createFromMojo("/user/peter/prostate.mojo")

Both calls load the MOJO file from the location ``hdfs://{server}:{port}/user/peter/prostate.mojo``, where ``{server}`` and ``{port}`` are automatically filled in by Spark.

You can also manually specify the type of data source to use; in that case, you need to provide the URI scheme, in Scala:

.. code:: scala

    import ai.h2o.sparkling.ml.models._
    // HDFS
    val modelHDFS = H2OMOJOModel.createFromMojo("hdfs:///user/peter/prostate.mojo")
    // Local file
    val modelLocal = H2OMOJOModel.createFromMojo("file:///Users/peter/prostate.mojo")

or in Python:

.. code:: python

    from pysparkling.ml import *
    # HDFS
    modelHDFS = H2OMOJOModel.createFromMojo("hdfs:///user/peter/prostate.mojo")
    # Local file
    modelLocal = H2OMOJOModel.createFromMojo("file:///Users/peter/prostate.mojo")

The loaded model is an immutable instance, so it is not possible to change its configuration after it has been created. However, the model can be configured during its creation via ``H2OMOJOSettings``, in Scala:

.. code:: scala

    import ai.h2o.sparkling.ml.models._
    val settings = H2OMOJOSettings(convertUnknownCategoricalLevelsToNa = true, convertInvalidNumbersToNa = true)
    val model = H2OMOJOModel.createFromMojo("prostate.mojo", settings)

or in Python:

.. code:: python

    from pysparkling.ml import *
    settings = H2OMOJOSettings(convertUnknownCategoricalLevelsToNa = True, convertInvalidNumbersToNa = True)
    model = H2OMOJOModel.createFromMojo("prostate.mojo", settings)
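Once loaded, the MOJO model behaves as a regular Spark transformer and can score a Spark DataFrame via ``transform``. The following is a minimal Scala sketch, not part of the examples above; it assumes an existing ``SparkSession`` named ``spark`` and an illustrative input file ``prostate.csv`` in the current directory:

.. code:: scala

    import ai.h2o.sparkling.ml.models._

    // Load the MOJO as in the examples above.
    val model = H2OMOJOModel.createFromMojo("prostate.mojo")

    // Hypothetical input data; any DataFrame containing the model's feature columns works.
    val df = spark.read.option("header", "true").option("inferSchema", "true").csv("prostate.csv")

    // transform() returns the input DataFrame extended with prediction column(s).
    val predictions = model.transform(df)
    predictions.show()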
In Scala, the ``createFromMojo`` method returns a MOJO model instance cast as the base class ``H2OMOJOModel``. This class holds only the properties that are shared across all MOJO model types from the following type hierarchy:

- ``H2OMOJOModel``
- ``H2OUnsupervisedMOJOModel``
- ``H2OSupervisedMOJOModel``
- ``H2OTreeBasedSupervisedMOJOModel``

If you want to access a property specific to a given MOJO model type, you must either cast the model (see the sketch after the example below) or call the ``createFromMojo`` method on the specific MOJO model type.

.. code:: scala

    import ai.h2o.sparkling.ml.models._
    val specificModel = H2OTreeBasedSupervisedMOJOModel.createFromMojo("prostate.mojo")
    println(s"Ntrees: ${specificModel.getNTrees()}") // Relevant only to GBM, DRF and XGBoost
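The casting alternative mentioned above uses plain Scala ``asInstanceOf``. A minimal sketch, assuming the loaded MOJO is in fact a tree-based model (otherwise the cast fails at runtime):

.. code:: scala

    import ai.h2o.sparkling.ml.models._

    val model: H2OMOJOModel = H2OMOJOModel.createFromMojo("prostate.mojo")

    // Downcast the generic model to the tree-based subtype to reach tree-specific
    // properties; this throws a ClassCastException if the MOJO is not a
    // tree-based (GBM, DRF or XGBoost) model.
    val treeModel = model.asInstanceOf[H2OTreeBasedSupervisedMOJOModel]
    println(s"Ntrees: ${treeModel.getNTrees()}")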