H2OAssembly

class h2o.assembly.H2OAssembly(steps)[source]

Bases: object

The H2OAssembly class can be used to specify multiple frame operations in one place.

Returns:a new H2OFrame

Sample usage:

>>> import h2o
>>> from h2o.frame import H2OFrame
>>> iris = h2o.load_dataset("iris")
>>> from h2o.assembly import *
>>> from h2o.transforms.preprocessing import *
>>> assembly = H2OAssembly(steps=[("col_select",
...                                H2OColSelect(["Sepal.Length", "Petal.Length", "Species"])),
...                               ("cos_Sepal.Length",
...                                H2OColOp(op=H2OFrame.cos, col="Sepal.Length", inplace=True)),
...                               ("str_cnt_Species",
...                                H2OColOp(op=H2OFrame.countmatches,
...                                col="Species",
...                                inplace=False, pattern="s"))])
>>> result = assembly.fit(iris)  # fit the assembly and perform the munging operations
>>> result
 Sepal.Length    Petal.Length     Species     Species0
--------------  --------------  -----------  ----------
   0.377978           1.4       Iris-setosa       3
   0.186512           1.4       Iris-setosa       3
  -0.0123887          1.3       Iris-setosa       3
  -0.112153           1.5       Iris-setosa       3
   0.283662           1.4       Iris-setosa       3
   0.634693           1.7       Iris-setosa       3
  -0.112153           1.4       Iris-setosa       3
   0.283662           1.5       Iris-setosa       3
  -0.307333           1.4       Iris-setosa       3
   0.186512           1.5       Iris-setosa       3
[150 rows x 4 columns]

In this example, we first load the iris frame. The following data munging operations are then performed on it:

  1. select three of the five columns;
  2. take the cosine of the column Sepal.Length and replace the original column with the result;
  3. count the occurrences of the letter s in each value of the Species column. Since inplace is set to False, a new column is generated to hold the result.
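
The three steps above can be illustrated without an H2O cluster. The following is a minimal pure-Python sketch of what each step computes on a single row; it is not how H2O implements the operations. The `munge` helper and the sample row values are assumptions made for illustration.

```python
import math

# One row shaped like the iris data; the values are assumed for illustration.
row = {
    "Sepal.Length": 5.1,
    "Sepal.Width": 3.5,
    "Petal.Length": 1.4,
    "Petal.Width": 0.2,
    "Species": "Iris-setosa",
}


def munge(row):
    """Hypothetical helper mirroring the three assembly steps on one row."""
    # Step 1: keep three of the five columns.
    out = {name: row[name] for name in ("Sepal.Length", "Petal.Length", "Species")}
    # Step 2: replace Sepal.Length with its cosine (inplace=True).
    out["Sepal.Length"] = math.cos(out["Sepal.Length"])
    # Step 3: count occurrences of "s" and store them in a new column (inplace=False).
    out["Species0"] = out["Species"].count("s")
    return out


result = munge(row)
print(result)
```

For this row the sketch gives a cosine of about 0.377978 and a count of 3, which agrees with the first row of the output shown earlier.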

Extension class of Pipeline implementing additional methods:

  • to_pojo: Exports the assembly to a self-contained Java POJO used in a per-row, high-throughput environment.

In addition, H2OAssembly provides a few static methods that perform element-wise operations between two frames. They are all called as:

>>> H2OAssembly.op(frame1, frame2)

where frame1 and frame2 are H2OFrames of the same size and with the same column types. The call returns an H2OFrame containing the element-wise result of the operation op. The following operations are currently supported:

  • divide
  • plus
  • multiply
  • minus
  • less_than
  • less_than_equal
  • equal_equal
  • not_equal
  • greater_than
  • greater_than_equal
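
As a rough illustration of the element-wise semantics, the sketch below applies the same ten operations to plain Python lists of rows. The `OPS` table and the `elementwise` helper are hypothetical names, not part of H2O; comparison results are converted to 0/1 to mirror the output format used in this section.

```python
import operator

# Hypothetical mapping from the operation names listed above to Python operators.
OPS = {
    "divide": operator.truediv,
    "plus": operator.add,
    "multiply": operator.mul,
    "minus": operator.sub,
    "less_than": operator.lt,
    "less_than_equal": operator.le,
    "equal_equal": operator.eq,
    "not_equal": operator.ne,
    "greater_than": operator.gt,
    "greater_than_equal": operator.ge,
}


def elementwise(op_name, table1, table2):
    """Apply op_name cell by cell to two tables of identical shape."""
    same_rows = len(table1) == len(table2)
    if not same_rows or any(len(a) != len(b) for a, b in zip(table1, table2)):
        raise ValueError("tables must have the same shape")
    op = OPS[op_name]
    result = []
    for row1, row2 in zip(table1, table2):
        cells = [op(a, b) for a, b in zip(row1, row2)]
        # Report comparison results as 0/1 rather than False/True.
        result.append([int(c) if isinstance(c, bool) else c for c in cells])
    return result


table1 = [[4, 4, 4, 4], [4, 4, 4, 4]]
table2 = [[2, 2, 2, 2], [2, 2, 2, 2]]
print(elementwise("minus", table1, table2))         # [[2, 2, 2, 2], [2, 2, 2, 2]]
print(elementwise("greater_than", table1, table2))  # [[1, 1, 1, 1], [1, 1, 1, 1]]
```

The shape check stands in for the requirement that both frames have the same size; a real H2OFrame also carries column types, which this sketch does not model.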
divide(rhs)

Divides one frame by the other.

Returns:the quotient of the frames.
Examples:
>>> python_list1 = [[4,4,4,4],[4,4,4,4]]
>>> python_list2 = [[2,2,2,2], [2,2,2,2]]
>>> frame1 = h2o.H2OFrame(python_obj=python_list1)
>>> frame2 = h2o.H2OFrame(python_obj=python_list2)
>>> H2OAssembly.divide(frame1, frame2)
   C1    C2    C3    C4
  ----  ----  ----  ----
   2     2     2     2
   2     2     2     2
equal_equal(rhs)

Measures whether the frames are equal.

Returns:boolean true/false response (0/1 = no/yes).
Examples:
>>> python_list1 = [[4,4,4,4],[4,4,4,4]]
>>> python_list2 = [[2,2,2,2], [2,2,2,2]]
>>> frame1 = h2o.H2OFrame(python_obj=python_list1)
>>> frame2 = h2o.H2OFrame(python_obj=python_list2)
>>> H2OAssembly.equal_equal(frame1, frame2)
    C1    C2    C3    C4
   ----  ----  ----  ----
     0     0     0     0
     0     0     0     0
fit(fr)[source]

Performs the munging operations specified in steps on the frame fr.

Parameters:fr – H2OFrame on which the munging operations are to be performed.
Returns:H2OFrame after munging operations are completed.
Examples:
>>> iris = h2o.load_dataset("iris")
>>> assembly = H2OAssembly(steps=[("col_select",
...                        H2OColSelect(["Sepal.Length",
...                        "Petal.Length", "Species"])),
...                       ("cos_Sepal.Length",
...                        H2OColOp(op=H2OFrame.cos,
...                        col="Sepal.Length",
...                        inplace=True)),
...                       ("str_cnt_Species",
...                        H2OColOp(op=H2OFrame.countmatches,
...                        col="Species",
...                        inplace=False,
...                        pattern="s"))])
>>> fit = assembly.fit(iris)
>>> fit
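
Conceptually, fitting an assembly passes the frame through each named step. The sketch below shows that idea with plain Python callables, assuming the steps are applied one after another in the order given, as the sample usage suggests. `TinyAssembly`, `select`, and `double` are hypothetical stand-ins, not H2O classes or functions.

```python
class TinyAssembly:
    """Hypothetical stand-in that applies named steps in order."""

    def __init__(self, steps):
        # steps is a list of (name, callable) pairs, analogous to the
        # (name, transform) pairs passed to H2OAssembly.
        self.steps = steps

    def fit(self, rows):
        # Feed the output of each step into the next one.
        for _name, transform in self.steps:
            rows = transform(rows)
        return rows


def select(rows):
    # Keep only the "x" column of every row.
    return [{"x": r["x"]} for r in rows]


def double(rows):
    # Replace "x" with twice its value.
    return [{"x": r["x"] * 2} for r in rows]


assembly = TinyAssembly(steps=[("select_x", select), ("double_x", double)])
result = assembly.fit([{"x": 1, "y": 10}, {"x": 2, "y": 20}])
print(result)  # [{'x': 2}, {'x': 4}]
```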
greater_than(rhs)

Measures whether one frame is greater than the other.

Returns:boolean true/false response (0/1 = no/yes).
Examples:
>>> python_list1 = [[4,4,4,4],[4,4,4,4]]
>>> python_list2 = [[2,2,2,2], [2,2,2,2]]
>>> frame1 = h2o.H2OFrame(python_obj=python_list1)
>>> frame2 = h2o.H2OFrame(python_obj=python_list2)
>>> H2OAssembly.greater_than(frame1, frame2)
    C1    C2    C3    C4
   ----  ----  ----  ----
     1     1     1     1
     1     1     1     1
greater_than_equal(rhs)

Measures whether one frame is greater than or equal to the other.

Returns:boolean true/false response (0/1 = no/yes).
Examples:
>>> python_list1 = [[4,4,4,4],[4,4,4,4]]
>>> python_list2 = [[2,2,2,2], [2,2,2,2]]
>>> frame1 = h2o.H2OFrame(python_obj=python_list1)
>>> frame2 = h2o.H2OFrame(python_obj=python_list2)
>>> H2OAssembly.greater_than_equal(frame1, frame2)
     C1    C2    C3    C4
    ----  ----  ----  ----
     1     1     1     1
     1     1     1     1
less_than(rhs)

Measures whether one frame is less than the other.

Returns:boolean true/false response (0/1 = no/yes).
Examples:
>>> python_list1 = [[4,4,4,4],[4,4,4,4]]
>>> python_list2 = [[2,2,2,2], [2,2,2,2]]
>>> frame1 = h2o.H2OFrame(python_obj=python_list1)
>>> frame2 = h2o.H2OFrame(python_obj=python_list2)
>>> H2OAssembly.less_than(frame1, frame2)
     C1    C2    C3    C4
    ----  ----  ----  ----
      0     0     0     0
      0     0     0     0
less_than_equal(rhs)

Measures whether one frame is less than or equal to the other.

Returns:boolean true/false response (0/1 = no/yes).
Examples:
>>> python_list1 = [[4,4,4,4],[4,4,4,4]]
>>> python_list2 = [[2,2,2,2], [2,2,2,2]]
>>> frame1 = h2o.H2OFrame(python_obj=python_list1)
>>> frame2 = h2o.H2OFrame(python_obj=python_list2)
>>> H2OAssembly.less_than_equal(frame1, frame2)
     C1    C2    C3    C4
    ----  ----  ----  ----
     0     0     0     0
     0     0     0     0
minus(rhs)

Subtracts one frame from the other.

Returns:the difference of the frames.
Examples:
>>> python_list1 = [[4,4,4,4],[4,4,4,4]]
>>> python_list2 = [[2,2,2,2], [2,2,2,2]]
>>> frame1 = h2o.H2OFrame(python_obj=python_list1)
>>> frame2 = h2o.H2OFrame(python_obj=python_list2)
>>> H2OAssembly.minus(frame1, frame2)
    C1    C2    C3    C4
   ----  ----  ----  ----
    2     2     2     2
    2     2     2     2
multiply(rhs)

Multiplies the frames together.

Returns:the product of the frames.
Examples:
>>> python_list1 = [[4,4,4,4],[4,4,4,4]]
>>> python_list2 = [[2,2,2,2], [2,2,2,2]]
>>> frame1 = h2o.H2OFrame(python_obj=python_list1)
>>> frame2 = h2o.H2OFrame(python_obj=python_list2)
>>> H2OAssembly.multiply(frame1, frame2)
    C1    C2    C3    C4
   ----  ----  ----  ----
    8     8     8     8
    8     8     8     8
names

Gives the column names.

Returns:the specified column names.
Examples:
>>> iris = h2o.load_dataset("iris")
>>> from h2o.assembly import *
>>> from h2o.transforms.preprocessing import *
>>> assembly = H2OAssembly(steps=[("col_select",
...                                H2OColSelect(["Sepal.Length", "Petal.Length", "Species"])),
...                               ("cos_Sepal.Length",
...                                H2OColOp(op=H2OFrame.cos, col="Sepal.Length", inplace=True)),
...                               ("str_cnt_Species",
...                                H2OColOp(op=H2OFrame.countmatches,
...                                col="Species",
...                                inplace=False, pattern="s"))])
>>> result = assembly.fit(iris)
>>> result.names
[u'Sepal.Length', u'Petal.Length', u'Species', u'Species0']
not_equal(rhs)

Measures whether the frames are not equal.

Returns:boolean true/false response (0/1 = no/yes).
Examples:
>>> python_list1 = [[4,4,4,4],[4,4,4,4]]
>>> python_list2 = [[2,2,2,2], [2,2,2,2]]
>>> frame1 = h2o.H2OFrame(python_obj=python_list1)
>>> frame2 = h2o.H2OFrame(python_obj=python_list2)
>>> H2OAssembly.not_equal(frame1, frame2)
    C1    C2    C3    C4
   ----  ----  ----  ----
     1     1     1     1
     1     1     1     1
plus(rhs)

Adds the frames together.

Returns:the sum of the frames.
Examples:
>>> python_list1 = [[4,4,4,4],[4,4,4,4]]
>>> python_list2 = [[2,2,2,2], [2,2,2,2]]
>>> frame1 = h2o.H2OFrame(python_obj=python_list1)
>>> frame2 = h2o.H2OFrame(python_obj=python_list2)
>>> H2OAssembly.plus(frame1, frame2)
     C1    C2    C3    C4
    ----  ----  ----  ----
     6     6     6     6
     6     6     6     6
to_pojo(pojo_name='', path='', get_jar=True)[source]

Converts the munging operations performed on an H2OFrame into a POJO.

Parameters:
  • pojo_name – (str) Name of POJO
  • path – (str) path of POJO.
  • get_jar – (bool) Whether to also download the h2o-genmodel.jar file needed to compile the POJO
Returns:None

Examples:
>>> from h2o.assembly import *
>>> from h2o.transforms.preprocessing import *
>>> iris = h2o.load_dataset("iris")
>>> assembly = H2OAssembly(steps=[("col_select",
...                                H2OColSelect(["Sepal.Length",
...                                "Petal.Length", "Species"])),
...                               ("cos_Sepal.Length",
...                                H2OColOp(op=H2OFrame.cos,
...                                col="Sepal.Length", inplace=True)),
...                               ("str_cnt_Species",
...                                H2OColOp(op=H2OFrame.countmatches,
...                                col="Species", inplace=False,
...                                pattern="s"))])
>>> result = assembly.fit(iris)
>>> assembly.to_pojo(pojo_name="iris_pojo", path='', get_jar=False)