H2OAssembly
class h2o.assembly.H2OAssembly(steps)
Bases: object
H2OAssembly class can be used to specify multiple frame operations in one place.
Returns: a new H2OFrame
Sample usage:
>>> iris = h2o.load_dataset("iris")
>>> from h2o.assembly import *
>>> from h2o.transforms.preprocessing import *
>>> assembly = H2OAssembly(steps=[("col_select",
...                                H2OColSelect(["Sepal.Length", "Petal.Length", "Species"])),
...                               ("cos_Sepal.Length",
...                                H2OColOp(op=H2OFrame.cos, col="Sepal.Length", inplace=True)),
...                               ("str_cnt_Species",
...                                H2OColOp(op=H2OFrame.countmatches,
...                                         col="Species",
...                                         inplace=False, pattern="s"))])
>>> result = assembly.fit(iris)  # fit the assembly and perform the munging operations
>>> result
  Sepal.Length    Petal.Length  Species        Species0
--------------  --------------  -----------  ----------
     0.377978              1.4  Iris-setosa           3
     0.186512              1.4  Iris-setosa           3
    -0.0123887             1.3  Iris-setosa           3
    -0.112153              1.5  Iris-setosa           3
     0.283662              1.4  Iris-setosa           3
     0.634693              1.7  Iris-setosa           3
    -0.112153              1.4  Iris-setosa           3
     0.283662              1.5  Iris-setosa           3
    -0.307333              1.4  Iris-setosa           3
     0.186512              1.5  Iris-setosa           3
[150 rows x 4 columns]
In this example, we first load the iris frame. The following data munging operations are then performed on it:
- select three of the five columns;
- take the cosine of the column Sepal.Length and replace the original column with the result, since inplace is set to True;
- count the occurrences of the letter s in each value of the Species column. Since inplace is set to False, a new column (Species0 in the output above) is generated to hold the result.
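The three operations above can be mimicked in plain Python to show what each one computes. This is only a sketch on a list of dictionaries, not how H2O executes an assembly; the helper names col_select and col_op are hypothetical, and the "append 0 to the column name" rule is an assumption read off the Species0 column in the sample output.

```python
import math


def col_select(rows, cols):
    """Keep only the named columns, in the order given."""
    return [{c: row[c] for c in cols} for row in rows]


def col_op(rows, op, col, inplace):
    """Apply op to one column; overwrite it or append the result as a new column."""
    # Assumption: the new column is named by appending "0", as in the sample output.
    target = col if inplace else col + "0"
    out = []
    for row in rows:
        new_row = dict(row)
        new_row[target] = op(row[col])
        out.append(new_row)
    return out


# Two rows in the iris layout (five columns).
rows = [
    {"Sepal.Length": 5.1, "Sepal.Width": 3.5, "Petal.Length": 1.4,
     "Petal.Width": 0.2, "Species": "Iris-setosa"},
    {"Sepal.Length": 4.9, "Sepal.Width": 3.0, "Petal.Length": 1.4,
     "Petal.Width": 0.2, "Species": "Iris-setosa"},
]

result = col_select(rows, ["Sepal.Length", "Petal.Length", "Species"])
result = col_op(result, math.cos, "Sepal.Length", inplace=True)
result = col_op(result, lambda s: s.count("s"), "Species", inplace=False)

for row in result:
    print(row)
```

With inplace=True the column keeps its name and position; with inplace=False the input column is left untouched and the result lands in an extra column at the end.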
Extension class of Pipeline implementing additional methods:
- to_pojo: Exports the assembly to a self-contained Java POJO used in a per-row, high-throughput environment.
In addition, H2OAssembly provides a few static methods that perform element-wise operations between two frames. They are all called as
>>> H2OAssembly.op(frame1, frame2)
where frame1 and frame2 are H2OFrames of the same size and with the same column types. The call returns an H2OFrame containing the element-wise result of the operation op. The following operations are currently supported:
- divide
- plus
- multiply
- minus
- less_than
- less_than_equal
- equal_equal
- not_equal
- greater_than
- greater_than_equal
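The semantics of these operations can be sketched without a running H2O cluster by using nested Python lists in place of frames. The elementwise helper and the OPS table below are hypothetical names used only for illustration; the mapping of each operation to a function in Python's operator module, and the conversion of comparison results to 0/1, are assumptions based on the examples in this page.

```python
import operator

# Hypothetical mapping from the operation names listed above to Python operators.
OPS = {
    "divide": operator.truediv,
    "plus": operator.add,
    "multiply": operator.mul,
    "minus": operator.sub,
    "less_than": operator.lt,
    "less_than_equal": operator.le,
    "equal_equal": operator.eq,
    "not_equal": operator.ne,
    "greater_than": operator.gt,
    "greater_than_equal": operator.ge,
}


def elementwise(op_name, lhs, rhs):
    """Apply the named operation cell by cell to two tables of the same shape."""
    if len(lhs) != len(rhs) or any(len(a) != len(b) for a, b in zip(lhs, rhs)):
        raise ValueError("both tables must have the same shape")
    op = OPS[op_name]
    result = [[op(a, b) for a, b in zip(row_l, row_r)]
              for row_l, row_r in zip(lhs, rhs)]
    # Comparisons yield booleans; report them as 0/1 like the examples in this page.
    return [[int(v) if isinstance(v, bool) else v for v in row] for row in result]


frame1 = [[4, 4, 4, 4], [4, 4, 4, 4]]
frame2 = [[2, 2, 2, 2], [2, 2, 2, 2]]

print(elementwise("divide", frame1, frame2))
print(elementwise("greater_than", frame1, frame2))
```

The shape check mirrors the requirement that both frames have the same size; the sketch does not model column types.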
- divide(rhs)
Divides one frame by the other.
Returns: the quotient of the frames.
Examples:
>>> python_list1 = [[4,4,4,4],[4,4,4,4]]
>>> python_list2 = [[2,2,2,2], [2,2,2,2]]
>>> frame1 = h2o.H2OFrame(python_obj=python_list1)
>>> frame2 = h2o.H2OFrame(python_obj=python_list2)
>>> H2OAssembly.divide(frame1, frame2)
  C1    C2    C3    C4
----  ----  ----  ----
   2     2     2     2
   2     2     2     2
- equal_equal(rhs)
Measures whether the frames are equal.
Returns: boolean true/false response (0/1 = no/yes).
Examples:
>>> python_list1 = [[4,4,4,4],[4,4,4,4]]
>>> python_list2 = [[2,2,2,2], [2,2,2,2]]
>>> frame1 = h2o.H2OFrame(python_obj=python_list1)
>>> frame2 = h2o.H2OFrame(python_obj=python_list2)
>>> H2OAssembly.equal_equal(frame1, frame2)
  C1    C2    C3    C4
----  ----  ----  ----
   0     0     0     0
   0     0     0     0
- fit(fr)
Performs the munging operations specified in steps on the frame fr.
Parameters: fr – the H2OFrame on which the munging operations are to be performed.
Returns: the H2OFrame after the munging operations are completed.
Examples:
>>> iris = h2o.load_dataset("iris")
>>> assembly = H2OAssembly(steps=[("col_select",
...                                H2OColSelect(["Sepal.Length",
...                                              "Petal.Length", "Species"])),
...                               ("cos_Sepal.Length",
...                                H2OColOp(op=H2OFrame.cos,
...                                         col="Sepal.Length",
...                                         inplace=True)),
...                               ("str_cnt_Species",
...                                H2OColOp(op=H2OFrame.countmatches,
...                                         col="Species",
...                                         inplace=False,
...                                         pattern="s"))])
>>> fit = assembly.fit(iris)
>>> fit
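As a rough mental model, fit can be thought of as applying the named steps one after another, each step receiving the output of the previous one. The MiniAssembly class below is a hypothetical stand-in written for illustration; it assumes strictly sequential application and says nothing about how H2O evaluates the steps internally.

```python
class MiniAssembly:
    """Hypothetical stand-in: holds (name, transform) pairs and applies them in order."""

    def __init__(self, steps):
        self.steps = list(steps)

    def fit(self, rows):
        # Each transform takes the rows produced by the previous step.
        for _name, transform in self.steps:
            rows = transform(rows)
        return rows


steps = [
    # Keep only columns x and y.
    ("keep_xy", lambda rows: [{k: r[k] for k in ("x", "y")} for r in rows]),
    # Double column x in place.
    ("double_x", lambda rows: [{**r, "x": r["x"] * 2} for r in rows]),
]

assembly = MiniAssembly(steps)
fitted = assembly.fit([{"x": 1, "y": 2, "z": 3}, {"x": 5, "y": 6, "z": 7}])
print(fitted)
```

Under this model the order of the steps matters: a step that refers to a column removed by an earlier selection step would fail, which is why the example above selects its three columns first.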
- greater_than(rhs)
Measures whether one frame is greater than the other.
Returns: boolean true/false response (0/1 = no/yes).
Examples:
>>> python_list1 = [[4,4,4,4],[4,4,4,4]]
>>> python_list2 = [[2,2,2,2], [2,2,2,2]]
>>> frame1 = h2o.H2OFrame(python_obj=python_list1)
>>> frame2 = h2o.H2OFrame(python_obj=python_list2)
>>> H2OAssembly.greater_than(frame1, frame2)
  C1    C2    C3    C4
----  ----  ----  ----
   1     1     1     1
   1     1     1     1
- greater_than_equal(rhs)
Measures whether one frame is greater than or equal to the other.
Returns: boolean true/false response (0/1 = no/yes).
Examples:
>>> python_list1 = [[4,4,4,4],[4,4,4,4]]
>>> python_list2 = [[2,2,2,2], [2,2,2,2]]
>>> frame1 = h2o.H2OFrame(python_obj=python_list1)
>>> frame2 = h2o.H2OFrame(python_obj=python_list2)
>>> H2OAssembly.greater_than_equal(frame1, frame2)
  C1    C2    C3    C4
----  ----  ----  ----
   1     1     1     1
   1     1     1     1
- less_than(rhs)
Measures whether one frame is less than the other.
Returns: boolean true/false response (0/1 = no/yes).
Examples:
>>> python_list1 = [[4,4,4,4],[4,4,4,4]]
>>> python_list2 = [[2,2,2,2], [2,2,2,2]]
>>> frame1 = h2o.H2OFrame(python_obj=python_list1)
>>> frame2 = h2o.H2OFrame(python_obj=python_list2)
>>> H2OAssembly.less_than(frame1, frame2)
  C1    C2    C3    C4
----  ----  ----  ----
   0     0     0     0
   0     0     0     0
- less_than_equal(rhs)
Measures whether one frame is less than or equal to the other.
Returns: boolean true/false response (0/1 = no/yes).
Examples:
>>> python_list1 = [[4,4,4,4],[4,4,4,4]]
>>> python_list2 = [[2,2,2,2], [2,2,2,2]]
>>> frame1 = h2o.H2OFrame(python_obj=python_list1)
>>> frame2 = h2o.H2OFrame(python_obj=python_list2)
>>> H2OAssembly.less_than_equal(frame1, frame2)
  C1    C2    C3    C4
----  ----  ----  ----
   0     0     0     0
   0     0     0     0
- minus(rhs)
Subtracts one frame from the other.
Returns: the difference of the frames.
Examples:
>>> python_list1 = [[4,4,4,4],[4,4,4,4]]
>>> python_list2 = [[2,2,2,2], [2,2,2,2]]
>>> frame1 = h2o.H2OFrame(python_obj=python_list1)
>>> frame2 = h2o.H2OFrame(python_obj=python_list2)
>>> H2OAssembly.minus(frame1, frame2)
  C1    C2    C3    C4
----  ----  ----  ----
   2     2     2     2
   2     2     2     2
- multiply(rhs)
Multiplies the frames together.
Returns: the product of the frames.
Examples:
>>> python_list1 = [[4,4,4,4],[4,4,4,4]]
>>> python_list2 = [[2,2,2,2], [2,2,2,2]]
>>> frame1 = h2o.H2OFrame(python_obj=python_list1)
>>> frame2 = h2o.H2OFrame(python_obj=python_list2)
>>> H2OAssembly.multiply(frame1, frame2)
  C1    C2    C3    C4
----  ----  ----  ----
   8     8     8     8
   8     8     8     8
- names
Gives the column names.
Returns: the specified column names.
Examples:
>>> iris = h2o.load_dataset("iris")
>>> from h2o.assembly import *
>>> from h2o.transforms.preprocessing import *
>>> assembly = H2OAssembly(steps=[("col_select",
...                                H2OColSelect(["Sepal.Length", "Petal.Length", "Species"])),
...                               ("cos_Sepal.Length",
...                                H2OColOp(op=H2OFrame.cos, col="Sepal.Length", inplace=True)),
...                               ("str_cnt_Species",
...                                H2OColOp(op=H2OFrame.countmatches,
...                                         col="Species",
...                                         inplace=False, pattern="s"))])
>>> result = assembly.fit(iris)
>>> result.names
[u'Sepal.Length', u'Petal.Length', u'Species', u'Species0']
- not_equal(rhs)
Measures whether the frames are not equal.
Returns: boolean true/false response (0/1 = no/yes).
Examples:
>>> python_list1 = [[4,4,4,4],[4,4,4,4]]
>>> python_list2 = [[2,2,2,2], [2,2,2,2]]
>>> frame1 = h2o.H2OFrame(python_obj=python_list1)
>>> frame2 = h2o.H2OFrame(python_obj=python_list2)
>>> H2OAssembly.not_equal(frame1, frame2)
  C1    C2    C3    C4
----  ----  ----  ----
   1     1     1     1
   1     1     1     1
- plus(rhs)
Adds the frames together.
Returns: the sum of the frames.
Examples:
>>> python_list1 = [[4,4,4,4],[4,4,4,4]]
>>> python_list2 = [[2,2,2,2], [2,2,2,2]]
>>> frame1 = h2o.H2OFrame(python_obj=python_list1)
>>> frame2 = h2o.H2OFrame(python_obj=python_list2)
>>> H2OAssembly.plus(frame1, frame2)
  C1    C2    C3    C4
----  ----  ----  ----
   6     6     6     6
   6     6     6     6
- to_pojo(pojo_name='', path='', get_jar=True)
Converts the munging operations performed on an H2OFrame into a POJO.
Parameters:
- pojo_name – (str) name of the POJO.
- path – (str) path of the POJO.
- get_jar – (bool) whether to also download the h2o-genmodel.jar file needed to compile the POJO.
Returns: None
Examples:
>>> from h2o.assembly import *
>>> from h2o.transforms.preprocessing import *
>>> iris = h2o.load_dataset("iris")
>>> assembly = H2OAssembly(steps=[("col_select",
...                                H2OColSelect(["Sepal.Length",
...                                              "Petal.Length", "Species"])),
...                               ("cos_Sepal.Length",
...                                H2OColOp(op=H2OFrame.cos,
...                                         col="Sepal.Length", inplace=True)),
...                               ("str_cnt_Species",
...                                H2OColOp(op=H2OFrame.countmatches,
...                                         col="Species", inplace=False,
...                                         pattern="s"))])
>>> result = assembly.fit(iris)
>>> assembly.to_pojo(pojo_name="iris_pojo", path='', get_jar=False)