H2OAssembly¶
- 
class h2o.assembly.H2OAssembly(steps)[source]¶
- Bases: object
- The H2OAssembly class can be used to specify multiple frame operations in one place.
- Returns
- a new H2OFrame. 
- Example
 - >>> iris = h2o.load_dataset("iris")
   >>> from h2o.assembly import *
   >>> from h2o.transforms.preprocessing import *
   >>> assembly = H2OAssembly(steps=[("col_select",
   ...                                H2OColSelect(["Sepal.Length", "Petal.Length", "Species"])),
   ...                               ("cos_Sepal.Length",
   ...                                H2OColOp(op=H2OFrame.cos, col="Sepal.Length", inplace=True)),
   ...                               ("str_cnt_Species",
   ...                                H2OColOp(op=H2OFrame.countmatches,
   ...                                         col="Species",
   ...                                         inplace=False, pattern="s"))])
   >>> result = assembly.fit(iris)  # fit the assembly and perform the munging operations
   >>> result
     Sepal.Length    Petal.Length  Species        Species0
   --------------  --------------  -----------  ----------
        0.377978              1.4  Iris-setosa           3
        0.186512              1.4  Iris-setosa           3
       -0.0123887             1.3  Iris-setosa           3
       -0.112153              1.5  Iris-setosa           3
        0.283662              1.4  Iris-setosa           3
        0.634693              1.7  Iris-setosa           3
       -0.112153              1.4  Iris-setosa           3
        0.283662              1.5  Iris-setosa           3
       -0.307333              1.4  Iris-setosa           3
        0.186512              1.5  Iris-setosa           3
   [150 rows x 4 columns]
 - In this example, we first load the iris frame. Next, the following data munging operations are performed on the iris frame:
- select only three out of the five columns; 
- take the cosine of the column “Sepal.Length” and replace the original column with the result; 
- count the occurrences of the letter “s” in each value of the “Species” column. Since inplace=False, a new column is generated to hold the result.
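 - The three operations above can be mimicked in plain Python to make the data flow concrete. This is only a sketch of the idea of named steps applied in order; it is not how H2O implements an assembly. The helpers `col_select`, `col_op`, and `run_steps` are hypothetical stand-ins, the two sample rows are typed in by hand, and the new column name `Species0` is simply copied from the output shown above.

```python
import math

# Hypothetical stand-in for H2OColSelect: keep only the listed columns.
def col_select(cols):
    def step(rows):
        return [{c: row[c] for c in cols} for row in rows]
    return step

# Hypothetical stand-in for H2OColOp: apply fn to one column, either
# overwriting it (inplace=True) or writing the result to new_col.
def col_op(fn, col, inplace=True, new_col=None):
    def step(rows):
        target = col if inplace else new_col
        return [{**row, target: fn(row[col])} for row in rows]
    return step

# Apply each named step to the output of the previous one.
def run_steps(steps, rows):
    for _name, step in steps:
        rows = step(rows)
    return rows

steps = [
    ("col_select", col_select(["Sepal.Length", "Petal.Length", "Species"])),
    ("cos_Sepal.Length", col_op(math.cos, "Sepal.Length", inplace=True)),
    ("str_cnt_Species", col_op(lambda s: s.count("s"), "Species",
                               inplace=False, new_col="Species0")),
]

# Two hand-typed sample rows with five columns each (illustration only).
rows = [
    {"Sepal.Length": 5.1, "Sepal.Width": 3.5, "Petal.Length": 1.4,
     "Petal.Width": 0.2, "Species": "Iris-setosa"},
    {"Sepal.Length": 4.9, "Sepal.Width": 3.0, "Petal.Length": 1.4,
     "Petal.Width": 0.2, "Species": "Iris-setosa"},
]

result = run_steps(steps, rows)
for row in result:
    print(row)
```

Each output row keeps three of the five input columns, holds the cosine in place of the original “Sepal.Length” value, and gains a fourth column with the per-value count of “s”.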
 - Extension class of Pipeline implementing additional methods: - to_pojo: Exports the assembly to a self-contained Java POJO used in a per-row, high-throughput environment.
 - Additionally, H2OAssembly provides a few static methods that perform element-wise operations between two frames. They are all called as:
 - >>> H2OAssembly.op(frame1, frame2)
 - where frame1 and frame2 are H2OFrames of the same size and same column types. The call returns an H2OFrame containing the element-wise result of operation op. The following operations are currently supported:
- divide 
- plus 
- multiply 
- minus 
- less_than 
- less_than_equal 
- equal_equal 
- not_equal 
- greater_than 
- greater_than_equal 
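 - As a rough illustration of what “element-wise” means for the operations listed above, the following plain-Python sketch applies an operator cell by cell to two nested lists of the same shape. It does not use H2O at all: the `OPS` table and the `elementwise` helper are hypothetical names introduced here, and converting comparison results to 0/1 is done only to resemble the frame output shown in the examples for the individual methods.

```python
import operator

# Assumed mapping from the operation names above to Python operators
# (for illustration only; not taken from the H2O source).
OPS = {
    "divide": operator.truediv,
    "plus": operator.add,
    "multiply": operator.mul,
    "minus": operator.sub,
    "less_than": operator.lt,
    "less_than_equal": operator.le,
    "equal_equal": operator.eq,
    "not_equal": operator.ne,
    "greater_than": operator.gt,
    "greater_than_equal": operator.ge,
}

def elementwise(op_name, left, right):
    """Apply the named operation to matching cells of two nested lists."""
    same_shape = len(left) == len(right) and all(
        len(a) == len(b) for a, b in zip(left, right)
    )
    if not same_shape:
        raise ValueError("inputs must have the same shape")
    fn = OPS[op_name]

    def cell(a, b):
        out = fn(a, b)
        # Comparisons return bool; show them as 0/1.
        return int(out) if isinstance(out, bool) else out

    return [[cell(a, b) for a, b in zip(ra, rb)] for ra, rb in zip(left, right)]

list1 = [[4, 4, 4, 4], [4, 4, 4, 4]]
list2 = [[2, 2, 2, 2], [2, 2, 2, 2]]

print(elementwise("divide", list1, list2))
print(elementwise("less_than", list1, list2))
```

Keeping the operations in a lookup table mirrors the way the methods are all invoked through the same `op(frame1, frame2)` shape.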
 - 
divide(rhs)¶
- Divides one frame by the other.
- Returns
- the quotient of the frames. 
- Examples
 - >>> python_list1 = [[4,4,4,4],[4,4,4,4]]
   >>> python_list2 = [[2,2,2,2], [2,2,2,2]]
   >>> frame1 = h2o.H2OFrame(python_obj=python_list1)
   >>> frame2 = h2o.H2OFrame(python_obj=python_list2)
   >>> H2OAssembly.divide(frame1, frame2)
     C1    C2    C3    C4
   ----  ----  ----  ----
      2     2     2     2
      2     2     2     2
 - 
download_mojo(file_name='', path='.')[source]¶
- Convert the munging operations performed on H2OFrame into a MOJO 2 artifact. This method requires an additional mojo2-runtime library on the Java classpath. The library can be found at this Maven URL: https://repo1.maven.org/maven2/ai/h2o/mojo2-runtime/2.7.11.1/mojo2-runtime-2.7.11.1.jar.
- The library can be added to the classpath via the Java command when starting an H2O node from the command line:
- java -cp <path_to_h2o_jar>:<path_to_mojo2-runtime_library> water.H2OApp
- The library can also be added to the Java classpath from Python while starting an H2O cluster via h2o.init():
 - >>> import h2o
   >>> h2o.init(extra_classpath = ["<path_to_mojo2-runtime_library>"])
- The MOJO 2 artifact created by this method can be used as described in the tutorials at https://docs.h2o.ai/driverless-ai/1-10-lts/docs/userguide/scoring-mojo-scoring-pipeline.html, with one additional requirement: the artifact produced by this method requires h2o-genmodel.jar to be present on the Java classpath.
- Parameters
- file_name – (str) Name of the MOJO 2 artifact. 
- path – (str) Local path on the user's side where the MOJO 2 artifact is saved. 
 
- Returns
- Streamed file. 
- Examples
 - >>> from h2o.assembly import *
   >>> from h2o.transforms.preprocessing import *
   >>> iris = h2o.load_dataset("iris")
   >>> assembly = H2OAssembly(steps=[("col_select",
   ...                                H2OColSelect(["Sepal.Length",
   ...                                              "Petal.Length", "Species"])),
   ...                               ("cos_Sepal.Length",
   ...                                H2OColOp(op=H2OFrame.cos,
   ...                                         col="Sepal.Length", inplace=True)),
   ...                               ("str_cnt_Species",
   ...                                H2OColOp(op=H2OFrame.countmatches,
   ...                                         col="Species", inplace=False,
   ...                                         pattern="s"))])
   >>> result = assembly.fit(iris)
   >>> assembly.download_mojo(file_name="iris_mojo", path='')
 - Note
 - The output column names of the created MOJO 2 pipeline are prefixed with “assembly_” since the MOJO 2 library requires unique names across all columns present in the pipeline. 
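 - The command-line variant of the classpath setup described for this method can also be assembled programmatically. The sketch below only builds the argument list and does not start anything. `build_h2o_command` is a hypothetical helper, and the two jar file names passed to it are placeholders for the real paths on your machine. It uses `os.pathsep` so that the classpath separator matches the platform (`:` on POSIX systems, `;` on Windows).

```python
import os

def build_h2o_command(h2o_jar, mojo2_runtime_jar):
    """Return the java argument list for starting an H2O node with the
    mojo2-runtime library on the classpath (hypothetical helper)."""
    classpath = os.pathsep.join([h2o_jar, mojo2_runtime_jar])
    return ["java", "-cp", classpath, "water.H2OApp"]

# Placeholder paths; substitute the actual jar locations.
cmd = build_h2o_command("h2o.jar", "mojo2-runtime-2.7.11.1.jar")
print(" ".join(cmd))
```

The resulting list could be handed to `subprocess.run` if launching the node from Python is desired; that step is left out here because it depends on the local Java installation.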
 - 
equal_equal(rhs)¶
- Checks element-wise whether the values of the two frames are equal.
- Returns
- an H2OFrame of element-wise 0/1 values (0 = false, 1 = true). 
- Examples
 - >>> python_list1 = [[4,4,4,4],[4,4,4,4]]
   >>> python_list2 = [[2,2,2,2], [2,2,2,2]]
   >>> frame1 = h2o.H2OFrame(python_obj=python_list1)
   >>> frame2 = h2o.H2OFrame(python_obj=python_list2)
   >>> H2OAssembly.equal_equal(frame1, frame2)
     C1    C2    C3    C4
   ----  ----  ----  ----
      0     0     0     0
      0     0     0     0
 - 
fit(fr)[source]¶
- Performs the munging operations specified in steps on the frame fr.
- Parameters
- fr – H2OFrame on which the munging operations are performed. 
- Returns
- H2OFrame after munging operations are completed. 
- Examples
 - >>> iris = h2o.load_dataset("iris")
   >>> assembly = H2OAssembly(steps=[("col_select",
   ...                                H2OColSelect(["Sepal.Length",
   ...                                              "Petal.Length", "Species"])),
   ...                               ("cos_Sepal.Length",
   ...                                H2OColOp(op=H2OFrame.cos,
   ...                                         col="Sepal.Length",
   ...                                         inplace=True)),
   ...                               ("str_cnt_Species",
   ...                                H2OColOp(op=H2OFrame.countmatches,
   ...                                         col="Species",
   ...                                         inplace=False,
   ...                                         pattern="s"))])
   >>> fit = assembly.fit(iris)
   >>> fit
 - 
greater_than(rhs)¶
- Checks element-wise whether each value of the first frame is greater than the corresponding value of the second.
- Returns
- an H2OFrame of element-wise 0/1 values (0 = false, 1 = true). 
- Examples
 - >>> python_list1 = [[4,4,4,4],[4,4,4,4]]
   >>> python_list2 = [[2,2,2,2], [2,2,2,2]]
   >>> frame1 = h2o.H2OFrame(python_obj=python_list1)
   >>> frame2 = h2o.H2OFrame(python_obj=python_list2)
   >>> H2OAssembly.greater_than(frame1, frame2)
     C1    C2    C3    C4
   ----  ----  ----  ----
      1     1     1     1
      1     1     1     1
 - 
greater_than_equal(rhs)¶
- Checks element-wise whether each value of the first frame is greater than or equal to the corresponding value of the second.
- Returns
- an H2OFrame of element-wise 0/1 values (0 = false, 1 = true). 
- Examples
 - >>> python_list1 = [[4,4,4,4],[4,4,4,4]]
   >>> python_list2 = [[2,2,2,2], [2,2,2,2]]
   >>> frame1 = h2o.H2OFrame(python_obj=python_list1)
   >>> frame2 = h2o.H2OFrame(python_obj=python_list2)
   >>> H2OAssembly.greater_than_equal(frame1, frame2)
     C1    C2    C3    C4
   ----  ----  ----  ----
      1     1     1     1
      1     1     1     1
 - 
less_than(rhs)¶
- Checks element-wise whether each value of the first frame is less than the corresponding value of the second.
- Returns
- an H2OFrame of element-wise 0/1 values (0 = false, 1 = true). 
- Examples
 - >>> python_list1 = [[4,4,4,4],[4,4,4,4]]
   >>> python_list2 = [[2,2,2,2], [2,2,2,2]]
   >>> frame1 = h2o.H2OFrame(python_obj=python_list1)
   >>> frame2 = h2o.H2OFrame(python_obj=python_list2)
   >>> H2OAssembly.less_than(frame1, frame2)
     C1    C2    C3    C4
   ----  ----  ----  ----
      0     0     0     0
      0     0     0     0
 - 
less_than_equal(rhs)¶
- Checks element-wise whether each value of the first frame is less than or equal to the corresponding value of the second.
- Returns
- an H2OFrame of element-wise 0/1 values (0 = false, 1 = true). 
- Examples
 - >>> python_list1 = [[4,4,4,4],[4,4,4,4]]
   >>> python_list2 = [[2,2,2,2], [2,2,2,2]]
   >>> frame1 = h2o.H2OFrame(python_obj=python_list1)
   >>> frame2 = h2o.H2OFrame(python_obj=python_list2)
   >>> H2OAssembly.less_than_equal(frame1, frame2)
     C1    C2    C3    C4
   ----  ----  ----  ----
      0     0     0     0
      0     0     0     0
 - 
minus(rhs)¶
- Subtracts one frame from the other.
- Returns
- the difference of the frames. 
- Examples
 - >>> python_list1 = [[4,4,4,4],[4,4,4,4]]
   >>> python_list2 = [[2,2,2,2], [2,2,2,2]]
   >>> frame1 = h2o.H2OFrame(python_obj=python_list1)
   >>> frame2 = h2o.H2OFrame(python_obj=python_list2)
   >>> H2OAssembly.minus(frame1, frame2)
     C1    C2    C3    C4
   ----  ----  ----  ----
      2     2     2     2
      2     2     2     2
 - 
multiply(rhs)¶
- Multiplies the frames together. - Returns
- the product of the frames. 
- Examples
 - >>> python_list1 = [[4,4,4,4],[4,4,4,4]]
   >>> python_list2 = [[2,2,2,2], [2,2,2,2]]
   >>> frame1 = h2o.H2OFrame(python_obj=python_list1)
   >>> frame2 = h2o.H2OFrame(python_obj=python_list2)
   >>> H2OAssembly.multiply(frame1, frame2)
     C1    C2    C3    C4
   ----  ----  ----  ----
      8     8     8     8
      8     8     8     8
 - 
property names¶
- Gives the column names. - Returns
- the specified column names. 
- Examples
 - >>> iris = h2o.load_dataset("iris")
   >>> from h2o.assembly import *
   >>> from h2o.transforms.preprocessing import *
   >>> assembly = H2OAssembly(steps=[("col_select",
   ...                                H2OColSelect(["Sepal.Length", "Petal.Length", "Species"])),
   ...                               ("cos_Sepal.Length",
   ...                                H2OColOp(op=H2OFrame.cos, col="Sepal.Length", inplace=True)),
   ...                               ("str_cnt_Species",
   ...                                H2OColOp(op=H2OFrame.countmatches,
   ...                                         col="Species",
   ...                                         inplace=False, pattern="s"))])
   >>> result = assembly.fit(iris)
   >>> result.names
   [u'Sepal.Length', u'Petal.Length', u'Species', u'Species0']
 - 
not_equal(rhs)¶
- Checks element-wise whether the values of the two frames are not equal.
- Returns
- an H2OFrame of element-wise 0/1 values (0 = false, 1 = true). 
- Examples
 - >>> python_list1 = [[4,4,4,4],[4,4,4,4]]
   >>> python_list2 = [[2,2,2,2], [2,2,2,2]]
   >>> frame1 = h2o.H2OFrame(python_obj=python_list1)
   >>> frame2 = h2o.H2OFrame(python_obj=python_list2)
   >>> H2OAssembly.not_equal(frame1, frame2)
     C1    C2    C3    C4
   ----  ----  ----  ----
      1     1     1     1
      1     1     1     1
 - 
plus(rhs)¶
- Adds the frames together. - Returns
- the sum of the frames. 
- Examples
 - >>> python_list1 = [[4,4,4,4],[4,4,4,4]]
   >>> python_list2 = [[2,2,2,2], [2,2,2,2]]
   >>> frame1 = h2o.H2OFrame(python_obj=python_list1)
   >>> frame2 = h2o.H2OFrame(python_obj=python_list2)
   >>> H2OAssembly.plus(frame1, frame2)
     C1    C2    C3    C4
   ----  ----  ----  ----
      6     6     6     6
      6     6     6     6
 - 
to_pojo(pojo_name='', path='', get_jar=True)[source]¶
- Convert the munging operations performed on H2OFrame into a POJO. - Parameters
- pojo_name – (str) Name of POJO. 
- path – (str) path of POJO. 
- get_jar – (bool) Whether to also download the h2o-genmodel.jar file needed to compile the POJO. 
 
- Returns
- None. 
- Examples
 - >>> from h2o.assembly import *
   >>> from h2o.transforms.preprocessing import *
   >>> iris = h2o.load_dataset("iris")
   >>> assembly = H2OAssembly(steps=[("col_select",
   ...                                H2OColSelect(["Sepal.Length",
   ...                                              "Petal.Length", "Species"])),
   ...                               ("cos_Sepal.Length",
   ...                                H2OColOp(op=H2OFrame.cos,
   ...                                         col="Sepal.Length", inplace=True)),
   ...                               ("str_cnt_Species",
   ...                                H2OColOp(op=H2OFrame.countmatches,
   ...                                         col="Species", inplace=False,
   ...                                         pattern="s"))])
   >>> result = assembly.fit(iris)
   >>> assembly.to_pojo(pojo_name="iris_pojo", path='', get_jar=False)