Modeling In H2O¶
Supervised¶
H2OAdaBoostEstimator¶
- 
class h2o.estimators.adaboost.H2OAdaBoostEstimator(model_id=None, training_frame=None, ignored_columns=None, ignore_const_cols=True, categorical_encoding='auto', weights_column=None, nlearners=50, weak_learner='auto', learn_rate=0.5, weak_learner_params=None, seed=-1)[source]¶
- Bases: - h2o.estimators.estimator_base.H2OEstimator- AdaBoost - Builds an AdaBoost model - 
property categorical_encoding¶
- Encoding scheme for categorical features - Type: - Literal["auto", "enum", "one_hot_internal", "one_hot_explicit", "binary", "eigen", "label_encoder", "sort_by_response", "enum_limited"], defaults to- "auto".
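The difference between two of the schemes listed above can be illustrated outside H2O. This is a minimal sketch, not H2O's implementation: `one_hot` and `binary_code` are hypothetical helpers, and the binary variant assumes a plain zero-based level index.

```python
def one_hot(levels, value):
    """One indicator column per level; exactly one entry is 1."""
    return [1 if level == value else 0 for level in levels]


def binary_code(levels, value):
    """Binary digits of the level's zero-based index (assumed indexing)."""
    # Number of bits needed to represent the largest index.
    width = max(1, (len(levels) - 1).bit_length())
    index = levels.index(value)
    # Most significant bit first.
    return [(index >> bit) & 1 for bit in reversed(range(width))]


levels = ["a", "b", "c", "d", "e"]
one_hot_d = one_hot(levels, "d")
binary_d = binary_code(levels, "d")
```

With five levels, one-hot encoding produces five columns while the binary code needs only three, which is why binary-style encodings are attractive for high-cardinality features.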
 - 
property ignore_const_cols¶
- Ignore constant columns. - Type: - bool, defaults to- True.
 - 
property ignored_columns¶
- Names of columns to ignore for training. - Type: - List[str].
 - 
property learn_rate¶
- Learning rate (from 0.0 to 1.0) - Type: - float, defaults to- 0.5.
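To show how a learning rate typically enters AdaBoost, here is a minimal sketch of one round of textbook discrete AdaBoost. It is an assumption that H2O follows this exact update; `adaboost_round` is a hypothetical helper, and the weighted error is assumed to lie strictly between 0 and 1.

```python
import math


def adaboost_round(weights, misclassified, learn_rate=0.5):
    """One round of textbook discrete AdaBoost weight updating."""
    total = sum(weights)
    # Weighted error of the current weak learner.
    err = sum(w for w, m in zip(weights, misclassified) if m) / total
    # The learning rate shrinks the weak learner's vote.
    alpha = learn_rate * math.log((1.0 - err) / err)
    # Increase the weight of misclassified rows, then renormalize.
    updated = [w * math.exp(alpha) if m else w
               for w, m in zip(weights, misclassified)]
    norm = sum(updated)
    return alpha, [w / norm for w in updated]


weights = [0.25, 0.25, 0.25, 0.25]
alpha, new_weights = adaboost_round(weights, [True, False, False, False],
                                    learn_rate=0.5)
```

A smaller `learn_rate` gives each weak learner a smaller vote and a gentler reweighting, so more learners are usually needed to reach the same fit.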
 - 
property nlearners¶
- Number of AdaBoost weak learners. - Type: - int, defaults to- 50.
 - 
property seed¶
- Seed for pseudo random number generator (if applicable) - Type: - int, defaults to- -1.
 - 
property training_frame¶
- Id of the training data frame. - Type: - Union[None, str, H2OFrame].
 - 
property weak_learner¶
- Choose a weak learner type. Defaults to AUTO, which means DRF. - Type: - Literal["auto", "drf", "glm", "gbm", "deep_learning"], defaults to- "auto".
 - 
property weak_learner_params¶
- Customized parameters for the weak_learner algorithm. - Type: - dict.- Examples
 - >>> prostate_hex = h2o.import_file("http://s3.amazonaws.com/h2o-public-test-data/smalldata/prostate/prostate.csv")
   >>> prostate_hex["CAPSULE"] = prostate_hex["CAPSULE"].asfactor()
   >>> response = "CAPSULE"
   >>> seed = 42
   >>> adaboost_model = H2OAdaBoostEstimator(seed=seed,
   ...                                       weak_learner="DRF",
   ...                                       weak_learner_params={'ntrees':1,'max_depth':3})
   >>> adaboost_model.train(y=response,
   ...                      ignored_columns=["ID"],
   ...                      training_frame=prostate_hex)
   >>> print(adaboost_model)
 - 
property weights_column¶
- Column with observation weights. Giving some observation a weight of zero is equivalent to excluding it from the dataset; giving an observation a relative weight of 2 is equivalent to repeating that row twice. Negative weights are not allowed. Note: Weights are per-row observation weights and do not increase the size of the data frame. This is typically the number of times a row is repeated, but non-integer values are supported as well. During training, rows with higher weights matter more, due to the larger loss function pre-factor. If you set weight = 0 for a row, the returned prediction frame at that row is zero and this is incorrect. To get an accurate prediction, remove all rows with weight == 0. - Type: - str.
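The equivalence between integer weights and repeated rows can be checked on a simple statistic. This is a minimal sketch using a weighted mean; `weighted_mean` is a hypothetical helper and does not involve H2O.

```python
def weighted_mean(values, weights):
    """Mean of values where each row counts `weight` times."""
    if any(w < 0 for w in weights):
        raise ValueError("negative weights are not allowed")
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


values = [10.0, 20.0, 30.0]
weights = [2, 1, 0]  # repeat the first row twice, drop the last row

# Build the equivalent data set by physically repeating rows.
expanded = [v for v, w in zip(values, weights) for _ in range(w)]

weighted = weighted_mean(values, weights)
repeated = sum(expanded) / len(expanded)
```

Both computations give the same result, while the weighted version keeps the original three rows in memory.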
 
- 
H2OANOVAGLMEstimator¶
- 
class h2o.estimators.anovaglm.H2OANOVAGLMEstimator(model_id=None, training_frame=None, seed=-1, response_column=None, ignored_columns=None, ignore_const_cols=True, score_each_iteration=False, offset_column=None, weights_column=None, family='auto', tweedie_variance_power=0.0, tweedie_link_power=1.0, theta=0.0, solver='irlsm', missing_values_handling='mean_imputation', plug_values=None, compute_p_values=True, standardize=True, non_negative=False, max_iterations=0, link='family_default', prior=0.0, alpha=None, lambda_=[0.0], lambda_search=False, stopping_rounds=0, stopping_metric='auto', early_stopping=False, stopping_tolerance=0.001, balance_classes=False, class_sampling_factors=None, max_after_balance_size=5.0, max_runtime_secs=0.0, save_transformed_framekeys=False, highest_interaction_term=0, nparallelism=4, type=0)[source]¶
- Bases: - h2o.estimators.estimator_base.H2OEstimator- ANOVA for Generalized Linear Model - H2O ANOVAGLM is used to calculate Type III SS which is used to evaluate the contributions of individual predictors and their interactions to a model. Predictors or interactions with negligible contributions to the model will have high p-values while those with more contributions will have low p-values. - 
property Lambda¶
- DEPRECATED. Use self.lambda_ instead.
 - 
property alpha¶
- Distribution of regularization between the L1 (Lasso) and L2 (Ridge) penalties. A value of 1 for alpha represents Lasso regression, a value of 0 produces Ridge regression, and anything in between specifies the amount of mixing between the two. Default value of alpha is 0 when SOLVER = ‘L-BFGS’; 0.5 otherwise. - Type: - List[float].
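The mixing described above can be written out as the usual elastic net penalty. This is a minimal sketch assuming the common form lambda * (alpha * L1 + (1 - alpha) / 2 * L2^2); `elastic_net_penalty` is a hypothetical helper, not an H2O function.

```python
def elastic_net_penalty(beta, alpha, lambda_):
    """Elastic net penalty for a coefficient vector (intercept excluded)."""
    l1 = sum(abs(b) for b in beta)      # Lasso term
    l2_sq = sum(b * b for b in beta)    # Ridge term (squared L2 norm)
    return lambda_ * (alpha * l1 + 0.5 * (1.0 - alpha) * l2_sq)


beta = [1.0, -2.0]
lasso_only = elastic_net_penalty(beta, alpha=1.0, lambda_=0.1)
ridge_only = elastic_net_penalty(beta, alpha=0.0, lambda_=0.1)
mixed = elastic_net_penalty(beta, alpha=0.5, lambda_=0.1)
```

Setting `alpha` between 0 and 1 interpolates linearly between the two pure penalties for a fixed `lambda_`.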
 - 
property balance_classes¶
- Balance training data class counts via over/under-sampling (for imbalanced data). - Type: - bool, defaults to- False.
 - 
property class_sampling_factors¶
- Desired over/under-sampling ratios per class (in lexicographic order). If not specified, sampling factors will be automatically computed to obtain class balance during training. Requires balance_classes. - Type: - List[float].
 - 
property compute_p_values¶
- Request p-value computation. P-values work only with the IRLSM solver and no regularization. - Type: - bool, defaults to- True.
 - 
property early_stopping¶
- Stop early when there is no more relative improvement on train or validation (if provided). - Type: - bool, defaults to- False.
 - 
property family¶
- Family. Use binomial for classification with logistic regression, others are for regression problems. - Type: - Literal["auto", "gaussian", "binomial", "fractionalbinomial", "quasibinomial", "poisson", "gamma", "tweedie", "negativebinomial"], defaults to- "auto".
 - 
property highest_interaction_term¶
- Limit the order of interaction terms: 2 means interactions between two columns only, 3 means interactions among three columns, and so on. Default to 2. - Type: - int, defaults to- 0.- Examples
 - >>> import h2o
   >>> h2o.init()
   >>> from h2o.estimators import H2OANOVAGLMEstimator
   >>> train = h2o.import_file("http://s3.amazonaws.com/h2o-public-test-data/smalldata/prostate/prostate_complete.csv.zip")
   >>> x = ['AGE', 'VOL', 'DCAPS']
   >>> y = 'CAPSULE'
   >>> anova_model = H2OANOVAGLMEstimator(family='binomial',
   ...                                    lambda_=0,
   ...                                    missing_values_handling="skip",
   ...                                    highest_interaction_term=2)
   >>> anova_model.train(x=x, y=y, training_frame=train)
   >>> anova_model.summary()
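To see which terms a given interaction limit implies, the predictors from the example above can be enumerated by hand. This is a minimal sketch; `interaction_terms` is a hypothetical helper, and how ANOVAGLM builds or names these terms internally is not shown here.

```python
from itertools import combinations


def interaction_terms(predictors, highest_interaction_term):
    """All main effects and interactions up to the given order."""
    terms = []
    for order in range(1, highest_interaction_term + 1):
        # Order 1 yields main effects, order 2 yields pairs, and so on.
        terms.extend(combinations(predictors, order))
    return terms


predictors = ['AGE', 'VOL', 'DCAPS']
up_to_pairs = interaction_terms(predictors, 2)
up_to_triples = interaction_terms(predictors, 3)
```

With three predictors, a limit of 2 gives three main effects plus three pairwise interactions; raising the limit to 3 adds the single three-way term.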
 - 
property ignore_const_cols¶
- Ignore constant columns. - Type: - bool, defaults to- True.
 - 
property ignored_columns¶
- Names of columns to ignore for training. - Type: - List[str].
 - 
property lambda_¶
- Regularization strength - Type: - List[float], defaults to- [0.0].
 - 
property lambda_search¶
- Use lambda search starting at lambda max; the given lambda is then interpreted as lambda min. - Type: - bool, defaults to- False.
 - 
property link¶
- Link function. - Type: - Literal["family_default", "identity", "logit", "log", "inverse", "tweedie", "ologit"], defaults to- "family_default".- Examples
 - >>> import h2o
   >>> h2o.init()
   >>> from h2o.estimators import H2OANOVAGLMEstimator
   >>> train = h2o.import_file("http://s3.amazonaws.com/h2o-public-test-data/smalldata/prostate/prostate_complete.csv.zip")
   >>> x = ['AGE', 'VOL', 'DCAPS']
   >>> y = 'CAPSULE'
   >>> anova_model = H2OANOVAGLMEstimator(family='binomial',
   ...                                    lambda_=0,
   ...                                    missing_values_handling="skip",
   ...                                    link="family_default")
   >>> anova_model.train(x=x, y=y, training_frame=train)
   >>> anova_model.summary()
 - 
property max_after_balance_size¶
- Maximum relative size of the training data after balancing class counts (can be less than 1.0). Requires balance_classes. - Type: - float, defaults to- 5.0.
 - 
property max_iterations¶
- Maximum number of iterations - Type: - int, defaults to- 0.
 - 
property max_runtime_secs¶
- Maximum allowed runtime in seconds for model training. Use 0 to disable. - Type: - float, defaults to- 0.0.
 - 
property missing_values_handling¶
- Handling of missing values. Either MeanImputation, Skip or PlugValues. - Type: - Literal["mean_imputation", "skip", "plug_values"], defaults to- "mean_imputation".
 - 
property non_negative¶
- Restrict coefficients (not intercept) to be non-negative - Type: - bool, defaults to- False.
 - 
property nparallelism¶
- Number of models to build in parallel. Default to 4. Adjust according to your system. - Type: - int, defaults to- 4.
 - 
property offset_column¶
- Offset column. This will be added to the combination of columns before applying the link function. - Type: - str.
 - 
property plug_values¶
- Plug Values (a single-row frame containing values that will be used to impute missing values of the training/validation frame; use in conjunction with missing_values_handling = PlugValues) - Type: - Union[None, str, H2OFrame].
 - 
property prior¶
- Prior probability for y==1. To be used only for logistic regression iff the data has been sampled and the mean of response does not reflect reality. - Type: - float, defaults to- 0.0.
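When the training data was sampled so that its response mean differs from the true base rate, a standard remedy is to shift the logistic intercept. This is a minimal sketch of that textbook prior correction; treating it as what H2O does with `prior` is an assumption, and `prior_corrected_intercept` is a hypothetical helper.

```python
import math


def prior_corrected_intercept(intercept, sample_mean, prior):
    """Shift a logistic intercept from the sampled to the true base rate."""
    if not (0.0 < sample_mean < 1.0 and 0.0 < prior < 1.0):
        raise ValueError("sample_mean and prior must lie strictly in (0, 1)")
    # Log odds ratio between the sampled rate and the true rate.
    adjustment = math.log((sample_mean * (1.0 - prior)) /
                          (prior * (1.0 - sample_mean)))
    return intercept - adjustment


# A model fitted on a 50/50 sample, while the true rate of y == 1 is 10%.
corrected = prior_corrected_intercept(0.0, sample_mean=0.5, prior=0.1)
unchanged = prior_corrected_intercept(0.7, sample_mean=0.3, prior=0.3)
```

In the intercept-only case shown, the corrected intercept equals the log odds of the true rate, and no shift is applied when the sample already reflects the prior.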
 - 
property response_column¶
- Response variable column. - Type: - str.
 - 
result()[source]¶
- Get the result frame, which contains information about the model building process (as for modelselection and anovaglm). - Returns
- An H2OFrame that contains information about the model building process.
 
 - 
property save_transformed_framekeys¶
- True to save the keys of the transformed predictors and interaction columns. - Type: - bool, defaults to- False.- Examples
 - >>> import h2o
   >>> h2o.init()
   >>> from h2o.estimators import H2OANOVAGLMEstimator
   >>> train = h2o.import_file("http://s3.amazonaws.com/h2o-public-test-data/smalldata/prostate/prostate_complete.csv.zip")
   >>> x = ['AGE', 'VOL', 'DCAPS']
   >>> y = 'CAPSULE'
   >>> anova_model = H2OANOVAGLMEstimator(family='binomial',
   ...                                    lambda_=0,
   ...                                    missing_values_handling="skip",
   ...                                    save_transformed_framekeys=True)
   >>> anova_model.train(x=x, y=y, training_frame=train)
   >>> transformFrame = h2o.get_frame(anova_model._model_json['output']['transformed_columns_key']['name'])
   >>> print(transformFrame)
 - 
property score_each_iteration¶
- Whether to score during each iteration of model training. - Type: - bool, defaults to- False.
 - 
property seed¶
- Seed for pseudo random number generator (if applicable) - Type: - int, defaults to- -1.
 - 
property solver¶
- AUTO will set the solver based on the given data and the other parameters. IRLSM is fast on problems with a small number of predictors and for lambda search with an L1 penalty; L_BFGS scales better for datasets with many columns. - Type: - Literal["auto", "irlsm", "l_bfgs", "coordinate_descent_naive", "coordinate_descent", "gradient_descent_lh", "gradient_descent_sqerr"], defaults to- "irlsm".
 - 
property standardize¶
- Standardize numeric columns to have zero mean and unit variance - Type: - bool, defaults to- True.
 - 
property stopping_metric¶
- Metric to use for early stopping (AUTO: logloss for classification, deviance for regression and anomaly_score for Isolation Forest). Note that custom and custom_increasing can only be used in GBM and DRF with the Python client. - Type: - Literal["auto", "deviance", "logloss", "mse", "rmse", "mae", "rmsle", "auc", "aucpr", "lift_top_group", "misclassification", "mean_per_class_error", "custom", "custom_increasing"], defaults to- "auto".
 - 
property stopping_rounds¶
- Early stopping based on convergence of stopping_metric. Stop if simple moving average of length k of the stopping_metric does not improve for k:=stopping_rounds scoring events (0 to disable) - Type: - int, defaults to- 0.
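The moving-average rule described above can be sketched as follows. This is a simplified illustration, not H2O's exact scoring logic: `should_stop` is a hypothetical helper, the metric is assumed to be positive with lower values being better (for example logloss), and the comparison window of 2k scoring events is an assumption.

```python
def should_stop(history, stopping_rounds, stopping_tolerance):
    """Return True if the k-length moving average stopped improving."""
    k = stopping_rounds
    if k == 0 or len(history) < 2 * k:
        return False  # disabled, or not enough scoring events yet
    recent = history[-2 * k:]
    # k + 1 moving averages of length k over the last 2k scoring events.
    averages = [sum(recent[i:i + k]) / k for i in range(k + 1)]
    reference = averages[0]        # oldest moving average in the window
    best = min(averages[1:])       # best of the k most recent averages
    improvement = (reference - best) / reference
    return improvement < stopping_tolerance


flat = [0.50, 0.40, 0.39, 0.39, 0.39, 0.39]
improving = [0.9, 0.8, 0.7, 0.6]

stop_flat = should_stop(flat, stopping_rounds=2, stopping_tolerance=0.001)
stop_improving = should_stop(improving, stopping_rounds=2, stopping_tolerance=0.001)
```

Averaging over k events makes the rule less sensitive to a single noisy scoring event than comparing consecutive raw values would be.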
 - 
property stopping_tolerance¶
- Relative tolerance for metric-based stopping criterion (stop if relative improvement is not at least this much) - Type: - float, defaults to- 0.001.
 - 
property theta¶
- Theta - Type: - float, defaults to- 0.0.
 - 
property training_frame¶
- Id of the training data frame. - Type: - Union[None, str, H2OFrame].
 - 
property tweedie_link_power¶
- Tweedie link power - Type: - float, defaults to- 1.0.
 - 
property tweedie_variance_power¶
- Tweedie variance power - Type: - float, defaults to- 0.0.
 - 
property type¶
- Refers to the sum-of-squares (SS) type: 1, 2, 3, or 4. Only type 3 is currently supported. - Type: - int, defaults to- 0.
 - 
property weights_column¶
- Column with observation weights. Giving some observation a weight of zero is equivalent to excluding it from the dataset; giving an observation a relative weight of 2 is equivalent to repeating that row twice. Negative weights are not allowed. Note: Weights are per-row observation weights and do not increase the size of the data frame. This is typically the number of times a row is repeated, but non-integer values are supported as well. During training, rows with higher weights matter more, due to the larger loss function pre-factor. If you set weight = 0 for a row, the returned prediction frame at that row is zero and this is incorrect. To get an accurate prediction, remove all rows with weight == 0. - Type: - str.
 
- 
H2OCoxProportionalHazardsEstimator¶
- 
class h2o.estimators.coxph.H2OCoxProportionalHazardsEstimator(model_id=None, training_frame=None, start_column=None, stop_column=None, response_column=None, ignored_columns=None, weights_column=None, offset_column=None, stratify_by=None, ties='efron', init=0.0, lre_min=9.0, max_iterations=20, interactions=None, interaction_pairs=None, interactions_only=None, use_all_factor_levels=False, export_checkpoints_dir=None, single_node_mode=False)[source]¶
- Bases: - h2o.estimators.estimator_base.H2OEstimator- Cox Proportional Hazards - Trains a Cox Proportional Hazards Model (CoxPH) on an H2O dataset. - 
property export_checkpoints_dir¶
- Automatically export generated models to this directory. - Type: - str.- Examples
 - >>> import tempfile
   >>> from os import listdir
   >>> heart = h2o.import_file("http://s3.amazonaws.com/h2o-public-test-data/smalldata/coxph_test/heart.csv")
   >>> predictor = "age"
   >>> response = "event"
   >>> checkpoints_dir = tempfile.mkdtemp()
   >>> coxph = H2OCoxProportionalHazardsEstimator(start_column="start",
   ...                                            stop_column="stop",
   ...                                            export_checkpoints_dir=checkpoints_dir)
   >>> coxph.train(x=predictor,
   ...             y=response,
   ...             training_frame=heart)
   >>> len(listdir(checkpoints_dir))
 - 
property ignored_columns¶
- Names of columns to ignore for training. - Type: - List[str].
 - 
property init¶
- Coefficient starting value. - Type: - float, defaults to- 0.0.- Examples
 - >>> heart = h2o.import_file("http://s3.amazonaws.com/h2o-public-test-data/smalldata/coxph_test/heart.csv")
   >>> predictor = "age"
   >>> response = "event"
   >>> heart_coxph = H2OCoxProportionalHazardsEstimator(start_column="start",
   ...                                                  stop_column="stop",
   ...                                                  init=2.9)
   >>> heart_coxph.train(x=predictor,
   ...                   y=response,
   ...                   training_frame=heart)
   >>> heart_coxph.scoring_history()
 - 
property interaction_pairs¶
- A list of pairwise (first order) column interactions. - Type: - List[tuple].- Examples
 - >>> heart = h2o.import_file("http://s3.amazonaws.com/h2o-public-test-data/smalldata/coxph_test/heart.csv")
   >>> predictor = "age"
   >>> response = "event"
   >>> interaction_pairs = [("start","stop")]
   >>> heart_coxph = H2OCoxProportionalHazardsEstimator(start_column="start",
   ...                                                  stop_column="stop",
   ...                                                  interaction_pairs=interaction_pairs)
   >>> heart_coxph.train(x=predictor,
   ...                   y=response,
   ...                   training_frame=heart)
   >>> heart_coxph.scoring_history()
 - 
property interactions¶
- A list of predictor column indices to interact. All pairwise combinations will be computed for the list. - Type: - List[str].- Examples
 - >>> heart = h2o.import_file("http://s3.amazonaws.com/h2o-public-test-data/smalldata/coxph_test/heart.csv")
   >>> predictor = "age"
   >>> response = "event"
   >>> interactions = ['start','stop']
   >>> heart_coxph = H2OCoxProportionalHazardsEstimator(start_column="start",
   ...                                                  stop_column="stop",
   ...                                                  interactions=interactions)
   >>> heart_coxph.train(x=predictor,
   ...                   y=response,
   ...                   training_frame=heart)
   >>> heart_coxph.scoring_history()
 - 
property interactions_only¶
- A list of columns that should only be used to create interactions but should not themselves participate in model training. - Type: - List[str].- Examples
 - >>> heart = h2o.import_file("http://s3.amazonaws.com/h2o-public-test-data/smalldata/coxph_test/heart.csv")
   >>> predictor = "age"
   >>> response = "event"
   >>> interactions = ['start','stop']
   >>> heart_coxph = H2OCoxProportionalHazardsEstimator(start_column="start",
   ...                                                  stop_column="stop",
   ...                                                  interactions_only=interactions)
   >>> heart_coxph.train(x=predictor,
   ...                   y=response,
   ...                   training_frame=heart)
   >>> heart_coxph.scoring_history()
 - 
property lre_min¶
- Minimum log-relative error. - Type: - float, defaults to- 9.0.- Examples
 - >>> heart = h2o.import_file("http://s3.amazonaws.com/h2o-public-test-data/smalldata/coxph_test/heart.csv")
   >>> predictor = "age"
   >>> response = "event"
   >>> heart_coxph = H2OCoxProportionalHazardsEstimator(start_column="start",
   ...                                                  stop_column="stop",
   ...                                                  lre_min=5)
   >>> heart_coxph.train(x=predictor,
   ...                   y=response,
   ...                   training_frame=heart)
   >>> heart_coxph.scoring_history()
 - 
property max_iterations¶
- Maximum number of iterations. - Type: - int, defaults to- 20.- Examples
 - >>> heart = h2o.import_file("http://s3.amazonaws.com/h2o-public-test-data/smalldata/coxph_test/heart.csv")
   >>> predictor = "age"
   >>> response = "event"
   >>> heart_coxph = H2OCoxProportionalHazardsEstimator(start_column="start",
   ...                                                  stop_column="stop",
   ...                                                  max_iterations=50)
   >>> heart_coxph.train(x=predictor,
   ...                   y=response,
   ...                   training_frame=heart)
   >>> heart_coxph.scoring_history()
 - 
property offset_column¶
- Offset column. This will be added to the combination of columns before applying the link function. - Type: - str.- Examples
 - >>> heart = h2o.import_file("http://s3.amazonaws.com/h2o-public-test-data/smalldata/coxph_test/heart.csv")
   >>> predictor = "age"
   >>> response = "event"
   >>> heart_coxph = H2OCoxProportionalHazardsEstimator(start_column="start",
   ...                                                  stop_column="stop",
   ...                                                  offset_column="transplant")
   >>> heart_coxph.train(x=predictor,
   ...                   y=response,
   ...                   training_frame=heart)
   >>> heart_coxph.scoring_history()
 - 
property response_column¶
- Response variable column. - Type: - str.
 - 
property single_node_mode¶
- Run on a single node to reduce the effect of network overhead (for smaller datasets) - Type: - bool, defaults to- False.
 - 
property start_column¶
- Start Time Column. - Type: - str.- Examples
 - >>> heart = h2o.import_file("http://s3.amazonaws.com/h2o-public-test-data/smalldata/coxph_test/heart.csv")
   >>> predictor = "age"
   >>> response = "event"
   >>> train, valid = heart.split_frame(ratios=[.8])
   >>> heart_coxph = H2OCoxProportionalHazardsEstimator(start_column="start",
   ...                                                  stop_column="stop")
   >>> heart_coxph.train(x=predictor,
   ...                   y=response,
   ...                   training_frame=train,
   ...                   validation_frame=valid)
   >>> heart_coxph.scoring_history()
 - 
property stop_column¶
- Stop Time Column. - Type: - str.- Examples
 - >>> heart = h2o.import_file("http://s3.amazonaws.com/h2o-public-test-data/smalldata/coxph_test/heart.csv")
   >>> predictor = "age"
   >>> response = "event"
   >>> train, valid = heart.split_frame(ratios=[.8])
   >>> heart_coxph = H2OCoxProportionalHazardsEstimator(start_column="start",
   ...                                                  stop_column="stop")
   >>> heart_coxph.train(x=predictor,
   ...                   y=response,
   ...                   training_frame=train,
   ...                   validation_frame=valid)
   >>> heart_coxph.scoring_history()
 - 
property stratify_by¶
- List of columns to use for stratification. - Type: - List[str].
 - 
property ties¶
- Method for Handling Ties. - Type: - Literal["efron", "breslow"], defaults to- "efron".- Examples
 - >>> heart = h2o.import_file("http://s3.amazonaws.com/h2o-public-test-data/smalldata/coxph_test/heart.csv")
   >>> predictor = "age"
   >>> response = "event"
   >>> train, valid = heart.split_frame(ratios=[.8])
   >>> heart_coxph = H2OCoxProportionalHazardsEstimator(start_column="start",
   ...                                                  stop_column="stop",
   ...                                                  ties="breslow")
   >>> heart_coxph.train(x=predictor,
   ...                   y=response,
   ...                   training_frame=train,
   ...                   validation_frame=valid)
   >>> heart_coxph.scoring_history()
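The two tie-handling options differ in how the partial-likelihood denominator is formed when several events share the same time. This is a minimal sketch of the textbook Breslow and Efron approximations for a single event time; `log_denominator` is a hypothetical helper and does not reproduce H2O's internal computation.

```python
import math


def log_denominator(risk_scores, event_scores, ties="efron"):
    """Log of the partial-likelihood denominator at one event time.

    risk_scores: exp(linear predictor) for every row still at risk,
                 including the rows whose event occurs at this time.
    event_scores: exp(linear predictor) for the tied event rows only.
    """
    d = len(event_scores)
    risk_total = sum(risk_scores)
    event_total = sum(event_scores)
    if ties == "breslow":
        # Every tied event uses the full risk set.
        return d * math.log(risk_total)
    if ties == "efron":
        # Tied events are removed from the risk set gradually.
        return sum(math.log(risk_total - (l / d) * event_total)
                   for l in range(d))
    raise ValueError("ties must be 'efron' or 'breslow'")


risk = [1.0, 1.0, 1.0, 1.0]
tied_events = [1.0, 1.0]
breslow = log_denominator(risk, tied_events, ties="breslow")
efron = log_denominator(risk, tied_events, ties="efron")
```

With no ties (a single event per time) the two methods coincide; with ties, the Efron denominator is smaller, and it is generally regarded as the closer approximation to the exact partial likelihood.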
 - 
property training_frame¶
- Id of the training data frame. - Type: - Union[None, str, H2OFrame].- Examples
 - >>> heart = h2o.import_file("http://s3.amazonaws.com/h2o-public-test-data/smalldata/coxph_test/heart.csv")
   >>> predictor = "age"
   >>> response = "event"
   >>> train, valid = heart.split_frame(ratios=[.8])
   >>> heart_coxph = H2OCoxProportionalHazardsEstimator(start_column="start",
   ...                                                  stop_column="stop")
   >>> heart_coxph.train(x=predictor,
   ...                   y=response,
   ...                   training_frame=train,
   ...                   validation_frame=valid)
   >>> heart_coxph.scoring_history()
 - 
property use_all_factor_levels¶
- (Internal. For development only!) Indicates whether to use all factor levels. - Type: - bool, defaults to- False.- Examples
 - >>> heart = h2o.import_file("http://s3.amazonaws.com/h2o-public-test-data/smalldata/coxph_test/heart.csv")
   >>> predictor = "age"
   >>> response = "event"
   >>> heart_coxph = H2OCoxProportionalHazardsEstimator(start_column="start",
   ...                                                  stop_column="stop",
   ...                                                  use_all_factor_levels=True)
   >>> heart_coxph.train(x=predictor,
   ...                   y=response,
   ...                   training_frame=heart)
   >>> heart_coxph.scoring_history()
 - 
property weights_column¶
- Column with observation weights. Giving some observation a weight of zero is equivalent to excluding it from the dataset; giving an observation a relative weight of 2 is equivalent to repeating that row twice. Negative weights are not allowed. Note: Weights are per-row observation weights and do not increase the size of the data frame. This is typically the number of times a row is repeated, but non-integer values are supported as well. During training, rows with higher weights matter more, due to the larger loss function pre-factor. If you set weight = 0 for a row, the returned prediction frame at that row is zero and this is incorrect. To get an accurate prediction, remove all rows with weight == 0. - Type: - str.
 
- 
H2ODecisionTreeEstimator¶
- 
class h2o.estimators.decision_tree.H2ODecisionTreeEstimator(model_id=None, training_frame=None, ignored_columns=None, ignore_const_cols=True, categorical_encoding='auto', response_column=None, seed=-1, max_depth=20, min_rows=10)[source]¶
- Bases: - h2o.estimators.estimator_base.H2OEstimator- Decision Tree - Builds a Decision Tree (DT) on a preprocessed dataset. - 
property categorical_encoding¶
- Encoding scheme for categorical features - Type: - Literal["auto", "enum", "one_hot_internal", "one_hot_explicit", "binary", "eigen", "label_encoder", "sort_by_response", "enum_limited"], defaults to- "auto".- Examples
 - >>> import h2o
   >>> from h2o.estimators import H2ODecisionTreeEstimator
   >>> h2o.init()
   >>> prostate = h2o.import_file("http://s3.amazonaws.com/h2o-public-test-data/smalldata/prostate/prostate.csv")
   >>> target_variable = 'CAPSULE'
   >>> prostate["RACE"] = prostate["RACE"].asfactor()
   >>> prostate[target_variable] = prostate[target_variable].asfactor()
   >>> train, test = prostate.split_frame(ratios=[0.7])
   >>> sdt_h2o = H2ODecisionTreeEstimator(model_id="decision_tree.hex",
   ...                                    max_depth=5,
   ...                                    categorical_encoding="binary")
   >>> sdt_h2o.train(y=target_variable, training_frame=train)
   >>> pred_test = sdt_h2o.predict(test)
 - 
property ignore_const_cols¶
- Ignore constant columns. - Type: - bool, defaults to- True.- Examples
 - >>> import h2o
   >>> from h2o.estimators import H2ODecisionTreeEstimator
   >>> h2o.init()
   >>> prostate = h2o.import_file("http://s3.amazonaws.com/h2o-public-test-data/smalldata/prostate/prostate.csv")
   >>> target_variable = 'CAPSULE'
   >>> prostate[target_variable] = prostate[target_variable].asfactor()
   >>> prostate["const_1"] = 6
   >>> train, test = prostate.split_frame(ratios=[0.7])
   >>> sdt_h2o = H2ODecisionTreeEstimator(model_id="decision_tree.hex",
   ...                                    max_depth=5,
   ...                                    ignore_const_cols=True)
   >>> sdt_h2o.train(y=target_variable, training_frame=train)
   >>> pred_test = sdt_h2o.predict(test)
 - 
property ignored_columns¶
- Names of columns to ignore for training. - Type: - List[str].
 - 
property max_depth¶
- Max depth of tree. - Type: - int, defaults to- 20.- Examples
 - >>> import h2o
   >>> from h2o.estimators import H2ODecisionTreeEstimator
   >>> h2o.init()
   >>> prostate = h2o.import_file("http://s3.amazonaws.com/h2o-public-test-data/smalldata/prostate/prostate.csv")
   >>> target_variable = 'CAPSULE'
   >>> prostate[target_variable] = prostate[target_variable].asfactor()
   >>> train, test = prostate.split_frame(ratios=[0.7])
   >>> sdt_h2o = H2ODecisionTreeEstimator(model_id="decision_tree.hex",
   ...                                    max_depth=5)
   >>> sdt_h2o.train(y=target_variable, training_frame=train)
   >>> pred_test = sdt_h2o.predict(test)
 - 
property min_rows¶
- Fewest allowed (weighted) observations in a leaf. - Type: - int, defaults to- 10.- Examples
 - >>> import h2o
   >>> from h2o.estimators import H2ODecisionTreeEstimator
   >>> h2o.init()
   >>> prostate = h2o.import_file("http://s3.amazonaws.com/h2o-public-test-data/smalldata/prostate/prostate.csv")
   >>> target_variable = 'CAPSULE'
   >>> prostate[target_variable] = prostate[target_variable].asfactor()
   >>> train, test = prostate.split_frame(ratios=[0.7])
   >>> sdt_h2o = H2ODecisionTreeEstimator(model_id="decision_tree.hex",
   ...                                    max_depth=5,
   ...                                    min_rows=20)
   >>> sdt_h2o.train(y=target_variable, training_frame=train)
   >>> pred_test = sdt_h2o.predict(test)
 - 
property response_column¶
- Response variable column. - Type: - str.
 - 
property seed¶
- Seed for random numbers (affects sampling) - Type: - int, defaults to- -1.
 - 
property training_frame¶
- Id of the training data frame. - Type: - Union[None, str, H2OFrame].
 
- 
H2ODeepLearningEstimator¶
- 
class h2o.estimators.deeplearning.H2ODeepLearningEstimator(model_id=None, training_frame=None, validation_frame=None, nfolds=0, keep_cross_validation_models=True, keep_cross_validation_predictions=False, keep_cross_validation_fold_assignment=False, fold_assignment='auto', fold_column=None, response_column=None, ignored_columns=None, ignore_const_cols=True, score_each_iteration=False, weights_column=None, offset_column=None, balance_classes=False, class_sampling_factors=None, max_after_balance_size=5.0, max_confusion_matrix_size=20, checkpoint=None, pretrained_autoencoder=None, overwrite_with_best_model=True, use_all_factor_levels=True, standardize=True, activation='rectifier', hidden=[200, 200], epochs=10.0, train_samples_per_iteration=-2, target_ratio_comm_to_comp=0.05, seed=-1, adaptive_rate=True, rho=0.99, epsilon=1e-08, rate=0.005, rate_annealing=1e-06, rate_decay=1.0, momentum_start=0.0, momentum_ramp=1000000.0, momentum_stable=0.0, nesterov_accelerated_gradient=True, input_dropout_ratio=0.0, hidden_dropout_ratios=None, l1=0.0, l2=0.0, max_w2=3.4028235e+38, initial_weight_distribution='uniform_adaptive', initial_weight_scale=1.0, initial_weights=None, initial_biases=None, loss='automatic', distribution='auto', quantile_alpha=0.5, tweedie_power=1.5, huber_alpha=0.9, score_interval=5.0, score_training_samples=10000, score_validation_samples=0, score_duty_cycle=0.1, classification_stop=0.0, regression_stop=1e-06, stopping_rounds=5, stopping_metric='auto', stopping_tolerance=0.0, max_runtime_secs=0.0, score_validation_sampling='uniform', diagnostics=True, fast_mode=True, force_load_balance=True, variable_importances=True, replicate_training_data=True, single_node_mode=False, shuffle_training_data=False, missing_values_handling='mean_imputation', quiet_mode=False, autoencoder=False, sparse=False, col_major=False, average_activation=0.0, sparsity_beta=0.0, max_categorical_features=2147483647, reproducible=False, export_weights_and_biases=False, mini_batch_size=1, 
categorical_encoding='auto', elastic_averaging=False, elastic_averaging_moving_rate=0.9, elastic_averaging_regularization=0.001, export_checkpoints_dir=None, auc_type='auto', custom_metric_func=None, gainslift_bins=-1)[source]¶
- Bases: - h2o.estimators.estimator_base.H2OEstimator- Deep Learning - Build a Deep Neural Network model using CPUs. Builds a feed-forward multilayer artificial neural network on an H2OFrame. - Examples
 - >>> from h2o.estimators.deeplearning import H2ODeepLearningEstimator
   >>> rows = [[1,2,3,4,0], [2,1,2,4,1], [2,1,4,2,1],
   ...         [0,1,2,34,1], [2,3,4,1,0]] * 50
   >>> fr = h2o.H2OFrame(rows)
   >>> fr[4] = fr[4].asfactor()
   >>> model = H2ODeepLearningEstimator()
   >>> model.train(x=range(4), y=4, training_frame=fr)
   >>> model.logloss()
 - 
property activation¶
- Activation function. - Type: - Literal["tanh", "tanh_with_dropout", "rectifier", "rectifier_with_dropout", "maxout", "maxout_with_dropout"], defaults to- "rectifier".- Examples
 - >>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv")
   >>> cars["economy_20mpg"] = cars["economy_20mpg"].asfactor()
   >>> predictors = ["displacement","power","weight","acceleration","year"]
   >>> response = "cylinders"
   >>> train, valid = cars.split_frame(ratios=[.8], seed=1234)
   >>> cars_dl = H2ODeepLearningEstimator(activation="tanh")
   >>> cars_dl.train(x=predictors,
   ...               y=response,
   ...               training_frame=train,
   ...               validation_frame=valid)
   >>> cars_dl.mse()
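For orientation, the three base activation families named in the type list can be written as scalar functions. This is a minimal sketch with hypothetical helper names; it omits the dropout variants and is not H2O's implementation.

```python
import math


def tanh_activation(x):
    """Hyperbolic tangent: squashes input into (-1, 1)."""
    return math.tanh(x)


def rectifier(x):
    """Rectified linear unit: passes positive input, zeroes the rest."""
    return max(0.0, x)


def maxout(channel_values):
    """Maxout: the largest of several linear channel outputs."""
    return max(channel_values)


outputs = {
    "tanh": tanh_activation(0.0),
    "rectifier_negative": rectifier(-2.0),
    "rectifier_positive": rectifier(3.0),
    "maxout": maxout([-1.0, 0.5]),
}
```

Unlike tanh, the rectifier and maxout outputs are unbounded above, which is one reason weight constraints and regularization are often paired with them.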
 - 
property adaptive_rate¶
- Adaptive learning rate. - Type: - bool, defaults to- True.- Examples
 - >>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv")
   >>> cars["economy_20mpg"] = cars["economy_20mpg"].asfactor()
   >>> predictors = ["displacement","power","weight","acceleration","year"]
   >>> response = "cylinders"
   >>> train, valid = cars.split_frame(ratios=[.8], seed=1234)
   >>> cars_dl = H2ODeepLearningEstimator(adaptive_rate=True)
   >>> cars_dl.train(x=predictors,
   ...               y=response,
   ...               training_frame=train,
   ...               validation_frame=valid)
   >>> cars_dl.mse()
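The constructor signature pairs `adaptive_rate` with `rho` and `epsilon`, which are the usual ADADELTA hyperparameters. This is a minimal sketch of one textbook ADADELTA step for a single weight, assuming that is the adaptive scheme in use; `adadelta_step` is a hypothetical helper.

```python
import math


def adadelta_step(grad, state, rho=0.99, epsilon=1e-8):
    """One ADADELTA update for a single weight.

    state is a pair (running average of squared gradients,
                     running average of squared updates).
    """
    avg_sq_grad, avg_sq_delta = state
    # Decaying average of squared gradients.
    avg_sq_grad = rho * avg_sq_grad + (1.0 - rho) * grad * grad
    # Step size is the ratio of the two running RMS values.
    delta = -math.sqrt(avg_sq_delta + epsilon) / math.sqrt(avg_sq_grad + epsilon) * grad
    # Decaying average of squared updates.
    avg_sq_delta = rho * avg_sq_delta + (1.0 - rho) * delta * delta
    return delta, (avg_sq_grad, avg_sq_delta)


# Minimize f(w) = w**2 starting from w = 1.0; the gradient is 2 * w.
w = 1.0
state = (0.0, 0.0)
first_delta, state = adadelta_step(2.0 * w, state)
w += first_delta
```

No global learning rate appears in the update, which is why `rate`-style parameters are typically only relevant when the adaptive scheme is turned off.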
 - 
property auc_type¶
- Set default multinomial AUC type. - Type: - Literal["auto", "none", "macro_ovr", "weighted_ovr", "macro_ovo", "weighted_ovo"], defaults to- "auto".
 - 
property autoencoder¶
- Auto-Encoder. - Type: - bool, defaults to- False.- Examples
 - >>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv")
   >>> cars["economy_20mpg"] = cars["economy_20mpg"].asfactor()
   >>> predictors = ["displacement","power","weight","acceleration","year"]
   >>> response = "cylinders"
   >>> train, valid = cars.split_frame(ratios=[.8], seed=1234)
   >>> cars_dl = H2ODeepLearningEstimator(autoencoder=True)
   >>> cars_dl.train(x=predictors,
   ...               y=response,
   ...               training_frame=train,
   ...               validation_frame=valid)
   >>> cars_dl.mse()
 - 
property average_activation¶
- Average activation for sparse auto-encoder. #Experimental - Type: - float, defaults to- 0.0.- Examples
 - >>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv")
   >>> cars["economy_20mpg"] = cars["economy_20mpg"].asfactor()
   >>> predictors = ["displacement","power","weight","acceleration","year"]
   >>> response = "cylinders"
   >>> train, valid = cars.split_frame(ratios=[.8], seed=1234)
   >>> cars_dl = H2ODeepLearningEstimator(average_activation=1.5,
   ...                                    seed=1234)
   >>> cars_dl.train(x=predictors,
   ...               y=response,
   ...               training_frame=train,
   ...               validation_frame=valid)
   >>> cars_dl.mse()
 - 
property balance_classes¶
- Balance training data class counts via over/under-sampling (for imbalanced data). - Type: - bool, defaults to- False.- Examples
 - >>> covtype = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/covtype/covtype.20k.data")
   >>> covtype[54] = covtype[54].asfactor()
   >>> predictors = covtype.columns[0:54]
   >>> response = 'C55'
   >>> train, valid = covtype.split_frame(ratios=[.8], seed=1234)
   >>> cov_dl = H2ODeepLearningEstimator(balance_classes=True,
   ...                                   seed=1234)
   >>> cov_dl.train(x=predictors,
   ...              y=response,
   ...              training_frame=train,
   ...              validation_frame=valid)
   >>> cov_dl.mse()
 - 
property categorical_encoding¶
- Encoding scheme for categorical features - Type: - Literal["auto", "enum", "one_hot_internal", "one_hot_explicit", "binary", "eigen", "label_encoder", "sort_by_response", "enum_limited"], defaults to- "auto".- Examples
 - >>> airlines = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/airlines/allyears2k_headers.zip")
   >>> airlines["Year"] = airlines["Year"].asfactor()
   >>> airlines["Month"] = airlines["Month"].asfactor()
   >>> airlines["DayOfWeek"] = airlines["DayOfWeek"].asfactor()
   >>> airlines["Cancelled"] = airlines["Cancelled"].asfactor()
   >>> airlines['FlightNum'] = airlines['FlightNum'].asfactor()
   >>> predictors = ["Origin", "Dest", "Year", "UniqueCarrier",
   ...               "DayOfWeek", "Month", "Distance", "FlightNum"]
   >>> response = "IsDepDelayed"
   >>> train, valid = airlines.split_frame(ratios=[.8], seed=1234)
   >>> encoding = "one_hot_internal"
   >>> airlines_dl = H2ODeepLearningEstimator(categorical_encoding=encoding,
   ...                                        seed=1234)
   >>> airlines_dl.train(x=predictors,
   ...                   y=response,
   ...                   training_frame=train,
   ...                   validation_frame=valid)
   >>> airlines_dl.mse()
 - 
property checkpoint¶
- Model checkpoint to resume training with. - Type: - Union[None, str, H2OEstimator].- Examples
 - >>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv") >>> cars["economy_20mpg"] = cars["economy_20mpg"].asfactor() >>> predictors = ["displacement","power","weight","acceleration","year"] >>> response = "economy_20mpg" >>> train, valid = cars.split_frame(ratios=[.8], seed=1234) >>> cars_dl = H2ODeepLearningEstimator(activation="tanh", ... autoencoder=True, ... seed=1234, ... model_id="cars_dl") >>> cars_dl.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> cars_dl.mse() >>> cars_cont = H2ODeepLearningEstimator(checkpoint=cars_dl, ... seed=1234) >>> cars_cont.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> cars_cont.mse() 
 - 
property class_sampling_factors¶
- Desired over/under-sampling ratios per class (in lexicographic order). If not specified, sampling factors will be automatically computed to obtain class balance during training. Requires balance_classes. - Type: - List[float].- Examples
 - >>> covtype = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/covtype/covtype.20k.data") >>> covtype[54] = covtype[54].asfactor() >>> predictors = covtype.columns[0:54] >>> response = 'C55' >>> train, valid = covtype.split_frame(ratios=[.8], seed=1234) >>> sample_factors = [1., 0.5, 1., 1., 1., 1., 1.] >>> cov_dl = H2ODeepLearningEstimator(balance_classes=True, ... class_sampling_factors=sample_factors, ... seed=1234) >>> cov_dl.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> cov_dl.mse() 
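To make the effect of per-class sampling factors concrete, the following sketch computes the expected row count per class after resampling. It is an illustration under stated assumptions: `resampled_counts` is a hypothetical helper, the class counts are made up, and because actual resampling draws rows at random, real counts would only approximate these values.

```python
def resampled_counts(class_counts, sampling_factors):
    """Expected row count per class after applying sampling factors.

    Hypothetical helper for illustration only. Factors are matched to
    classes in lexicographic order of the class labels, as the parameter
    description states. A factor below 1 under-samples a class and a
    factor above 1 over-samples it.
    """
    labels = sorted(class_counts)
    if len(labels) != len(sampling_factors):
        raise ValueError("need exactly one sampling factor per class")
    return {label: round(class_counts[label] * factor)
            for label, factor in zip(labels, sampling_factors)}


# Made-up class counts for a three-class response.
counts = {"1": 7000, "2": 10000, "3": 1200}
expected = resampled_counts(counts, [1.0, 0.5, 2.0])
print(expected)
```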
 - 
property classification_stop¶
- Stopping criterion for classification error fraction on training data (-1 to disable). - Type: - float, defaults to- 0.0.- Examples
 - >>> covtype = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/covtype/covtype.20k.data") >>> covtype[54] = covtype[54].asfactor() >>> predictors = covtype.columns[0:54] >>> response = 'C55' >>> train, valid = covtype.split_frame(ratios=[.8], seed=1234) >>> cov_dl = H2ODeepLearningEstimator(classification_stop=1.5, ... seed=1234) >>> cov_dl.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> cov_dl.mse() 
 - 
property col_major¶
- #DEPRECATED Use a column major weight matrix for input layer. Can speed up forward propagation, but might slow down backpropagation. - Type: - bool, defaults to- False.
 - 
property custom_metric_func¶
- Reference to custom evaluation function, format: language:keyName=funcName - Type: - str.
 - 
property diagnostics¶
- Enable diagnostics for hidden layers. - Type: - bool, defaults to- True.- Examples
 - >>> covtype = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/covtype/covtype.20k.data") >>> covtype[54] = covtype[54].asfactor() >>> predictors = covtype.columns[0:54] >>> response = 'C55' >>> train, valid = covtype.split_frame(ratios=[.8], seed=1234) >>> cov_dl = H2ODeepLearningEstimator(diagnostics=True, ... seed=1234) >>> cov_dl.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> cov_dl.mse() 
 - 
property distribution¶
- Distribution function - Type: - Literal["auto", "bernoulli", "multinomial", "gaussian", "poisson", "gamma", "tweedie", "laplace", "quantile", "huber"], defaults to- "auto".- Examples
 - >>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv") >>> cars["economy_20mpg"] = cars["economy_20mpg"].asfactor() >>> predictors = ["displacement","power","weight","acceleration","year"] >>> response = "cylinders" >>> train, valid = cars.split_frame(ratios=[.8], seed=1234) >>> cars_dl = H2ODeepLearningEstimator(distribution="poisson", ... seed=1234) >>> cars_dl.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> cars_dl.mse() 
 - 
property elastic_averaging¶
- Elastic averaging between compute nodes can improve distributed model convergence. #Experimental - Type: - bool, defaults to- False.- Examples
 - >>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv") >>> cars["economy_20mpg"] = cars["economy_20mpg"].asfactor() >>> predictors = ["displacement","power","weight","acceleration","year"] >>> response = "cylinders" >>> train, valid = cars.split_frame(ratios=[.8], seed=1234) >>> cars_dl = H2ODeepLearningEstimator(elastic_averaging=True, ... seed=1234) >>> cars_dl.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> cars_dl.mse() 
 - 
property elastic_averaging_moving_rate¶
- Elastic averaging moving rate (only if elastic averaging is enabled). - Type: - float, defaults to- 0.9.- Examples
 - >>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv") >>> cars["economy_20mpg"] = cars["economy_20mpg"].asfactor() >>> predictors = ["displacement","power","weight","acceleration","year"] >>> response = "cylinders" >>> train, valid = cars.split_frame(ratios=[.8], seed=1234) >>> cars_dl = H2ODeepLearningEstimator(elastic_averaging_moving_rate=.8, ... seed=1234) >>> cars_dl.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> cars_dl.mse() 
 - 
property elastic_averaging_regularization¶
- Elastic averaging regularization strength (only if elastic averaging is enabled). - Type: - float, defaults to- 0.001.- Examples
 - >>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv") >>> cars["economy_20mpg"] = cars["economy_20mpg"].asfactor() >>> predictors = ["displacement","power","weight","acceleration","year"] >>> response = "cylinders" >>> train, valid = cars.split_frame(ratios=[.8], seed=1234) >>> cars_dl = H2ODeepLearningEstimator(elastic_averaging_regularization=.008, ... seed=1234) >>> cars_dl.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> cars_dl.mse() 
 - 
property epochs¶
- How many times the dataset should be iterated (streamed), can be fractional. - Type: - float, defaults to- 10.0.- Examples
 - >>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv") >>> cars["economy_20mpg"] = cars["economy_20mpg"].asfactor() >>> predictors = ["displacement","power","weight","acceleration","year"] >>> response = "cylinders" >>> train, valid = cars.split_frame(ratios=[.8], seed=1234) >>> cars_dl = H2ODeepLearningEstimator(epochs=15, ... seed=1234) >>> cars_dl.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> cars_dl.mse() 
 - 
property epsilon¶
- Adaptive learning rate smoothing factor (to avoid divisions by zero and allow progress). - Type: - float, defaults to- 1e-08.- Examples
 - >>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv") >>> cars["economy_20mpg"] = cars["economy_20mpg"].asfactor() >>> predictors = ["displacement","power","weight","acceleration","year"] >>> response = "cylinders" >>> train, valid = cars.split_frame(ratios=[.8], seed=1234) >>> cars_dl = H2ODeepLearningEstimator(epsilon=1e-6, ... seed=1234) >>> cars_dl.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> cars_dl.mse() 
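The role of the smoothing factor can be seen in a single adaptive update step. The sketch below assumes the adaptive learning rate follows the ADADELTA scheme; the function name `adadelta_step`, the decay value `rho=0.99`, and the scalar (single-weight) formulation are choices made for this illustration and are not taken from the h2o source code.

```python
import math


def adadelta_step(grad, avg_sq_grad, avg_sq_delta, rho=0.99, epsilon=1e-8):
    """One ADADELTA-style update for a single weight (illustrative only).

    Returns (delta, new_avg_sq_grad, new_avg_sq_delta).
    """
    # Running average of squared gradients.
    avg_sq_grad = rho * avg_sq_grad + (1.0 - rho) * grad * grad
    # epsilon keeps the denominator non-zero and, in the numerator,
    # allows progress even when no update has been made yet.
    delta = -math.sqrt(avg_sq_delta + epsilon) / math.sqrt(avg_sq_grad + epsilon) * grad
    # Running average of squared updates.
    avg_sq_delta = rho * avg_sq_delta + (1.0 - rho) * delta * delta
    return delta, avg_sq_grad, avg_sq_delta


# First step from an empty history: the update is small but non-zero.
delta, g2, d2 = adadelta_step(grad=0.5, avg_sq_grad=0.0, avg_sq_delta=0.0)
print(delta)

# A zero gradient with an empty history does not divide by zero.
zero_delta, _, _ = adadelta_step(grad=0.0, avg_sq_grad=0.0, avg_sq_delta=0.0)
print(zero_delta)
```

A larger `epsilon` makes the early steps larger, which is why values such as `1e-6` are sometimes tried in place of the default.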
 - 
property export_checkpoints_dir¶
- Automatically export generated models to this directory. - Type: - str.- Examples
 - >>> import tempfile >>> from os import listdir >>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv") >>> cars["economy_20mpg"] = cars["economy_20mpg"].asfactor() >>> predictors = ["displacement","power","weight","acceleration","year"] >>> response = "cylinders" >>> train, valid = cars.split_frame(ratios=[.8], seed=1234) >>> checkpoints_dir = tempfile.mkdtemp() >>> cars_dl = H2ODeepLearningEstimator(export_checkpoints_dir=checkpoints_dir, ... seed=1234) >>> cars_dl.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> len(listdir(checkpoints_dir)) 
 - 
property export_weights_and_biases¶
- Whether to export Neural Network weights and biases to H2O Frames. - Type: - bool, defaults to- False.- Examples
 - >>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv") >>> cars["economy_20mpg"] = cars["economy_20mpg"].asfactor() >>> predictors = ["displacement","power","weight","acceleration","year"] >>> response = "cylinders" >>> train, valid = cars.split_frame(ratios=[.8], seed=1234) >>> cars_dl = H2ODeepLearningEstimator(export_weights_and_biases=True, ... seed=1234) >>> cars_dl.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> cars_dl.mse() 
 - 
property fast_mode¶
- Enable fast mode (minor approximation in back-propagation). - Type: - bool, defaults to- True.- Examples
 - >>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv") >>> cars["economy_20mpg"] = cars["economy_20mpg"].asfactor() >>> predictors = ["displacement","power","weight","acceleration","year"] >>> response = "cylinders" >>> train, valid = cars.split_frame(ratios=[.8], seed=1234) >>> cars_dl = H2ODeepLearningEstimator(fast_mode=False, ... seed=1234) >>> cars_dl.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> cars_dl.mse() 
 - 
property fold_assignment¶
- Cross-validation fold assignment scheme, if fold_column is not specified. The ‘Stratified’ option will stratify the folds based on the response variable, for classification problems. - Type: - Literal["auto", "random", "modulo", "stratified"], defaults to- "auto".- Examples
 - >>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv") >>> cars["economy_20mpg"] = cars["economy_20mpg"].asfactor() >>> predictors = ["displacement","power","weight","acceleration","year"] >>> response = "cylinders" >>> train, valid = cars.split_frame(ratios=[.8], seed=1234) >>> cars_dl = H2ODeepLearningEstimator(fold_assignment="Random", ... nfolds=5, ... seed=1234) >>> cars_dl.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> cars_dl.mse() 
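The difference between the "modulo" and "random" schemes can be sketched in a few lines of plain Python. `assign_folds` is a hypothetical helper written for this illustration; it does not reproduce H2O's random number stream, and the "stratified" scheme is left out.

```python
import random


def assign_folds(n_rows, n_folds, scheme="modulo", seed=1234):
    """Return a fold index in 0..n_folds-1 for each row.

    Hypothetical helper for illustration only; covers the "modulo"
    and "random" schemes.
    """
    if scheme == "modulo":
        # Deterministic: rows are dealt to folds in round-robin order.
        return [row % n_folds for row in range(n_rows)]
    if scheme == "random":
        # Each row gets an independently drawn fold; fold sizes may differ.
        rng = random.Random(seed)
        return [rng.randrange(n_folds) for _ in range(n_rows)]
    raise ValueError("unsupported scheme: " + scheme)


modulo_folds = assign_folds(10, 5, scheme="modulo")
random_folds = assign_folds(10, 5, scheme="random")
print(modulo_folds)
print(random_folds)
```

The modulo scheme yields evenly sized folds, while the random scheme only does so on average.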
 - 
property fold_column¶
- Column with cross-validation fold index assignment per observation. - Type: - str.- Examples
 - >>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv") >>> cars["economy_20mpg"] = cars["economy_20mpg"].asfactor() >>> predictors = ["displacement","power","weight","acceleration","year"] >>> response = "cylinders" >>> fold_numbers = cars.kfold_column(n_folds=5, seed=1234) >>> fold_numbers.set_names(["fold_numbers"]) >>> cars = cars.cbind(fold_numbers) >>> print(cars['fold_numbers']) >>> cars_dl = H2ODeepLearningEstimator(seed=1234) >>> cars_dl.train(x=predictors, ... y=response, ... training_frame=cars, ... fold_column="fold_numbers") >>> cars_dl.mse() 
 - 
property force_load_balance¶
- Force extra load balancing to increase training speed for small datasets (to keep all cores busy). - Type: - bool, defaults to- True.- Examples
 - >>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv") >>> cars["economy_20mpg"] = cars["economy_20mpg"].asfactor() >>> predictors = ["displacement","power","weight","acceleration","year"] >>> response = "cylinders" >>> train, valid = cars.split_frame(ratios=[.8], seed=1234) >>> cars_dl = H2ODeepLearningEstimator(force_load_balance=False, ... seed=1234) >>> cars_dl.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> cars_dl.mse() 
 - 
property gainslift_bins¶
- Gains/Lift table number of bins. 0 means disabled. Default value -1 means automatic binning. - Type: - int, defaults to- -1.
 - 
property hidden¶
- Hidden layer sizes (e.g. [100, 100]). - Type: - List[int], defaults to- [200, 200].- Examples
 - >>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv") >>> cars["economy_20mpg"] = cars["economy_20mpg"].asfactor() >>> predictors = ["displacement","power","weight","acceleration","year"] >>> response = "cylinders" >>> train, valid = cars.split_frame(ratios=[.8], seed=1234) >>> cars_dl = H2ODeepLearningEstimator(hidden=[100,100], ... seed=1234) >>> cars_dl.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> cars_dl.mse() 
 - 
property hidden_dropout_ratios¶
- Hidden layer dropout ratios (can improve generalization), specify one value per hidden layer, defaults to 0.5. - Type: - List[float].- Examples
 - >>> train = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/bigdata/laptop/mnist/train.csv.gz") >>> valid = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/bigdata/laptop/mnist/test.csv.gz") >>> features = list(range(0,784)) >>> target = 784 >>> train[target] = train[target].asfactor() >>> valid[target] = valid[target].asfactor() >>> model = H2ODeepLearningEstimator(epochs=20, ... hidden=[200,200], ... hidden_dropout_ratios=[0.5,0.5], ... seed=1234, ... activation='tanhwithdropout') >>> model.train(x=features, ... y=target, ... training_frame=train, ... validation_frame=valid) >>> model.mse() 
 - 
property huber_alpha¶
- Desired quantile for Huber/M-regression (threshold between quadratic and linear loss, must be between 0 and 1). - Type: - float, defaults to- 0.9.- Examples
 - >>> insurance = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/glm_test/insurance.csv") >>> predictors = insurance.columns[0:4] >>> response = 'Claims' >>> insurance['Group'] = insurance['Group'].asfactor() >>> insurance['Age'] = insurance['Age'].asfactor() >>> train, valid = insurance.split_frame(ratios=[.8], seed=1234) >>> insurance_dl = H2ODeepLearningEstimator(distribution="huber", ... huber_alpha=0.9, ... seed=1234) >>> insurance_dl.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> insurance_dl.mse() 
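The description above ties the Huber threshold to a quantile of the residuals. A minimal sketch of that idea follows, assuming the threshold is taken as the `alpha`-quantile of the absolute residuals. Both helper functions, the simple quantile rule, and the residual values are made up for illustration and are not H2O's implementation.

```python
def huber_loss(residual, delta):
    """Quadratic for |residual| <= delta, linear beyond it."""
    magnitude = abs(residual)
    if magnitude <= delta:
        return 0.5 * magnitude * magnitude
    return delta * (magnitude - 0.5 * delta)


def huber_threshold(residuals, alpha):
    """Pick delta as an approximate alpha-quantile of |residuals|.

    Uses a simple index-based quantile, for illustration only.
    """
    ordered = sorted(abs(r) for r in residuals)
    position = int(alpha * (len(ordered) - 1))
    return ordered[position]


# Made-up residuals with one large outlier.
residuals = [0.1, -0.2, 0.3, -0.4, 0.5, -0.6, 0.7, -0.8, 0.9, -10.0]
delta = huber_threshold(residuals, alpha=0.9)
losses = [huber_loss(r, delta) for r in residuals]
print(delta)
print(losses[-1])
```

With `alpha=0.9`, only the largest residual falls in the linear region, so the outlier contributes far less than it would under a purely quadratic loss.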
 - 
property ignore_const_cols¶
- Ignore constant columns. - Type: - bool, defaults to- True.- Examples
 - >>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv") >>> cars["economy_20mpg"] = cars["economy_20mpg"].asfactor() >>> predictors = ["displacement","power","weight","acceleration","year"] >>> response = "economy_20mpg" >>> cars["const_1"] = 6 >>> cars["const_2"] = 7 >>> train, valid = cars.split_frame(ratios=[.8], seed=1234) >>> cars_dl = H2ODeepLearningEstimator(seed=1234, ... ignore_const_cols=True) >>> cars_dl.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> cars_dl.auc() 
 - 
property ignored_columns¶
- Names of columns to ignore for training. - Type: - List[str].
 - 
property initial_biases¶
- A list of H2OFrame ids to initialize the bias vectors of this model with. - Type: - List[Union[None, str, H2OFrame]].- Examples
 - >>> iris = h2o.import_file("http://h2o-public-test-data.s3.amazonaws.com/smalldata/iris/iris.csv") >>> dl1 = H2ODeepLearningEstimator(hidden=[10,10], ... export_weights_and_biases=True) >>> dl1.train(x=list(range(4)), y=4, training_frame=iris) >>> p1 = dl1.model_performance(iris).logloss() >>> ll1 = dl1.predict(iris) >>> print(p1) >>> w1 = dl1.weights(0) >>> w2 = dl1.weights(1) >>> w3 = dl1.weights(2) >>> b1 = dl1.biases(0) >>> b2 = dl1.biases(1) >>> b3 = dl1.biases(2) >>> dl2 = H2ODeepLearningEstimator(hidden=[10,10], ... initial_weights=[w1, w2, w3], ... initial_biases=[b1, b2, b3], ... epochs=0) >>> dl2.train(x=list(range(4)), y=4, training_frame=iris) >>> dl2.initial_biases 
 - 
property initial_weight_distribution¶
- Initial weight distribution. - Type: - Literal["uniform_adaptive", "uniform", "normal"], defaults to- "uniform_adaptive".- Examples
 - >>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv") >>> cars["economy_20mpg"] = cars["economy_20mpg"].asfactor() >>> predictors = ["displacement","power","weight","acceleration","year"] >>> response = "economy_20mpg" >>> train, valid = cars.split_frame(ratios=[.8], seed=1234) >>> cars_dl = H2ODeepLearningEstimator(initial_weight_distribution="Uniform", ... seed=1234) >>> cars_dl.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> cars_dl.auc() 
 - 
property initial_weight_scale¶
- Uniform: -value…value, Normal: stddev. - Type: - float, defaults to- 1.0.- Examples
 - >>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv") >>> cars["economy_20mpg"] = cars["economy_20mpg"].asfactor() >>> predictors = ["displacement","power","weight","acceleration","year"] >>> response = "economy_20mpg" >>> train, valid = cars.split_frame(ratios=[.8], seed=1234) >>> cars_dl = H2ODeepLearningEstimator(initial_weight_scale=1.5, ... seed=1234) >>> cars_dl.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> cars_dl.auc() 
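How the scale value is interpreted depends on the chosen distribution, as the description notes. The sketch below illustrates the two cases in plain Python. `init_weights` is a hypothetical helper, the "uniform_adaptive" option is not covered, and the random draws do not match those of H2O.

```python
import random


def init_weights(n_in, n_out, distribution="uniform", scale=1.0, seed=1234):
    """Build an n_out x n_in weight matrix (illustrative only).

    "uniform": each weight is drawn from [-scale, scale].
    "normal":  each weight is drawn from a Gaussian with mean 0
               and standard deviation scale.
    """
    rng = random.Random(seed)

    def draw():
        if distribution == "uniform":
            return rng.uniform(-scale, scale)
        if distribution == "normal":
            return rng.gauss(0.0, scale)
        raise ValueError("unsupported distribution: " + distribution)

    return [[draw() for _ in range(n_in)] for _ in range(n_out)]


weights = init_weights(n_in=5, n_out=3, distribution="uniform", scale=1.5)
largest = max(abs(w) for row in weights for w in row)
print(largest <= 1.5)
```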
 - 
property initial_weights¶
- A list of H2OFrame ids to initialize the weight matrices of this model with. - Type: - List[Union[None, str, H2OFrame]].- Examples
 - >>> iris = h2o.import_file("http://h2o-public-test-data.s3.amazonaws.com/smalldata/iris/iris.csv") >>> dl1 = H2ODeepLearningEstimator(hidden=[10,10], ... export_weights_and_biases=True) >>> dl1.train(x=list(range(4)), y=4, training_frame=iris) >>> p1 = dl1.model_performance(iris).logloss() >>> ll1 = dl1.predict(iris) >>> print(p1) >>> w1 = dl1.weights(0) >>> w2 = dl1.weights(1) >>> w3 = dl1.weights(2) >>> b1 = dl1.biases(0) >>> b2 = dl1.biases(1) >>> b3 = dl1.biases(2) >>> dl2 = H2ODeepLearningEstimator(hidden=[10,10], ... initial_weights=[w1, w2, w3], ... initial_biases=[b1, b2, b3], ... epochs=0) >>> dl2.train(x=list(range(4)), y=4, training_frame=iris) >>> dl2.initial_weights 
 - 
property input_dropout_ratio¶
- Input layer dropout ratio (can improve generalization, try 0.1 or 0.2). - Type: - float, defaults to- 0.0.- Examples
 - >>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv") >>> cars["economy_20mpg"] = cars["economy_20mpg"].asfactor() >>> predictors = ["displacement","power","weight","acceleration","year"] >>> response = "economy_20mpg" >>> train, valid = cars.split_frame(ratios=[.8], seed=1234) >>> cars_dl = H2ODeepLearningEstimator(input_dropout_ratio=0.2, ... seed=1234) >>> cars_dl.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> cars_dl.auc() 
 - 
property keep_cross_validation_fold_assignment¶
- Whether to keep the cross-validation fold assignment. - Type: - bool, defaults to- False.- Examples
 - >>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv") >>> cars["economy_20mpg"] = cars["economy_20mpg"].asfactor() >>> predictors = ["displacement","power","weight","acceleration","year"] >>> response = "economy_20mpg" >>> cars_dl = H2ODeepLearningEstimator(keep_cross_validation_fold_assignment=True, ... nfolds=5, ... seed=1234) >>> cars_dl.train(x=predictors, ... y=response, ... training_frame=cars) >>> print(cars_dl.cross_validation_fold_assignment()) 
 - 
property keep_cross_validation_models¶
- Whether to keep the cross-validation models. - Type: - bool, defaults to- True.- Examples
 - >>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv") >>> cars["economy_20mpg"] = cars["economy_20mpg"].asfactor() >>> predictors = ["displacement","power","weight","acceleration","year"] >>> response = "economy_20mpg" >>> cars_dl = H2ODeepLearningEstimator(keep_cross_validation_models=True, ... nfolds=5, ... seed=1234) >>> cars_dl.train(x=predictors, ... y=response, ... training_frame=cars) >>> print(cars_dl.cross_validation_models()) 
 - 
property keep_cross_validation_predictions¶
- Whether to keep the predictions of the cross-validation models. - Type: - bool, defaults to- False.- Examples
 - >>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv") >>> cars["economy_20mpg"] = cars["economy_20mpg"].asfactor() >>> predictors = ["displacement","power","weight","acceleration","year"] >>> response = "economy_20mpg" >>> cars_dl = H2ODeepLearningEstimator(keep_cross_validation_predictions=True, ... nfolds=5, ... seed=1234) >>> cars_dl.train(x=predictors, ... y=response, ... training_frame=cars) >>> print(cars_dl.cross_validation_predictions()) 
 - 
property l1¶
- L1 regularization (can add stability and improve generalization, causes many weights to become 0). - Type: - float, defaults to- 0.0.- Examples
 - >>> covtype = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/covtype/covtype.20k.data") >>> covtype[54] = covtype[54].asfactor() >>> hh_imbalanced = H2ODeepLearningEstimator(l1=1e-5, ... activation="Rectifier", ... loss="CrossEntropy", ... hidden=[200,200], ... epochs=1, ... balance_classes=False, ... reproducible=True, ... seed=1234) >>> hh_imbalanced.train(x=list(range(54)),y=54, training_frame=covtype) >>> hh_imbalanced.mse() 
 - 
property l2¶
- L2 regularization (can add stability and improve generalization, causes many weights to be small). - Type: - float, defaults to- 0.0.- Examples
 - >>> covtype = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/covtype/covtype.20k.data") >>> covtype[54] = covtype[54].asfactor() >>> hh_imbalanced = H2ODeepLearningEstimator(l2=1e-5, ... activation="Rectifier", ... loss="CrossEntropy", ... hidden=[200,200], ... epochs=1, ... balance_classes=False, ... reproducible=True, ... seed=1234) >>> hh_imbalanced.train(x=list(range(54)),y=54, training_frame=covtype) >>> hh_imbalanced.mse() 
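The l1 and l2 values act as multipliers on two different penalty terms. The sketch below uses one common convention, in which the penalty is `l1 * sum(|w|) + l2 * sum(w^2)`. Some formulations include an extra factor of 0.5 on the L2 term, and the exact form used internally by H2O is not confirmed here. `regularization_penalty` is a hypothetical helper.

```python
def regularization_penalty(weights, l1=0.0, l2=0.0):
    """Penalty added to the training loss (one common convention).

    The L1 term grows with the absolute size of each weight and tends
    to push weights to exactly 0; the L2 term grows with the squared
    size and tends to keep all weights small.
    """
    l1_term = l1 * sum(abs(w) for w in weights)
    l2_term = l2 * sum(w * w for w in weights)
    return l1_term + l2_term


# Made-up weights for illustration.
weights = [0.5, -1.0, 2.0]
penalty = regularization_penalty(weights, l1=0.1, l2=0.01)
print(penalty)
```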
 - 
property loss¶
- Loss function. - Type: - Literal["automatic", "cross_entropy", "quadratic", "huber", "absolute", "quantile"], defaults to- "automatic".- Examples
 - >>> covtype = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/covtype/covtype.20k.data") >>> covtype[54] = covtype[54].asfactor() >>> hh_imbalanced = H2ODeepLearningEstimator(l1=1e-5, ... activation="Rectifier", ... loss="CrossEntropy", ... hidden=[200,200], ... epochs=1, ... balance_classes=False, ... reproducible=True, ... seed=1234) >>> hh_imbalanced.train(x=list(range(54)),y=54, training_frame=covtype) >>> hh_imbalanced.mse() 
 - 
property max_after_balance_size¶
- Maximum relative size of the training data after balancing class counts (can be less than 1.0). Requires balance_classes. - Type: - float, defaults to- 5.0.- Examples
 - >>> covtype = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/covtype/covtype.20k.data") >>> covtype[54] = covtype[54].asfactor() >>> predictors = covtype.columns[0:54] >>> response = 'C55' >>> train, valid = covtype.split_frame(ratios=[.8], seed=1234) >>> max_size = .85 >>> cov_dl = H2ODeepLearningEstimator(balance_classes=True, ... max_after_balance_size=max_size, ... seed=1234) >>> cov_dl.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> cov_dl.logloss() 
 - 
property max_categorical_features¶
- Max. number of categorical features, enforced via hashing. #Experimental - Type: - int, defaults to- 2147483647.- Examples
 - >>> covtype = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/covtype/covtype.20k.data") >>> covtype[54] = covtype[54].asfactor() >>> predictors = covtype.columns[0:54] >>> response = 'C55' >>> train, valid = covtype.split_frame(ratios=[.8], seed=1234) >>> cov_dl = H2ODeepLearningEstimator(balance_classes=True, ... max_categorical_features=2147483647, ... seed=1234) >>> cov_dl.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> cov_dl.logloss() 
 - 
property max_confusion_matrix_size¶
- [Deprecated] Maximum size (# classes) for confusion matrices to be printed in the Logs. - Type: - int, defaults to- 20.
 - 
property max_runtime_secs¶
- Maximum allowed runtime in seconds for model training. Use 0 to disable. - Type: - float, defaults to- 0.0.- Examples
 - >>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv") >>> cars["economy_20mpg"] = cars["economy_20mpg"].asfactor() >>> predictors = ["displacement","power","weight","acceleration","year"] >>> response = "economy_20mpg" >>> train, valid = cars.split_frame(ratios=[.8], seed=1234) >>> cars_dl = H2ODeepLearningEstimator(max_runtime_secs=10, ... seed=1234) >>> cars_dl.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> cars_dl.auc() 
 - 
property max_w2¶
- Constraint for squared sum of incoming weights per unit (e.g. for Rectifier). - Type: - float, defaults to- 3.4028235e+38.- Examples
 - >>> covtype = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/covtype/covtype.20k.data") >>> covtype[54] = covtype[54].asfactor() >>> predictors = covtype.columns[0:54] >>> response = 'C55' >>> train, valid = covtype.split_frame(ratios=[.8], seed=1234) >>> cov_dl = H2ODeepLearningEstimator(activation="RectifierWithDropout", ... hidden=[10,10], ... epochs=10, ... input_dropout_ratio=0.2, ... l1=1e-5, ... max_w2=10.5, ... stopping_rounds=0) >>> cov_dl.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> cov_dl.mse() 
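A constraint on the squared sum of incoming weights is typically enforced by rescaling a unit's weight vector whenever the limit is exceeded. The following sketch shows that rescaling for one unit. `clip_incoming_weights` is a hypothetical helper, and the sketch describes the general max-norm technique rather than H2O's exact procedure.

```python
import math


def clip_incoming_weights(weights, max_w2):
    """Rescale one unit's incoming weights so their squared sum is <= max_w2.

    Illustrative only. Weights already within the limit are returned
    unchanged (as a copy).
    """
    squared_sum = sum(w * w for w in weights)
    if squared_sum <= max_w2:
        return list(weights)
    # Shrink every weight by the same factor so the direction is kept.
    scale = math.sqrt(max_w2 / squared_sum)
    return [w * scale for w in weights]


clipped = clip_incoming_weights([3.0, 4.0], max_w2=10.5)
print(sum(w * w for w in clipped))
```

Because all weights are scaled by the same factor, their relative proportions are preserved.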
 - 
property mini_batch_size¶
- Mini-batch size (smaller leads to better fit, larger can speed up and generalize better). - Type: - int, defaults to- 1.- Examples
 - >>> covtype = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/covtype/covtype.20k.data") >>> covtype[54] = covtype[54].asfactor() >>> predictors = covtype.columns[0:54] >>> response = 'C55' >>> train, valid = covtype.split_frame(ratios=[.8], seed=1234) >>> cov_dl = H2ODeepLearningEstimator(activation="RectifierWithDropout", ... hidden=[10,10], ... epochs=10, ... input_dropout_ratio=0.2, ... l1=1e-5, ... max_w2=10.5, ... stopping_rounds=0, ... mini_batch_size=35) >>> cov_dl.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> cov_dl.mse() 
 - 
property missing_values_handling¶
- Handling of missing values. Either MeanImputation or Skip. - Type: - Literal["mean_imputation", "skip"], defaults to- "mean_imputation".- Examples
 - >>> boston = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/gbm_test/BostonHousing.csv") >>> predictors = boston.columns[:-1] >>> response = "medv" >>> boston['chas'] = boston['chas'].asfactor() >>> boston.insert_missing_values() >>> train, valid = boston.split_frame(ratios=[.8]) >>> boston_dl = H2ODeepLearningEstimator(missing_values_handling="skip") >>> boston_dl.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> boston_dl.mse() 
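The two strategies treat incomplete rows very differently, which the plain-Python sketch below makes explicit. `handle_missing` is a hypothetical helper that works on numeric columns only, represents missing values as `None`, and assumes every column has at least one observed value. It is not the h2o implementation.

```python
def handle_missing(rows, strategy="mean_imputation"):
    """Apply a missing-value strategy to a list of numeric rows.

    Illustrative only. rows is a list of equal-length lists in which
    None marks a missing value.
    """
    if strategy == "skip":
        # Drop every row that has at least one missing value.
        return [list(row) for row in rows if None not in row]
    if strategy == "mean_imputation":
        # Replace each missing value with the mean of its column.
        means = []
        for col in range(len(rows[0])):
            present = [row[col] for row in rows if row[col] is not None]
            means.append(sum(present) / len(present))
        return [[means[col] if value is None else value
                 for col, value in enumerate(row)]
                for row in rows]
    raise ValueError("unsupported strategy: " + strategy)


# Made-up data with two missing entries.
data = [[1.0, 2.0], [None, 4.0], [3.0, None]]
imputed = handle_missing(data, "mean_imputation")
skipped = handle_missing(data, "skip")
print(imputed)
print(skipped)
```

Skipping keeps only complete rows and can discard much of a dataset, whereas mean imputation keeps every row at the cost of introducing estimated values.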
 - 
property momentum_ramp¶
- Number of training samples for which momentum increases. - Type: - float, defaults to- 1000000.0.- Examples
 - >>> airlines= h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/airlines/allyears2k_headers.zip") >>> predictors = ["Year","Month","DayofMonth","DayOfWeek","CRSDepTime", ... "CRSArrTime","UniqueCarrier","FlightNum"] >>> response_col = "IsDepDelayed" >>> airlines_dl = H2ODeepLearningEstimator(hidden=[200,200], ... activation="Rectifier", ... input_dropout_ratio=0.0, ... momentum_start=0.9, ... momentum_stable=0.99, ... momentum_ramp=1e7, ... epochs=100, ... stopping_rounds=4, ... train_samples_per_iteration=30000, ... mini_batch_size=32, ... score_duty_cycle=0.25, ... score_interval=1) >>> airlines_dl.train(x=predictors, ... y=response_col, ... training_frame=airlines) >>> airlines_dl.mse() 
 - 
property momentum_stable¶
- Final momentum after the ramp is over (try 0.99). - Type: - float, defaults to- 0.0.- Examples
 - >>> airlines= h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/airlines/allyears2k_headers.zip") >>> predictors = ["Year","Month","DayofMonth","DayOfWeek","CRSDepTime", ... "CRSArrTime","UniqueCarrier","FlightNum"] >>> response_col = "IsDepDelayed" >>> airlines_dl = H2ODeepLearningEstimator(hidden=[200,200], ... activation="Rectifier", ... input_dropout_ratio=0.0, ... momentum_start=0.9, ... momentum_stable=0.99, ... momentum_ramp=1e7, ... epochs=100, ... stopping_rounds=4, ... train_samples_per_iteration=30000, ... mini_batch_size=32, ... score_duty_cycle=0.25, ... score_interval=1) >>> airlines_dl.train(x=predictors, ... y=response_col, ... training_frame=airlines) >>> airlines_dl.mse() 
 - 
property momentum_start¶
- Initial momentum at the beginning of training (try 0.5). - Type: - float, defaults to- 0.0.- Examples
 - >>> airlines= h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/airlines/allyears2k_headers.zip") >>> predictors = ["Year","Month","DayofMonth","DayOfWeek","CRSDepTime", ... "CRSArrTime","UniqueCarrier","FlightNum"] >>> response_col = "IsDepDelayed" >>> airlines_dl = H2ODeepLearningEstimator(hidden=[200,200], ... activation="Rectifier", ... input_dropout_ratio=0.0, ... momentum_start=0.9, ... momentum_stable=0.99, ... momentum_ramp=1e7, ... epochs=100, ... stopping_rounds=4, ... train_samples_per_iteration=30000, ... mini_batch_size=32, ... score_duty_cycle=0.25, ... score_interval=1) >>> airlines_dl.train(x=predictors, ... y=response_col, ... training_frame=airlines) >>> airlines_dl.mse() 
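The three momentum parameters together define a schedule: momentum begins at `momentum_start`, changes over the first `momentum_ramp` training samples, and then stays at `momentum_stable`. The sketch below assumes the change is linear in the number of samples seen. `momentum_at` is a hypothetical helper written for this illustration.

```python
def momentum_at(samples_seen, momentum_start=0.0, momentum_stable=0.0,
                momentum_ramp=1e6):
    """Momentum after a given number of training samples.

    Illustrative only; assumes a linear ramp from momentum_start to
    momentum_stable over momentum_ramp samples.
    """
    if samples_seen >= momentum_ramp:
        return momentum_stable
    fraction = samples_seen / momentum_ramp
    return momentum_start + (momentum_stable - momentum_start) * fraction


# Values taken from the example above: 0.9 -> 0.99 over 1e7 samples.
at_start = momentum_at(0, 0.9, 0.99, 1e7)
halfway = momentum_at(5e6, 0.9, 0.99, 1e7)
after_ramp = momentum_at(2e7, 0.9, 0.99, 1e7)
print(at_start, halfway, after_ramp)
```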
 - 
property nesterov_accelerated_gradient¶
- Use Nesterov accelerated gradient (recommended). - Type: - bool, defaults to- True.- Examples
 - >>> train = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/bigdata/laptop/mnist/train.csv.gz") >>> test = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/bigdata/laptop/mnist/test.csv.gz") >>> predictors = list(range(0,784)) >>> resp = 784 >>> train[resp] = train[resp].asfactor() >>> test[resp] = test[resp].asfactor() >>> nclasses = train[resp].nlevels()[0] >>> model = H2ODeepLearningEstimator(activation="RectifierWithDropout", ... adaptive_rate=False, ... rate=0.01, ... rate_decay=0.9, ... rate_annealing=1e-6, ... momentum_start=0.95, ... momentum_ramp=1e5, ... momentum_stable=0.99, ... nesterov_accelerated_gradient=False, ... input_dropout_ratio=0.2, ... train_samples_per_iteration=20000, ... classification_stop=-1, ... l1=1e-5) >>> model.train(x=predictors, ... y=resp, ... training_frame=train, ... validation_frame=test) >>> model.model_performance() 
 - 
property nfolds¶
- Number of folds for K-fold cross-validation (0 to disable or >= 2). - Type: - int, defaults to- 0.- Examples
 - >>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv") >>> cars["economy_20mpg"] = cars["economy_20mpg"].asfactor() >>> predictors = ["displacement","power","weight","acceleration","year"] >>> response = "economy_20mpg" >>> cars_dl = H2ODeepLearningEstimator(nfolds=5, seed=1234) >>> cars_dl.train(x=predictors, ... y=response, ... training_frame=cars) >>> cars_dl.auc() 
 - 
property offset_column¶
- Offset column. This will be added to the combination of columns before applying the link function. - Type: - str.- Examples
 - >>> boston = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/gbm_test/BostonHousing.csv") >>> predictors = boston.columns[:-1] >>> response = "medv" >>> boston['chas'] = boston['chas'].asfactor() >>> boston["offset"] = boston["medv"].log() >>> train, valid = boston.split_frame(ratios=[.8], seed=1234) >>> boston_dl = H2ODeepLearningEstimator(offset_column="offset", ... seed=1234) >>> boston_dl.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> boston_dl.mse() 
 - 
property overwrite_with_best_model¶
- If enabled, override the final model with the best model found during training. - Type: - bool, defaults to- True.- Examples
 - >>> boston = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/gbm_test/BostonHousing.csv") >>> predictors = boston.columns[:-1] >>> response = "medv" >>> boston['chas'] = boston['chas'].asfactor() >>> boston["offset"] = boston["medv"].log() >>> train, valid = boston.split_frame(ratios=[.8], seed=1234) >>> boston_dl = H2ODeepLearningEstimator(overwrite_with_best_model=True, ... seed=1234) >>> boston_dl.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> boston_dl.mse() 
 - 
property pretrained_autoencoder¶
- Pretrained autoencoder model to initialize this model with. - Type: - Union[None, str, H2OEstimator].- Examples
 - >>> from h2o.estimators.deeplearning import H2OAutoEncoderEstimator >>> resp = 784 >>> nfeatures = 20 >>> train = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/bigdata/laptop/mnist/train.csv.gz") >>> test = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/bigdata/laptop/mnist/test.csv.gz") >>> train[resp] = train[resp].asfactor() >>> test[resp] = test[resp].asfactor() >>> sid = train[0].runif(0) >>> train_unsupervised = train[sid>=0.5] >>> train_unsupervised.pop(resp) >>> train_supervised = train[sid<0.5] >>> ae_model = H2OAutoEncoderEstimator(activation="Tanh", ... hidden=[nfeatures], ... model_id="ae_model", ... epochs=1, ... ignore_const_cols=False, ... reproducible=True, ... seed=1234) >>> ae_model.train(list(range(resp)), training_frame=train_unsupervised) >>> ae_model.mse() >>> pretrained_model = H2ODeepLearningEstimator(activation="Tanh", ... hidden=[nfeatures], ... epochs=1, ... reproducible = True, ... seed=1234, ... ignore_const_cols=False, ... pretrained_autoencoder="ae_model") >>> pretrained_model.train(list(range(resp)), resp, ... training_frame=train_supervised, ... validation_frame=test) >>> pretrained_model.mse() 
 - 
property quantile_alpha¶
- Desired quantile for Quantile regression, must be between 0 and 1. - Type: - float, defaults to- 0.5.- Examples
 - >>> boston = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/gbm_test/BostonHousing.csv") >>> predictors = boston.columns[:-1] >>> response = "medv" >>> boston['chas'] = boston['chas'].asfactor() >>> train, valid = boston.split_frame(ratios=[.8], seed=1234) >>> boston_dl = H2ODeepLearningEstimator(distribution="quantile", ... quantile_alpha=.8, ... seed=1234) >>> boston_dl.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> boston_dl.mse() 
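As an illustration of what `quantile_alpha` controls, here is a small sketch of the standard quantile (pinball) loss in plain Python. The function name `pinball_loss` is an assumption for this example and is not an H2O API.

```python
def pinball_loss(y_true, y_pred, quantile_alpha=0.5):
    """Mean pinball loss: under-predictions cost alpha, over-predictions cost 1 - alpha."""
    total = 0.0
    for y, f in zip(y_true, y_pred):
        diff = y - f
        if diff >= 0:
            total += quantile_alpha * diff          # prediction too low
        else:
            total += (quantile_alpha - 1.0) * diff  # prediction too high
    return total / len(y_true)


under = pinball_loss([10.0], [8.0], quantile_alpha=0.8)
over = pinball_loss([10.0], [12.0], quantile_alpha=0.8)
```

With `quantile_alpha=0.8`, predicting too low is penalized four times as heavily as predicting too high by the same amount, which pushes the fitted value toward the 80th percentile.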
 - 
property quiet_mode¶
- Enable quiet mode for less output to standard output. - Type: - bool, defaults to- False.- Examples
 - >>> titanic = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/gbm_test/titanic.csv") >>> titanic['survived'] = titanic['survived'].asfactor() >>> predictors = titanic.columns >>> del predictors[1:3] >>> response = 'survived' >>> train, valid = titanic.split_frame(ratios=[.8], seed=1234) >>> titanic_dl = H2ODeepLearningEstimator(quiet_mode=True, ... seed=1234) >>> titanic_dl.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> titanic_dl.mse() 
 - 
property rate¶
- Learning rate (higher => less stable, lower => slower convergence). - Type: - float, defaults to- 0.005.- Examples
 - >>> train = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/bigdata/laptop/mnist/train.csv.gz") >>> test = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/bigdata/laptop/mnist/test.csv.gz") >>> predictors = list(range(0,784)) >>> resp = 784 >>> train[resp] = train[resp].asfactor() >>> test[resp] = test[resp].asfactor() >>> nclasses = train[resp].nlevels()[0] >>> model = H2ODeepLearningEstimator(activation="RectifierWithDropout", ... adaptive_rate=False, ... rate=0.01, ... rate_decay=0.9, ... rate_annealing=1e-6, ... momentum_start=0.95, ... momentum_ramp=1e5, ... momentum_stable=0.99, ... nesterov_accelerated_gradient=False, ... input_dropout_ratio=0.2, ... train_samples_per_iteration=20000, ... classification_stop=-1, ... l1=1e-5) >>> model.train (x=predictors,y=resp, training_frame=train, validation_frame=test) >>> model.model_performance(valid=True) 
 - 
property rate_annealing¶
- Learning rate annealing: rate / (1 + rate_annealing * samples). - Type: - float, defaults to- 1e-06.- Examples
 - >>> train = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/bigdata/laptop/mnist/train.csv.gz") >>> test = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/bigdata/laptop/mnist/test.csv.gz") >>> predictors = list(range(0,784)) >>> resp = 784 >>> train[resp] = train[resp].asfactor() >>> test[resp] = test[resp].asfactor() >>> nclasses = train[resp].nlevels()[0] >>> model = H2ODeepLearningEstimator(activation="RectifierWithDropout", ... adaptive_rate=False, ... rate=0.01, ... rate_decay=0.9, ... rate_annealing=1e-6, ... momentum_start=0.95, ... momentum_ramp=1e5, ... momentum_stable=0.99, ... nesterov_accelerated_gradient=False, ... input_dropout_ratio=0.2, ... train_samples_per_iteration=20000, ... classification_stop=-1, ... l1=1e-5) >>> model.train (x=predictors, ... y=resp, ... training_frame=train, ... validation_frame=test) >>> model.mse() 
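The annealing formula above can be evaluated directly. This short sketch only restates `rate / (1 + rate_annealing * samples)` in Python; the function name `annealed_rate` is hypothetical.

```python
def annealed_rate(rate, rate_annealing, samples):
    """Effective learning rate after `samples` training samples have been processed."""
    return rate / (1.0 + rate_annealing * samples)


start = annealed_rate(rate=0.01, rate_annealing=1e-6, samples=0)
later = annealed_rate(rate=0.01, rate_annealing=1e-6, samples=1_000_000)
```

With the default `rate_annealing` of 1e-06, the effective rate falls to roughly half of its initial value after one million samples.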
 - 
property rate_decay¶
- Learning rate decay factor between layers (N-th layer: rate * rate_decay ^ (N - 1)). - Type: - float, defaults to- 1.0.- Examples
 - >>> train = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/bigdata/laptop/mnist/train.csv.gz") >>> test = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/bigdata/laptop/mnist/test.csv.gz") >>> predictors = list(range(0,784)) >>> resp = 784 >>> train[resp] = train[resp].asfactor() >>> test[resp] = test[resp].asfactor() >>> nclasses = train[resp].nlevels()[0] >>> model = H2ODeepLearningEstimator(activation="RectifierWithDropout", ... adaptive_rate=False, ... rate=0.01, ... rate_decay=0.9, ... rate_annealing=1e-6, ... momentum_start=0.95, ... momentum_ramp=1e5, ... momentum_stable=0.99, ... nesterov_accelerated_gradient=False, ... input_dropout_ratio=0.2, ... train_samples_per_iteration=20000, ... classification_stop=-1, ... l1=1e-5) >>> model.train (x=predictors, ... y=resp, ... training_frame=train, ... validation_frame=test) >>> model.model_performance() 
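The per-layer decay can be tabulated with a few lines of Python. This is a sketch of the formula `rate * rate_decay ^ (N - 1)` only; `layer_rates` is a hypothetical helper name.

```python
def layer_rates(rate, rate_decay, n_layers):
    """Learning rate for layers 1..n_layers, decayed geometrically by rate_decay."""
    return [rate * rate_decay ** (n - 1) for n in range(1, n_layers + 1)]


rates = layer_rates(rate=0.01, rate_decay=0.9, n_layers=3)
```

The first layer keeps the base rate, and each deeper layer uses 90% of the rate of the layer before it when `rate_decay=0.9`.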
 - 
property regression_stop¶
- Stopping criterion for regression error (MSE) on training data (-1 to disable). - Type: - float, defaults to- 1e-06.- Examples
 - >>> airlines= h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/airlines/allyears2k_headers.zip") >>> airlines["Year"]= airlines["Year"].asfactor() >>> airlines["Month"]= airlines["Month"].asfactor() >>> airlines["DayOfWeek"] = airlines["DayOfWeek"].asfactor() >>> airlines["Cancelled"] = airlines["Cancelled"].asfactor() >>> airlines['FlightNum'] = airlines['FlightNum'].asfactor() >>> predictors = ["Origin", "Dest", "Year", "UniqueCarrier", ... "DayOfWeek", "Month", "Distance", "FlightNum"] >>> response = "IsDepDelayed" >>> train, valid= airlines.split_frame(ratios=[.8], seed=1234) >>> airlines_dl = H2ODeepLearningEstimator(regression_stop=1e-6, ... seed=1234) >>> airlines_dl.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> airlines_dl.auc() 
 - 
property replicate_training_data¶
- Replicate the entire training dataset onto every node for faster training on small datasets. - Type: - bool, defaults to- True.- Examples
 - >>> airlines= h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/airlines/allyears2k_headers.zip") >>> airlines["Year"]= airlines["Year"].asfactor() >>> airlines["Month"]= airlines["Month"].asfactor() >>> airlines["DayOfWeek"] = airlines["DayOfWeek"].asfactor() >>> airlines["Cancelled"] = airlines["Cancelled"].asfactor() >>> airlines['FlightNum'] = airlines['FlightNum'].asfactor() >>> predictors = ["Origin", "Dest", "Year", "UniqueCarrier", ... "DayOfWeek", "Month", "Distance", "FlightNum"] >>> response = "IsDepDelayed" >>> airlines_dl = H2ODeepLearningEstimator(replicate_training_data=False) >>> airlines_dl.train(x=predictors, ... y=response, ... training_frame=airlines) >>> airlines_dl.auc() 
 - 
property reproducible¶
- Force reproducibility on small data (will be slow - only uses 1 thread). - Type: - bool, defaults to- False.- Examples
 - >>> airlines= h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/airlines/allyears2k_headers.zip") >>> airlines["Year"]= airlines["Year"].asfactor() >>> airlines["Month"]= airlines["Month"].asfactor() >>> airlines["DayOfWeek"] = airlines["DayOfWeek"].asfactor() >>> airlines["Cancelled"] = airlines["Cancelled"].asfactor() >>> airlines['FlightNum'] = airlines['FlightNum'].asfactor() >>> predictors = ["Origin", "Dest", "Year", "UniqueCarrier", ... "DayOfWeek", "Month", "Distance", "FlightNum"] >>> response = "IsDepDelayed" >>> train, valid= airlines.split_frame(ratios=[.8], seed=1234) >>> airlines_dl = H2ODeepLearningEstimator(reproducible=True) >>> airlines_dl.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> airlines_dl.auc() 
 - 
property response_column¶
- Response variable column. - Type: - str.
 - 
property rho¶
- Adaptive learning rate time decay factor (similarity to prior updates). - Type: - float, defaults to- 0.99.- Examples
 - >>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv") >>> cars["economy_20mpg"] = cars["economy_20mpg"].asfactor() >>> predictors = ["displacement","power","weight","acceleration","year"] >>> response = "economy_20mpg" >>> cars_dl = H2ODeepLearningEstimator(rho=0.9, ... seed=1234) >>> cars_dl.train(x=predictors, ... y=response, ... training_frame=cars) >>> cars_dl.auc() 
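The role of `rho` is easiest to see in a single-parameter update. The sketch below assumes the adaptive rate follows the ADADELTA scheme, where `rho` is the decay factor of two running averages; the function `adadelta_step` and the `epsilon` value used here are illustrative assumptions, not H2O internals.

```python
import math


def adadelta_step(x, grad, state, rho=0.99, epsilon=1e-8):
    """One ADADELTA update for a scalar parameter x with gradient grad."""
    avg_sq_grad, avg_sq_delta = state
    # Running average of squared gradients; rho controls how much history is kept.
    avg_sq_grad = rho * avg_sq_grad + (1.0 - rho) * grad * grad
    # Step size is the ratio of the two running RMS values.
    delta = -math.sqrt(avg_sq_delta + epsilon) / math.sqrt(avg_sq_grad + epsilon) * grad
    # Running average of squared updates.
    avg_sq_delta = rho * avg_sq_delta + (1.0 - rho) * delta * delta
    return x + delta, (avg_sq_grad, avg_sq_delta)


x, state = 5.0, (0.0, 0.0)
for _ in range(200):
    x, state = adadelta_step(x, 2.0 * x, state)  # gradient of f(x) = x**2
```

A value of `rho` closer to 1 gives the averages a longer memory, so each update resembles prior updates more closely.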
 - 
property score_duty_cycle¶
- Maximum duty cycle fraction for scoring (lower: more training, higher: more scoring). - Type: - float, defaults to- 0.1.- Examples
 - >>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv") >>> cars["economy_20mpg"] = cars["economy_20mpg"].asfactor() >>> predictors = ["displacement","power","weight","acceleration","year"] >>> response = "economy_20mpg" >>> cars_dl = H2ODeepLearningEstimator(score_duty_cycle=0.2, ... seed=1234) >>> cars_dl.train(x=predictors, ... y=response, ... training_frame=cars) >>> cars_dl.auc() 
 - 
property score_each_iteration¶
- Whether to score during each iteration of model training. - Type: - bool, defaults to- False.- Examples
 - >>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv") >>> cars["economy_20mpg"] = cars["economy_20mpg"].asfactor() >>> predictors = ["displacement","power","weight","acceleration","year"] >>> response = "economy_20mpg" >>> cars_dl = H2ODeepLearningEstimator(score_each_iteration=True, ... seed=1234) >>> cars_dl.train(x=predictors, ... y=response, ... training_frame=cars) >>> cars_dl.auc() 
 - 
property score_interval¶
- Shortest time interval (in seconds) between model scoring. - Type: - float, defaults to- 5.0.- Examples
 - >>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv") >>> cars["economy_20mpg"] = cars["economy_20mpg"].asfactor() >>> predictors = ["displacement","power","weight","acceleration","year"] >>> response = "economy_20mpg" >>> cars_dl = H2ODeepLearningEstimator(score_interval=3, ... seed=1234) >>> cars_dl.train(x=predictors, ... y=response, ... training_frame=cars) >>> cars_dl.auc() 
 - 
property score_training_samples¶
- Number of training set samples for scoring (0 for all). - Type: - int, defaults to- 10000.- Examples
 - >>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv") >>> cars["economy_20mpg"] = cars["economy_20mpg"].asfactor() >>> predictors = ["displacement","power","weight","acceleration","year"] >>> response = "economy_20mpg" >>> cars_dl = H2ODeepLearningEstimator(score_training_samples=10000, ... seed=1234) >>> cars_dl.train(x=predictors, ... y=response, ... training_frame=cars) >>> cars_dl.auc() 
 - 
property score_validation_samples¶
- Number of validation set samples for scoring (0 for all). - Type: - int, defaults to- 0.- Examples
 - >>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv") >>> cars["economy_20mpg"] = cars["economy_20mpg"].asfactor() >>> predictors = ["displacement","power","weight","acceleration","year"] >>> response = "economy_20mpg" >>> train, valid = cars.split_frame(ratios=[.8], seed=1234) >>> cars_dl = H2ODeepLearningEstimator(score_validation_samples=3, ... seed=1234) >>> cars_dl.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> cars_dl.auc() 
 - 
property score_validation_sampling¶
- Method used to sample validation dataset for scoring. - Type: - Literal["uniform", "stratified"], defaults to- "uniform".- Examples
 - >>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv") >>> cars["economy_20mpg"] = cars["economy_20mpg"].asfactor() >>> predictors = ["displacement","power","weight","acceleration","year"] >>> response = "economy_20mpg" >>> train, valid = cars.split_frame(ratios=[.8], seed=1234) >>> cars_dl = H2ODeepLearningEstimator(score_validation_sampling="uniform", ... seed=1234) >>> cars_dl.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> cars_dl.auc() 
 - 
property seed¶
- Seed for random numbers (affects sampling) - Note: only reproducible when running single threaded. - Type: - int, defaults to- -1.- Examples
 - >>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv") >>> cars["economy_20mpg"] = cars["economy_20mpg"].asfactor() >>> predictors = ["displacement","power","weight","acceleration","year"] >>> response = "economy_20mpg" >>> train, valid = cars.split_frame(ratios=[.8], seed=1234) >>> cars_dl = H2ODeepLearningEstimator(seed=1234) >>> cars_dl.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> cars_dl.auc() 
 - 
property shuffle_training_data¶
- Enable shuffling of training data (recommended if training data is replicated and train_samples_per_iteration is close to #nodes x #rows, or if using balance_classes). - Type: - bool, defaults to- False.- Examples
 - >>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv") >>> cars["economy_20mpg"] = cars["economy_20mpg"].asfactor() >>> predictors = ["displacement","power","weight","acceleration","year"] >>> response = "economy_20mpg" >>> train, valid = cars.split_frame(ratios=[.8], seed=1234) >>> cars_dl = H2ODeepLearningEstimator(shuffle_training_data=True, ... seed=1234) >>> cars_dl.train(x=predictors, ... y=response, ... training_frame=cars) >>> cars_dl.auc() 
 - 
property single_node_mode¶
- Run on a single node for fine-tuning of model parameters. - Type: - bool, defaults to- False.- Examples
 - >>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv") >>> cars["economy_20mpg"] = cars["economy_20mpg"].asfactor() >>> predictors = ["displacement","power","weight","acceleration","year"] >>> response = "economy_20mpg" >>> train, valid = cars.split_frame(ratios=[.8], seed=1234) >>> cars_dl = H2ODeepLearningEstimator(single_node_mode=True, ... seed=1234) >>> cars_dl.train(x=predictors, ... y=response, ... training_frame=cars) >>> cars_dl.auc() 
 - 
property sparse¶
- Sparse data handling (more efficient for data with lots of 0 values). - Type: - bool, defaults to- False.- Examples
 - >>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv") >>> cars["economy_20mpg"] = cars["economy_20mpg"].asfactor() >>> predictors = ["displacement","power","weight","acceleration","year"] >>> response = "economy_20mpg" >>> train, valid = cars.split_frame(ratios=[.8], seed=1234) >>> cars_dl = H2ODeepLearningEstimator(sparse=True, ... seed=1234) >>> cars_dl.train(x=predictors, ... y=response, ... training_frame=cars) >>> cars_dl.auc() 
 - 
property sparsity_beta¶
- Sparsity regularization. #Experimental - Type: - float, defaults to- 0.0.- Examples
 - >>> from h2o.estimators import H2OAutoEncoderEstimator >>> resp = 784 >>> nfeatures = 20 >>> train = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/bigdata/laptop/mnist/train.csv.gz") >>> test = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/bigdata/laptop/mnist/test.csv.gz") >>> train[resp] = train[resp].asfactor() >>> test[resp] = test[resp].asfactor() >>> sid = train[0].runif(0) >>> train_unsupervised = train[sid>=0.5] >>> train_unsupervised.pop(resp) >>> ae_model = H2OAutoEncoderEstimator(activation="Tanh", ... hidden=[nfeatures], ... epochs=1, ... ignore_const_cols=False, ... reproducible=True, ... sparsity_beta=0.5, ... seed=1234) >>> ae_model.train(list(range(resp)), ... training_frame=train_unsupervised) >>> ae_model.mse() 
 - 
property standardize¶
- If enabled, automatically standardize the data. If disabled, the user must provide properly scaled input data. - Type: - bool, defaults to- True.- Examples
 - >>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv") >>> cars["economy_20mpg"] = cars["economy_20mpg"].asfactor() >>> predictors = ["displacement","power","weight","acceleration","year"] >>> response = "economy_20mpg" >>> cars_dl = H2ODeepLearningEstimator(standardize=True, ... seed=1234) >>> cars_dl.train(x=predictors, ... y=response, ... training_frame=cars) >>> cars_dl.auc() 
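For readers who disable `standardize` and scale inputs themselves, a minimal sketch of column standardization follows. It assumes zero-mean, unit-variance scaling with the sample standard deviation; the exact convention H2O applies internally is not stated here, and `standardize_column` is a hypothetical helper.

```python
import statistics


def standardize_column(values):
    """Scale a numeric column to zero mean and unit (sample) standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # assumption: sample standard deviation
    return [(v - mean) / sd for v in values]


scaled = standardize_column([2.0, 4.0, 6.0])
```

The statistics should be computed on the training data and reused for validation and scoring data, so that all frames share the same scale.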
 - 
property stopping_metric¶
- Metric to use for early stopping (AUTO: logloss for classification, deviance for regression and anomaly_score for Isolation Forest). Note that custom and custom_increasing can only be used in GBM and DRF with the Python client. - Type: - Literal["auto", "deviance", "logloss", "mse", "rmse", "mae", "rmsle", "auc", "aucpr", "lift_top_group", "misclassification", "mean_per_class_error", "custom", "custom_increasing"], defaults to- "auto".- Examples
 - >>> airlines= h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/airlines/allyears2k_headers.zip") >>> airlines["Year"]= airlines["Year"].asfactor() >>> airlines["Month"]= airlines["Month"].asfactor() >>> airlines["DayOfWeek"] = airlines["DayOfWeek"].asfactor() >>> airlines["Cancelled"] = airlines["Cancelled"].asfactor() >>> airlines['FlightNum'] = airlines['FlightNum'].asfactor() >>> predictors = ["Origin", "Dest", "Year", "UniqueCarrier", ... "DayOfWeek", "Month", "Distance", "FlightNum"] >>> response = "IsDepDelayed" >>> train, valid= airlines.split_frame(ratios=[.8], seed=1234) >>> airlines_dl = H2ODeepLearningEstimator(stopping_metric="auc", ... stopping_rounds=3, ... stopping_tolerance=1e-2, ... seed=1234) >>> airlines_dl.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> airlines_dl.auc() 
 - 
property stopping_rounds¶
- Early stopping based on convergence of stopping_metric. Stop if simple moving average of length k of the stopping_metric does not improve for k:=stopping_rounds scoring events (0 to disable) - Type: - int, defaults to- 5.- Examples
 - >>> airlines= h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/airlines/allyears2k_headers.zip") >>> airlines["Year"]= airlines["Year"].asfactor() >>> airlines["Month"]= airlines["Month"].asfactor() >>> airlines["DayOfWeek"] = airlines["DayOfWeek"].asfactor() >>> airlines["Cancelled"] = airlines["Cancelled"].asfactor() >>> airlines['FlightNum'] = airlines['FlightNum'].asfactor() >>> predictors = ["Origin", "Dest", "Year", "UniqueCarrier", ... "DayOfWeek", "Month", "Distance", "FlightNum"] >>> response = "IsDepDelayed" >>> train, valid= airlines.split_frame(ratios=[.8], seed=1234) >>> airlines_dl = H2ODeepLearningEstimator(stopping_metric="auc", ... stopping_rounds=3, ... stopping_tolerance=1e-2, ... seed=1234) >>> airlines_dl.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> airlines_dl.auc() 
 - 
property stopping_tolerance¶
- Relative tolerance for metric-based stopping criterion (stop if relative improvement is not at least this much) - Type: - float, defaults to- 0.0.- Examples
 - >>> airlines= h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/airlines/allyears2k_headers.zip") >>> airlines["Year"]= airlines["Year"].asfactor() >>> airlines["Month"]= airlines["Month"].asfactor() >>> airlines["DayOfWeek"] = airlines["DayOfWeek"].asfactor() >>> airlines["Cancelled"] = airlines["Cancelled"].asfactor() >>> airlines['FlightNum'] = airlines['FlightNum'].asfactor() >>> predictors = ["Origin", "Dest", "Year", "UniqueCarrier", ... "DayOfWeek", "Month", "Distance", "FlightNum"] >>> response = "IsDepDelayed" >>> train, valid= airlines.split_frame(ratios=[.8], seed=1234) >>> airlines_dl = H2ODeepLearningEstimator(stopping_metric="auc", ... stopping_rounds=3, ... stopping_tolerance=1e-2, ... seed=1234) >>> airlines_dl.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> airlines_dl.auc() 
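The interaction of `stopping_rounds` and `stopping_tolerance` can be sketched in plain Python. This is a simplified illustration of a moving-average stopping rule, not H2O's exact implementation; the function `should_stop`, its `higher_is_better` flag, and the assumption of positive metric values are all introduced for this example.

```python
def should_stop(history, stopping_rounds, stopping_tolerance, higher_is_better=True):
    """Return True if the latest moving average has not improved enough."""
    k = stopping_rounds
    if k == 0 or len(history) < 2 * k:
        return False  # disabled, or too few scoring events so far
    # Moving averages of length k over the last 2k scoring events (k + 1 windows).
    recent = history[-2 * k:]
    averages = [sum(recent[i:i + k]) / k for i in range(k + 1)]
    latest, earlier = averages[-1], averages[:-1]
    if higher_is_better:
        # Stop unless the latest average beats the best earlier one by the tolerance.
        return latest <= max(earlier) * (1.0 + stopping_tolerance)
    return latest >= min(earlier) * (1.0 - stopping_tolerance)


improving = [0.60, 0.65, 0.70, 0.75, 0.80, 0.85]
flat = [0.800, 0.801, 0.800, 0.802, 0.801, 0.800]
keep_going = should_stop(improving, stopping_rounds=3, stopping_tolerance=1e-2)
halt = should_stop(flat, stopping_rounds=3, stopping_tolerance=1e-2)
```

Averaging over `stopping_rounds` scoring events smooths out noise in individual scores, so a single poor scoring event does not end training on its own.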
 - 
property target_ratio_comm_to_comp¶
- Target ratio of communication overhead to computation. Only for multi-node operation and train_samples_per_iteration = -2 (auto-tuning). - Type: - float, defaults to- 0.05.- Examples
 - >>> airlines= h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/airlines/allyears2k_headers.zip") >>> airlines["Year"]= airlines["Year"].asfactor() >>> airlines["Month"]= airlines["Month"].asfactor() >>> airlines["DayOfWeek"] = airlines["DayOfWeek"].asfactor() >>> airlines["Cancelled"] = airlines["Cancelled"].asfactor() >>> airlines['FlightNum'] = airlines['FlightNum'].asfactor() >>> predictors = ["Origin", "Dest", "Year", "UniqueCarrier", ... "DayOfWeek", "Month", "Distance", "FlightNum"] >>> response = "IsDepDelayed" >>> train, valid= airlines.split_frame(ratios=[.8], seed=1234) >>> airlines_dl = H2ODeepLearningEstimator(target_ratio_comm_to_comp=0.05, ... seed=1234) >>> airlines_dl.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> airlines_dl.auc() 
 - 
property train_samples_per_iteration¶
- Number of training samples (globally) per MapReduce iteration. Special values are 0: one epoch, -1: all available data (e.g., replicated training data), -2: automatic. - Type: - int, defaults to- -2.- Examples
 - >>> airlines= h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/airlines/allyears2k_headers.zip") >>> airlines["Year"]= airlines["Year"].asfactor() >>> airlines["Month"]= airlines["Month"].asfactor() >>> airlines["DayOfWeek"] = airlines["DayOfWeek"].asfactor() >>> airlines["Cancelled"] = airlines["Cancelled"].asfactor() >>> airlines['FlightNum'] = airlines['FlightNum'].asfactor() >>> predictors = ["Origin", "Dest", "Year", "UniqueCarrier", ... "DayOfWeek", "Month", "Distance", "FlightNum"] >>> response = "IsDepDelayed" >>> train, valid= airlines.split_frame(ratios=[.8], seed=1234) >>> airlines_dl = H2ODeepLearningEstimator(train_samples_per_iteration=-1, ... epochs=1, ... seed=1234) >>> airlines_dl.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> airlines_dl.auc() 
 - 
property training_frame¶
- Id of the training data frame. - Type: - Union[None, str, H2OFrame].- Examples
 - >>> airlines= h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/airlines/allyears2k_headers.zip") >>> airlines["Year"]= airlines["Year"].asfactor() >>> airlines["Month"]= airlines["Month"].asfactor() >>> airlines["DayOfWeek"] = airlines["DayOfWeek"].asfactor() >>> airlines["Cancelled"] = airlines["Cancelled"].asfactor() >>> airlines['FlightNum'] = airlines['FlightNum'].asfactor() >>> predictors = ["Origin", "Dest", "Year", "UniqueCarrier", ... "DayOfWeek", "Month", "Distance", "FlightNum"] >>> response = "IsDepDelayed" >>> train, valid= airlines.split_frame(ratios=[.8], seed=1234) >>> airlines_dl = H2ODeepLearningEstimator() >>> airlines_dl.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> airlines_dl.auc() 
 - 
property tweedie_power¶
- Tweedie power for Tweedie regression, must be between 1 and 2. - Type: - float, defaults to- 1.5.- Examples
 - >>> airlines= h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/airlines/allyears2k_headers.zip") >>> airlines["Year"]= airlines["Year"].asfactor() >>> airlines["Month"]= airlines["Month"].asfactor() >>> airlines["DayOfWeek"] = airlines["DayOfWeek"].asfactor() >>> airlines["Cancelled"] = airlines["Cancelled"].asfactor() >>> airlines['FlightNum'] = airlines['FlightNum'].asfactor() >>> predictors = ["Origin", "Dest", "Year", "UniqueCarrier", ... "DayOfWeek", "Month", "Distance", "FlightNum"] >>> response = "IsDepDelayed" >>> train, valid= airlines.split_frame(ratios=[.8], seed=1234) >>> airlines_dl = H2ODeepLearningEstimator(tweedie_power=1.5, ... seed=1234) >>> airlines_dl.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> airlines_dl.auc() 
 - 
property use_all_factor_levels¶
- Use all factor levels of categorical variables. Otherwise, the first factor level is omitted (without loss of accuracy). Useful for variable importances and auto-enabled for autoencoder. - Type: - bool, defaults to- True.- Examples
 - >>> airlines= h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/airlines/allyears2k_headers.zip") >>> airlines["Year"]= airlines["Year"].asfactor() >>> airlines["Month"]= airlines["Month"].asfactor() >>> airlines["DayOfWeek"] = airlines["DayOfWeek"].asfactor() >>> airlines["Cancelled"] = airlines["Cancelled"].asfactor() >>> airlines['FlightNum'] = airlines['FlightNum'].asfactor() >>> predictors = ["Origin", "Dest", "Year", "UniqueCarrier", ... "DayOfWeek", "Month", "Distance", "FlightNum"] >>> response = "IsDepDelayed" >>> train, valid= airlines.split_frame(ratios=[.8], seed=1234) >>> airlines_dl = H2ODeepLearningEstimator(use_all_factor_levels=True, ... seed=1234) >>> airlines_dl.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> airlines_dl.mse() 
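The effect of `use_all_factor_levels` on the expanded input columns can be illustrated with a small one-hot encoder. This is a sketch of the general idea only; `one_hot` is a hypothetical helper and the level names are made up for the example.

```python
def one_hot(value, levels, use_all_factor_levels=True):
    """Encode a categorical value; optionally omit the first level as the baseline."""
    kept = levels if use_all_factor_levels else levels[1:]
    return [1 if value == level else 0 for level in kept]


levels = ["Fri", "Mon", "Sat"]
full = one_hot("Mon", levels, use_all_factor_levels=True)
reduced = one_hot("Fri", levels, use_all_factor_levels=False)
```

When the first level is omitted, it is represented by all zeros, so no information is lost; keeping all levels gives every level its own input, which is what makes per-level variable importances available.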
 - 
property validation_frame¶
- Id of the validation data frame. - Type: - Union[None, str, H2OFrame].- Examples
 - >>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv") >>> cars["economy_20mpg"] = cars["economy_20mpg"].asfactor() >>> predictors = ["displacement","power","weight","acceleration","year"] >>> response = "economy_20mpg" >>> train, valid = cars.split_frame(ratios=[.8], seed=1234) >>> cars_dl = H2ODeepLearningEstimator(standardize=True, ... seed=1234) >>> cars_dl.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> cars_dl.auc() 
 - 
property variable_importances¶
- Compute variable importances for input features (Gedeon method) - can be slow for large networks. - Type: - bool, defaults to- True.- Examples
 - >>> airlines= h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/airlines/allyears2k_headers.zip") >>> airlines["Year"]= airlines["Year"].asfactor() >>> airlines["Month"]= airlines["Month"].asfactor() >>> airlines["DayOfWeek"] = airlines["DayOfWeek"].asfactor() >>> airlines["Cancelled"] = airlines["Cancelled"].asfactor() >>> airlines['FlightNum'] = airlines['FlightNum'].asfactor() >>> predictors = ["Origin", "Dest", "Year", "UniqueCarrier", ... "DayOfWeek", "Month", "Distance", "FlightNum"] >>> response = "IsDepDelayed" >>> train, valid= airlines.split_frame(ratios=[.8], seed=1234) >>> airlines_dl = H2ODeepLearningEstimator(variable_importances=True, ... seed=1234) >>> airlines_dl.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> airlines_dl.mse() 
 - 
property weights_column¶
- Column with observation weights. Giving some observation a weight of zero is equivalent to excluding it from the dataset; giving an observation a relative weight of 2 is equivalent to repeating that row twice. Negative weights are not allowed. Note: Weights are per-row observation weights and do not increase the size of the data frame. This is typically the number of times a row is repeated, but non-integer values are supported as well. During training, rows with higher weights matter more, due to the larger loss function pre-factor. If you set weight = 0 for a row, the returned prediction frame at that row is zero and this is incorrect. To get an accurate prediction, remove all rows with weight == 0. - Type: - str.- Examples
 - >>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv") >>> cars["economy_20mpg"] = cars["economy_20mpg"].asfactor() >>> predictors = ["displacement","power","acceleration","year"] >>> response = "economy_20mpg" >>> train, valid = cars.split_frame(ratios=[.8], seed=1234) >>> cars_dl = H2ODeepLearningEstimator(seed=1234) >>> cars_dl.train(x=predictors, ... y=response, ... training_frame=train, ... weights_column="weight", ... validation_frame=valid) >>> cars_dl.auc() 
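The claim that a weight of 2 is equivalent to repeating a row twice can be checked with a weighted loss. The sketch below uses a weighted mean squared error in plain Python; `weighted_mse` is a hypothetical helper, not an H2O function.

```python
def weighted_mse(y_true, y_pred, weights):
    """Mean squared error where each row contributes in proportion to its weight."""
    weighted_sum = sum(w * (y - f) ** 2 for y, f, w in zip(y_true, y_pred, weights))
    return weighted_sum / sum(weights)


# One row with weight 2 versus the same row listed twice with weight 1.
weighted = weighted_mse([1.0, 3.0], [0.0, 0.0], [2.0, 1.0])
repeated = weighted_mse([1.0, 1.0, 3.0], [0.0, 0.0, 0.0], [1.0, 1.0, 1.0])
```

Both calls produce the same loss, while the weighted version keeps the data frame at its original size.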
 
H2OGeneralizedAdditiveEstimator¶
- 
class h2o.estimators.gam.H2OGeneralizedAdditiveEstimator(model_id=None, training_frame=None, validation_frame=None, nfolds=0, seed=-1, keep_cross_validation_models=True, keep_cross_validation_predictions=False, keep_cross_validation_fold_assignment=False, fold_assignment='auto', fold_column=None, response_column=None, ignored_columns=None, ignore_const_cols=True, score_each_iteration=False, offset_column=None, weights_column=None, family='auto', tweedie_variance_power=0.0, tweedie_link_power=0.0, theta=0.0, solver='auto', alpha=None, lambda_=None, lambda_search=False, early_stopping=True, nlambdas=-1, standardize=False, missing_values_handling='mean_imputation', plug_values=None, compute_p_values=False, remove_collinear_columns=False, splines_non_negative=None, intercept=True, non_negative=False, max_iterations=-1, objective_epsilon=-1.0, beta_epsilon=0.0001, gradient_epsilon=-1.0, link='family_default', startval=None, prior=-1.0, cold_start=False, lambda_min_ratio=-1.0, beta_constraints=None, max_active_predictors=-1, interactions=None, interaction_pairs=None, obj_reg=-1.0, export_checkpoints_dir=None, stopping_rounds=0, stopping_metric='auto', stopping_tolerance=0.001, balance_classes=False, class_sampling_factors=None, max_after_balance_size=5.0, max_confusion_matrix_size=20, max_runtime_secs=0.0, num_knots=None, spline_orders=None, knot_ids=None, gam_columns=None, standardize_tp_gam_cols=False, scale_tp_penalty_mat=False, bs=None, scale=None, keep_gam_cols=False, store_knot_locations=False, auc_type='auto', gainslift_bins=-1)[source]¶
- Bases: - h2o.estimators.estimator_base.H2OEstimator- Generalized Additive Model - Fits a generalized additive model, specified by a response variable, a set of predictors, and a description of the error distribution. - A subclass of - ModelBase is returned. The specific subclass depends on the machine learning task at hand (if it’s binomial classification, then an H2OBinomialModel is returned; if it’s regression, then an H2ORegressionModel is returned). The default print-out of the models is shown, but further GAM-specific information can be queried out of the object. Upon completion of the GAM, the resulting object has coefficients, normalized coefficients, residual/null deviance, aic, and a host of model metrics including MSE, AUC (for logistic regression), degrees of freedom, and confusion matrices.- 
property Lambda¶
- [Deprecated] Use - lambda_ instead
 - 
property alpha¶
- Distribution of regularization between the L1 (Lasso) and L2 (Ridge) penalties. A value of 1 for alpha represents Lasso regression, a value of 0 produces Ridge regression, and anything in between specifies the amount of mixing between the two. Default value of alpha is 0 when SOLVER = ‘L-BFGS’; 0.5 otherwise. - Type: - List[float].
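To make the mixing concrete, the sketch below evaluates the textbook elastic-net penalty, lambda * (alpha * L1 + (1 - alpha) / 2 * squared L2). The function name `elastic_net_penalty` and the sample coefficients are illustrative assumptions, not part of the H2O API, and the exact scaling H2O applies internally may differ.

```python
def elastic_net_penalty(beta, alpha, lambda_):
    """Textbook elastic-net penalty for coefficients `beta` (intercept excluded)."""
    l1 = sum(abs(b) for b in beta)   # Lasso term: L1 norm
    l2 = sum(b * b for b in beta)    # Ridge term: squared L2 norm
    return lambda_ * (alpha * l1 + 0.5 * (1.0 - alpha) * l2)


beta = [0.5, -2.0, 0.0]  # hypothetical coefficients
print(elastic_net_penalty(beta, alpha=1.0, lambda_=0.1))  # pure Lasso
print(elastic_net_penalty(beta, alpha=0.0, lambda_=0.1))  # pure Ridge
print(elastic_net_penalty(beta, alpha=0.5, lambda_=0.1))  # even mix of both
```

With alpha=1 only the absolute values contribute, with alpha=0 only the squares do, and intermediate values blend the two linearly.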
 - 
property auc_type¶
- Set default multinomial AUC type. - Type: - Literal["auto", "none", "macro_ovr", "weighted_ovr", "macro_ovo", "weighted_ovo"], defaults to- "auto".
 - 
property balance_classes¶
- Balance training data class counts via over/under-sampling (for imbalanced data). - Type: - bool, defaults to- False.
 - 
property beta_constraints¶
- Beta constraints - Type: - Union[None, str, H2OFrame].
 - 
property beta_epsilon¶
- Converge if beta changes less (using L-infinity norm) than beta epsilon. ONLY applies to the IRLSM solver. - Type: - float, defaults to- 0.0001.
 - 
property bs¶
- Basis function type for each gam predictors, 0 for cr, 1 for thin plate regression with knots, 2 for monotone I-splines, 3 for NBSplineTypeI M-splines (refer to doc here: https://github.com/h2oai/h2o-3/issues/6926). If specified, must be the same size as gam_columns - Type: - List[int].- Examples
 - >>> import h2o >>> from h2o.estimators.gam import H2OGeneralizedAdditiveEstimator >>> h2o.init() >>> h2o_data = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/glm_test/multinomial_10_classes_10_cols_10000_Rows_train.csv") >>> h2o_data["C11"] = h2o_data["C11"].asfactor() >>> y = "C11" >>> x = ["C9","C10"] >>> h2o_model = H2OGeneralizedAdditiveEstimator(family='multinomial', ... gam_columns=["C6","C7","C8"], ... bs=[0,1,3]) >>> h2o_model.train(x=x, y=y, training_frame=h2o_data) >>> h2o_model.coef() # note the spline type in the names of gam column coefficients 
 - 
property class_sampling_factors¶
- Desired over/under-sampling ratios per class (in lexicographic order). If not specified, sampling factors will be automatically computed to obtain class balance during training. Requires balance_classes. - Type: - List[float].
 - 
property cold_start¶
- Only applicable to multiple alpha/lambda values when calling GLM from GAM. If false, build the next model for next set of alpha/lambda values starting from the values provided by current model. If true will start GLM model from scratch. - Type: - bool, defaults to- False.
 - 
property compute_p_values¶
- Request p-values computation, p-values work only with IRLSM solver and no regularization - Type: - bool, defaults to- False.
 - 
property early_stopping¶
- Stop early when there is no more relative improvement on train or validation (if provided) - Type: - bool, defaults to- True.
 - 
property export_checkpoints_dir¶
- Automatically export generated models to this directory. - Type: - str.
 - 
property family¶
- Family. Use binomial for classification with logistic regression, others are for regression problems. - Type: - Literal["auto", "gaussian", "binomial", "quasibinomial", "ordinal", "multinomial", "poisson", "gamma", "tweedie", "negativebinomial", "fractionalbinomial"], defaults to- "auto".
 - 
property fold_assignment¶
- Cross-validation fold assignment scheme, if fold_column is not specified. The ‘Stratified’ option will stratify the folds based on the response variable, for classification problems. - Type: - Literal["auto", "random", "modulo", "stratified"], defaults to- "auto".
 - 
property fold_column¶
- Column with cross-validation fold index assignment per observation. - Type: - str.
 - 
property gainslift_bins¶
- Gains/Lift table number of bins. 0 means disabled. Default value -1 means automatic binning. - Type: - int, defaults to- -1.
 - 
property gam_columns¶
- Arrays of predictor column names for gam for smoothers using single or multiple predictors like {{‘c1’},{‘c2’,’c3’},{‘c4’},…} - Type: - List[List[str]].- Examples
 - >>> import h2o >>> from h2o.estimators.gam import H2OGeneralizedAdditiveEstimator >>> h2o.init() >>> h2o_data = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/glm_test/multinomial_10_classes_10_cols_10000_Rows_train.csv") >>> h2o_data["C11"] = h2o_data["C11"].asfactor() >>> y = "C11" >>> x = ["C9","C10"] >>> h2o_model = H2OGeneralizedAdditiveEstimator(family='multinomial', ... gam_columns=["C6","C7","C8"]) >>> h2o_model.train(x=x, y=y, training_frame=h2o_data) >>> h2o_model.coef() 
 - 
get_gam_knot_column_names()[source]¶
- Retrieve gam column names corresponding to the knot locations that will be returned if store_knot_locations parameter is enabled. - Returns
- gam column names whose knot locations are stored in the knot_locations. 
 
 - 
get_knot_locations(gam_column=None)[source]¶
- Retrieve gam column knot locations if the store_knot_locations parameter is enabled. If a gam column name is specified, the knot locations corresponding to that gam column are returned. Otherwise, knot locations are returned for all gam columns. The order of the gam columns is specified in gam_knot_column_names of the model output. - Returns
- knot locations of gam columns. 
 
 - 
property gradient_epsilon¶
- Converge if objective changes less (using L-infinity norm) than this, ONLY applies to L-BFGS solver. Default indicates: If lambda_search is set to False and lambda is equal to zero, the default value of gradient_epsilon is equal to .000001, otherwise the default value is .0001. If lambda_search is set to True, the conditional values above are 1E-8 and 1E-6 respectively. - Type: - float, defaults to- -1.0.
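The default rule described above can be restated as a small lookup. This is only a sketch of the documented behaviour: `resolve_gradient_epsilon` is a hypothetical helper, not an H2O function, and it treats -1.0 as the "use the default" sentinel.

```python
def resolve_gradient_epsilon(gradient_epsilon=-1.0, lambda_search=False, lambda_=0.0):
    """Return the effective gradient_epsilon following the documented defaults."""
    if gradient_epsilon != -1.0:
        return gradient_epsilon              # user supplied an explicit value
    if lambda_search:
        return 1e-8 if lambda_ == 0 else 1e-6
    return 1e-6 if lambda_ == 0 else 1e-4


print(resolve_gradient_epsilon())                               # no search, lambda == 0
print(resolve_gradient_epsilon(lambda_=0.5))                    # no search, lambda != 0
print(resolve_gradient_epsilon(lambda_search=True))             # search, lambda == 0
print(resolve_gradient_epsilon(lambda_search=True, lambda_=0.5))
```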
 - 
property ignore_const_cols¶
- Ignore constant columns. - Type: - bool, defaults to- True.
 - 
property ignored_columns¶
- Names of columns to ignore for training. - Type: - List[str].
 - 
property interaction_pairs¶
- A list of pairwise (first order) column interactions. - Type: - List[tuple].
 - 
property interactions¶
- A list of predictor column indices to interact. All pairwise combinations will be computed for the list. - Type: - List[str].
 - 
property intercept¶
- Include constant term in the model - Type: - bool, defaults to- True.
 - 
property keep_cross_validation_fold_assignment¶
- Whether to keep the cross-validation fold assignment. - Type: - bool, defaults to- False.
 - 
property keep_cross_validation_models¶
- Whether to keep the cross-validation models. - Type: - bool, defaults to- True.
 - 
property keep_cross_validation_predictions¶
- Whether to keep the predictions of the cross-validation models. - Type: - bool, defaults to- False.
 - 
property keep_gam_cols¶
- Save keys of model matrix - Type: - bool, defaults to- False.- Examples
 - >>> import h2o >>> from h2o.estimators.gam import H2OGeneralizedAdditiveEstimator >>> h2o.init() >>> h2o_data = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/glm_test/multinomial_10_classes_10_cols_10000_Rows_train.csv") >>> h2o_data["C11"] = h2o_data["C11"].asfactor() >>> train, test = h2o_data.split_frame(ratios = [.8]) >>> y = "C11" >>> x = ["C9","C10"] >>> h2o_model = H2OGeneralizedAdditiveEstimator(family='multinomial', ... keep_gam_cols=True, ... gam_columns=["C6","C7","C8"]) >>> h2o_model.train(x=x, y=y, training_frame=h2o_data) >>> h2o.get_frame(h2o_model._model_json["output"] ["gam_transformed_center_key"]) 
 - 
property knot_ids¶
- Array storing frame keys of knots. One for each gam column set specified in gam_columns - Type: - List[str].- Examples
 - >>> import h2o >>> from h2o.estimators.gam import H2OGeneralizedAdditiveEstimator >>> h2o.init() >>> knots1 = [-1.99905699, -0.98143075, 0.02599159, 1.00770987, 1.99942290] >>> frameKnots1 = h2o.H2OFrame(python_obj=knots1) >>> knots2 = [-1.999821861, -1.005257990, -0.006716042, 1.002197392, 1.999073589] >>> frameKnots2 = h2o.H2OFrame(python_obj=knots2) >>> knots3 = [-1.999675688, -0.979893796, 0.007573327, 1.011437347, 1.999611676] >>> frameKnots3 = h2o.H2OFrame(python_obj=knots3) >>> h2o_data = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/glm_test/multinomial_10_classes_10_cols_10000_Rows_train.csv") >>> h2o_data["C11"] = h2o_data["C11"].asfactor() >>> train, test = h2o_data.split_frame(ratios = [.8]) >>> y = "C11" >>> x = ["C9","C10"] >>> h2o_model = H2OGeneralizedAdditiveEstimator(family='multinomial', ... gam_columns=["C6","C7","C8"], ... store_knot_locations=True, ... knot_ids=[frameKnots1.key, frameKnots2.key, frameKnots3.key]) >>> h2o_model.train(x=x, y=y, training_frame=h2o_data) >>> h2o_model.get_knot_locations() 
 - 
property lambda_¶
- Regularization strength - Type: - List[float].
 - 
property lambda_min_ratio¶
- Minimum lambda used in lambda search, specified as a ratio of lambda_max (the smallest lambda that drives all coefficients to zero). Default indicates: if the number of observations is greater than the number of variables, then lambda_min_ratio is set to 0.0001; if the number of observations is less than the number of variables, then lambda_min_ratio is set to 0.01. - Type: - float, defaults to- -1.0.
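The default described above depends only on the shape of the data. A minimal sketch follows, assuming -1.0 is the "use the default" sentinel. The helper `resolve_lambda_min_ratio` is hypothetical, and the case where observations equal variables is not covered by the description, so the sketch arbitrarily groups it with the "fewer observations" branch.

```python
def resolve_lambda_min_ratio(lambda_min_ratio, n_obs, n_vars):
    """Return the effective lambda_min_ratio following the documented default."""
    if lambda_min_ratio != -1.0:
        return lambda_min_ratio   # explicit user value wins
    # Assumption: n_obs == n_vars is treated like n_obs < n_vars.
    return 0.0001 if n_obs > n_vars else 0.01


lambda_max = 2.0  # hypothetical value, for illustration only
ratio = resolve_lambda_min_ratio(-1.0, n_obs=10000, n_vars=10)
print(ratio, lambda_max * ratio)  # ratio and the implied smallest lambda
```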
 - 
property lambda_search¶
- Use lambda search starting at lambda max, given lambda is then interpreted as lambda min - Type: - bool, defaults to- False.
 - 
property link¶
- Link function. - Type: - Literal["family_default", "identity", "logit", "log", "inverse", "tweedie", "ologit"], defaults to- "family_default".
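For orientation, the sketch below gives the textbook definitions of four of the listed links and their inverses. The `LINKS` table and both helper functions are illustrative assumptions and not H2O code; the tweedie and ologit links are omitted because their forms depend on additional parameters.

```python
import math

# Each entry maps a link name to (link, inverse link).
LINKS = {
    "identity": (lambda mu: mu, lambda eta: eta),
    "log": (math.log, math.exp),
    "logit": (lambda mu: math.log(mu / (1.0 - mu)),
              lambda eta: 1.0 / (1.0 + math.exp(-eta))),
    "inverse": (lambda mu: 1.0 / mu, lambda eta: 1.0 / eta),
}


def to_linear_predictor(mu, link):
    """Map a mean response onto the linear-predictor scale."""
    return LINKS[link][0](mu)


def to_mean(eta, link):
    """Map a linear predictor back onto the response scale."""
    return LINKS[link][1](eta)


eta = to_linear_predictor(0.8, "logit")
print(eta, to_mean(eta, "logit"))  # round trip recovers the mean 0.8
```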
 - 
property max_active_predictors¶
- Maximum number of active predictors during computation. Use as a stopping criterion to prevent expensive model building with many predictors. Default indicates: If the IRLSM solver is used, the value of max_active_predictors is set to 5000 otherwise it is set to 100000000. - Type: - int, defaults to- -1.
 - 
property max_after_balance_size¶
- Maximum relative size of the training data after balancing class counts (can be less than 1.0). Requires balance_classes. - Type: - float, defaults to- 5.0.
 - 
property max_confusion_matrix_size¶
- [Deprecated] Maximum size (# classes) for confusion matrices to be printed in the Logs - Type: - int, defaults to- 20.
 - 
property max_iterations¶
- Maximum number of iterations - Type: - int, defaults to- -1.
 - 
property max_runtime_secs¶
- Maximum allowed runtime in seconds for model training. Use 0 to disable. - Type: - float, defaults to- 0.0.
 - 
property missing_values_handling¶
- Handling of missing values. Either MeanImputation, Skip or PlugValues. - Type: - Literal["mean_imputation", "skip", "plug_values"], defaults to- "mean_imputation".
 - 
property nfolds¶
- Number of folds for K-fold cross-validation (0 to disable or >= 2). - Type: - int, defaults to- 0.
 - 
property nlambdas¶
- Number of lambdas to be used in a search. Default indicates: If alpha is zero, with lambda search set to True, the value of nlambdas is set to 30 (fewer lambdas are needed for ridge regression) otherwise it is set to 100. - Type: - int, defaults to- -1.
 - 
property non_negative¶
- Restrict coefficients (not intercept) to be non-negative - Type: - bool, defaults to- False.
 - 
property num_knots¶
- Number of knots for gam predictors. If specified, must specify one for each gam predictor. For monotone I-splines, minimum = 2, for cs spline, minimum = 3. For thin plate, minimum is size of polynomial basis + 2. - Type: - List[int].- Examples
 - >>> import h2o >>> from h2o.estimators.gam import H2OGeneralizedAdditiveEstimator >>> h2o.init() >>> h2o_data = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/glm_test/multinomial_10_classes_10_cols_10000_Rows_train.csv") >>> h2o_data["C11"] = h2o_data["C11"].asfactor() >>> train, test = h2o_data.split_frame(ratios = [.8]) >>> y = "C11" >>> x = ["C9","C10"] >>> h2o_model = H2OGeneralizedAdditiveEstimator(family='multinomial', ... store_knot_locations=True, ... gam_columns=["C6","C7","C8"], ... num_knots=[3,4,5]) >>> h2o_model.train(x=x, y=y, training_frame=h2o_data) >>> h2o_model.get_knot_locations() 
 - 
property obj_reg¶
- Likelihood divider in objective value computation, default is 1/nobs - Type: - float, defaults to- -1.0.
 - 
property objective_epsilon¶
- Converge if objective value changes less than this. Default indicates: If lambda_search is set to True the value of objective_epsilon is set to .0001. If the lambda_search is set to False and lambda is equal to zero, the value of objective_epsilon is set to .000001, for any other value of lambda the default value of objective_epsilon is set to .0001. - Type: - float, defaults to- -1.0.
 - 
property offset_column¶
- Offset column. This will be added to the combination of columns before applying the link function. - Type: - str.
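As an illustration of where the offset enters, the sketch below computes a Poisson mean with a log link, adding the offset to the linear predictor before mapping back to the response scale. The function `poisson_mean` and all numbers are hypothetical and only show the arithmetic, not H2O internals.

```python
import math


def poisson_mean(x, beta, intercept, offset=0.0):
    """Mean of a log-link Poisson model; the offset is added to the linear predictor."""
    eta = intercept + sum(b * xi for b, xi in zip(beta, x)) + offset
    return math.exp(eta)


base = poisson_mean([1.0], [0.2], intercept=-1.0)
# A common choice is offset = log(exposure); doubling the exposure doubles the mean.
doubled = poisson_mean([1.0], [0.2], intercept=-1.0, offset=math.log(2.0))
print(base, doubled)
```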
 - 
property plug_values¶
- Plug Values (a single row frame containing values that will be used to impute missing values of the training/validation frame, use in conjunction with missing_values_handling = PlugValues) - Type: - Union[None, str, H2OFrame].
 - 
property prior¶
- Prior probability for y==1. To be used only for logistic regression iff the data has been sampled and the mean of response does not reflect reality. - Type: - float, defaults to- -1.0.
 - 
property remove_collinear_columns¶
- In case of linearly dependent columns, remove some of the dependent columns - Type: - bool, defaults to- False.
 - 
property response_column¶
- Response variable column. - Type: - str.
 - 
property scale¶
- Smoothing parameter for gam predictors. If specified, must be of the same length as gam_columns - Type: - List[float].
 - 
property scale_tp_penalty_mat¶
- Scale penalty matrix for tp (thin plate) smoothers as in R - Type: - bool, defaults to- False.- Examples
 - >>> import h2o >>> from h2o.estimators.gam import H2OGeneralizedAdditiveEstimator >>> h2o.init() >>> h2o_data = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/glm_test/multinomial_10_classes_10_cols_10000_Rows_train.cs >>> h2o_data["C11"] = h2o_data["C11"].asfactor() >>> y = "C11" >>> x = ["C9","C10"] >>> h2o_model = H2OGeneralizedAdditiveEstimator(family='multinomial', ... scale_tp_penalty_mat=True, ... gam_columns=["C6","C7","C8"], ... bs=[1,1,1]) >>> h2o_model.train(x=x, y=y, training_frame=h2o_data) >>> h2o_model.coef() 
 - 
property score_each_iteration¶
- Whether to score during each iteration of model training. - Type: - bool, defaults to- False.
 - 
scoring_history()[source]¶
- Retrieve Model Score History. - Returns
- The score history as an H2OTwoDimTable or a Pandas DataFrame. 
 
 - 
property seed¶
- Seed for pseudo random number generator (if applicable) - Type: - int, defaults to- -1.
 - 
property solver¶
- AUTO will set the solver based on given data and the other parameters. IRLSM is fast on problems with a small number of predictors and for lambda-search with L1 penalty; L_BFGS scales better for datasets with many columns. - Type: - Literal["auto", "irlsm", "l_bfgs", "coordinate_descent_naive", "coordinate_descent", "gradient_descent_lh", "gradient_descent_sqerr"], defaults to- "auto".
 - 
property spline_orders¶
- Order of I-splines or NBSplineTypeI M-splines used for gam predictors. If specified, must be the same size as gam_columns. For I-splines, the spline_orders will be the same as the polynomials used to generate the splines. For M-splines, the polynomials used to generate the splines will be spline_order-1. Values for bs=0 or 1 will be ignored. - Type: - List[int].- Examples
 - >>> import h2o >>> from h2o.estimators.gam import H2OGeneralizedAdditiveEstimator >>> h2o.init() >>> h2o_data = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/glm_test/binomial_20_cols_10KRows.csv") >>> y = "C21" >>> x = ["C19","C20"] >>> numKnots = [5,5,5] >>> h2o_model = H2OGeneralizedAdditiveEstimator(family='gaussian', ... gam_columns=["C16","C17","C18"], ... bs=[2,2,2], ... spline_orders=[3,4,5]) >>> h2o_model.train(x=x, y=y, training_frame=h2o_data) >>> h2o_model.coef() 
 - 
property splines_non_negative¶
- Valid for I-spline (bs=2) only. True if the I-splines are monotonically increasing (and monotonically non-decreasing) and False if the I-splines are monotonically decreasing (and monotonically non-increasing). If specified, must be the same size as gam_columns. Values for other spline types will be ignored. Defaults to true. - Type: - List[bool].- Examples
 - >>> import h2o >>> from h2o.estimators.gam import H2OGeneralizedAdditiveEstimator >>> h2o.init() >>> h2o_data = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/glm_test/binomial_20_cols_10KRows.csv") >>> y = "C21" >>> x = ["C19","C20"] >>> numKnots = [5,5,5] >>> h2o_model = H2OGeneralizedAdditiveEstimator(family='gaussian', ... gam_columns=["C16","C17","C18"], ... bs=[2,2,2], ... splines_non_negative=[True, True, True]) >>> h2o_model.train(x=x, y=y, training_frame=h2o_data) >>> h2o_model.coef() 
 - 
property standardize¶
- Standardize numeric columns to have zero mean and unit variance - Type: - bool, defaults to- False.
 - 
property standardize_tp_gam_cols¶
- standardize tp (thin plate) predictor columns - Type: - bool, defaults to- False.- Examples
 - >>> import h2o >>> from h2o.estimators.gam import H2OGeneralizedAdditiveEstimator >>> h2o.init() >>> h2o_data = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/glm_test/binomial_20_cols_10KRows.csv") >>> y = "C21" >>> x = ["C19","C20"] >>> h2o_model = H2OGeneralizedAdditiveEstimator(family='gaussian', ... gam_columns=["C16","C17","C18"], ... bs=[1,1,1], ... standardize_tp_gam_cols=True) >>> h2o_model.train(x=x, y=y, training_frame=h2o_data) >>> h2o_model.coef() 
 - 
property startval¶
- double array to initialize coefficients for GAM. - Type: - List[float].
 - 
property stopping_metric¶
- Metric to use for early stopping (AUTO: logloss for classification, deviance for regression and anomaly_score for Isolation Forest). Note that custom and custom_increasing can only be used in GBM and DRF with the Python client. - Type: - Literal["auto", "deviance", "logloss", "mse", "rmse", "mae", "rmsle", "auc", "aucpr", "lift_top_group", "misclassification", "mean_per_class_error", "custom", "custom_increasing"], defaults to- "auto".
 - 
property stopping_rounds¶
- Early stopping based on convergence of stopping_metric. Stop if simple moving average of length k of the stopping_metric does not improve for k:=stopping_rounds scoring events (0 to disable) - Type: - int, defaults to- 0.
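One simplified reading of this rule, for a metric where lower is better, is sketched below. It compares the best moving average over the last k scoring events with the best moving average seen before them and applies stopping_tolerance as a relative threshold. The function `should_stop` is a hypothetical illustration and not H2O's actual implementation, whose details may differ.

```python
def should_stop(history, k, tolerance=0.001):
    """Decide whether to stop, given a lower-is-better metric per scoring event."""
    if k == 0 or len(history) < 2 * k:
        return False  # disabled, or not enough scoring events yet
    # Simple moving averages of length k over the scoring history.
    averages = [sum(history[i:i + k]) / k for i in range(len(history) - k + 1)]
    recent_best = min(averages[-k:])   # best of the last k moving averages
    reference = min(averages[:-k])     # best moving average before those
    # Stop when the relative improvement is smaller than the tolerance.
    return recent_best > reference - tolerance * abs(reference)


print(should_stop([1.0, 0.8, 0.7, 0.7, 0.7, 0.7], k=2))  # plateau
print(should_stop([1.0, 0.9, 0.8, 0.7, 0.6, 0.5], k=2))  # still improving
```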
 - 
property stopping_tolerance¶
- Relative tolerance for metric-based stopping criterion (stop if relative improvement is not at least this much) - Type: - float, defaults to- 0.001.
 - 
property store_knot_locations¶
- If set to true, will return knot locations as a double[][] array for the gam column names found in knots_for_gam. Defaults to false. - Type: - bool, defaults to- False.
 - 
property theta¶
- Theta - Type: - float, defaults to- 0.0.
 - 
property training_frame¶
- Id of the training data frame. - Type: - Union[None, str, H2OFrame].
 - 
property tweedie_link_power¶
- Tweedie link power - Type: - float, defaults to- 0.0.
 - 
property tweedie_variance_power¶
- Tweedie variance power - Type: - float, defaults to- 0.0.
 - 
property validation_frame¶
- Id of the validation data frame. - Type: - Union[None, str, H2OFrame].
 - 
property weights_column¶
- Column with observation weights. Giving some observation a weight of zero is equivalent to excluding it from the dataset; giving an observation a relative weight of 2 is equivalent to repeating that row twice. Negative weights are not allowed. Note: Weights are per-row observation weights and do not increase the size of the data frame. This is typically the number of times a row is repeated, but non-integer values are supported as well. During training, rows with higher weights matter more, due to the larger loss function pre-factor. If you set weight = 0 for a row, the returned prediction frame at that row is zero and this is incorrect. To get an accurate prediction, remove all rows with weight == 0. - Type: - str.
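The equivalence between weights and row repetition can be checked on a simple statistic. The sketch below uses a weighted mean as a stand-in for the weighted loss; `weighted_mean` and the sample values are illustrative assumptions and not part of H2O.

```python
def weighted_mean(values, weights):
    """Mean of `values` where each row counts `weights[i]` times."""
    if any(w < 0 for w in weights):
        raise ValueError("Negative weights are not allowed")
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


values = [1.0, 2.0, 4.0]
weights = [1, 2, 0]            # second row counted twice, third row excluded
repeated = [1.0, 2.0, 2.0]     # the same data written out row by row

print(weighted_mean(values, weights))
print(sum(repeated) / len(repeated))
```

Both printed values agree, which mirrors the statement that a weight of 2 behaves like a duplicated row and a weight of 0 like a removed one.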
 
H2OGradientBoostingEstimator¶
- 
class h2o.estimators.gbm.H2OGradientBoostingEstimator(model_id=None, training_frame=None, validation_frame=None, nfolds=0, keep_cross_validation_models=True, keep_cross_validation_predictions=False, keep_cross_validation_fold_assignment=False, score_each_iteration=False, score_tree_interval=0, fold_assignment='auto', fold_column=None, response_column=None, ignored_columns=None, ignore_const_cols=True, offset_column=None, weights_column=None, balance_classes=False, class_sampling_factors=None, max_after_balance_size=5.0, max_confusion_matrix_size=20, ntrees=50, max_depth=5, min_rows=10.0, nbins=20, nbins_top_level=1024, nbins_cats=1024, r2_stopping=None, stopping_rounds=0, stopping_metric='auto', stopping_tolerance=0.001, max_runtime_secs=0.0, seed=-1, build_tree_one_node=False, learn_rate=0.1, learn_rate_annealing=1.0, distribution='auto', quantile_alpha=0.5, tweedie_power=1.5, huber_alpha=0.9, checkpoint=None, sample_rate=1.0, sample_rate_per_class=None, col_sample_rate=1.0, col_sample_rate_change_per_level=1.0, col_sample_rate_per_tree=1.0, min_split_improvement=1e-05, histogram_type='auto', max_abs_leafnode_pred=None, pred_noise_bandwidth=0.0, categorical_encoding='auto', calibrate_model=False, calibration_frame=None, calibration_method='auto', custom_metric_func=None, custom_distribution_func=None, export_checkpoints_dir=None, in_training_checkpoints_dir=None, in_training_checkpoints_tree_interval=1, monotone_constraints=None, check_constant_response=True, gainslift_bins=-1, auc_type='auto', interaction_constraints=None, auto_rebalance=True)[source]¶
- Bases: - h2o.estimators.estimator_base.H2OEstimator- Gradient Boosting Machine - Builds gradient boosted trees on a parsed data set, for regression or classification. The default distribution function will guess the model type based on the response column type. Otherwise, the response column must be an enum for “bernoulli” or “multinomial”, and numeric for all other distributions. - 
property auc_type¶
- Set default multinomial AUC type. - Type: - Literal["auto", "none", "macro_ovr", "weighted_ovr", "macro_ovo", "weighted_ovo"], defaults to- "auto".
 - 
property auto_rebalance¶
- Allow automatic rebalancing of training and validation datasets - Type: - bool, defaults to- True.
 - 
property balance_classes¶
- Balance training data class counts via over/under-sampling (for imbalanced data). - Type: - bool, defaults to- False.- Examples
 - >>> covtype = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/covtype/covtype.20k.data") >>> covtype[54] = covtype[54].asfactor() >>> predictors = covtype.columns[0:54] >>> response = 'C55' >>> train, valid = covtype.split_frame(ratios=[.8], seed=1234) >>> cov_gbm = H2OGradientBoostingEstimator(balance_classes=True, ... seed=1234) >>> cov_gbm.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> cov_gbm.logloss(valid=True) 
 - 
property build_tree_one_node¶
- Run on one node only; no network overhead but fewer cpus used. Suitable for small datasets. - Type: - bool, defaults to- False.- Examples
 - >>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv") >>> cars["economy_20mpg"] = cars["economy_20mpg"].asfactor() >>> predictors = ["displacement","power","weight","acceleration","year"] >>> response = "economy_20mpg" >>> train, valid = cars.split_frame(ratios=[.8], seed=1234) >>> cars_gbm = H2OGradientBoostingEstimator(build_tree_one_node=True, ... seed=1234) >>> cars_gbm.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> cars_gbm.auc(valid=True) 
 - 
property calibrate_model¶
- Use Platt Scaling (default) or Isotonic Regression to calculate calibrated class probabilities. Calibration can provide more accurate estimates of class probabilities. - Type: - bool, defaults to- False.- Examples
 - >>> ecology = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/gbm_test/ecology_model.csv") >>> ecology['Angaus'] = ecology['Angaus'].asfactor() >>> response = 'Angaus' >>> train, calib = ecology.split_frame(seed = 12354) >>> predictors = ecology.columns[3:13] >>> w = h2o.create_frame(binary_fraction=1, ... binary_ones_fraction=0.5, ... missing_fraction=0, ... rows=744, cols=1) >>> w.set_names(["weight"]) >>> train = train.cbind(w) >>> ecology_gbm = H2OGradientBoostingEstimator(ntrees=10, ... max_depth=5, ... min_rows=10, ... learn_rate=0.1, ... distribution="multinomial", ... weights_column="weight", ... calibrate_model=True, ... calibration_frame=calib) >>> ecology_gbm.train(x=predictors, ... y="Angaus", ... training_frame=train) >>> ecology_gbm.auc() 
 - 
property calibration_frame¶
- Data for model calibration - Type: - Union[None, str, H2OFrame].- Examples
 - >>> ecology = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/gbm_test/ecology_model.csv") >>> ecology['Angaus'] = ecology['Angaus'].asfactor() >>> response = 'Angaus' >>> predictors = ecology.columns[3:13] >>> train, calib = ecology.split_frame(seed=12354) >>> w = h2o.create_frame(binary_fraction=1, ... binary_ones_fraction=0.5, ... missing_fraction=0, ... rows=744,cols=1) >>> w.set_names(["weight"]) >>> train = train.cbind(w) >>> ecology_gbm = H2OGradientBoostingEstimator(ntrees=10, ... max_depth=5, ... min_rows=10, ... learn_rate=0.1, ... distribution="multinomial", ... calibrate_model=True, ... calibration_frame=calib) >>> ecology_gbm.train(x=predictors, ... y="Angaus", ... training_frame=train, ... weights_column="weight") >>> ecology_gbm.auc() 
 - 
property calibration_method¶
- Calibration method to use - Type: - Literal["auto", "platt_scaling", "isotonic_regression"], defaults to- "auto".
 - 
property categorical_encoding¶
- Encoding scheme for categorical features - Type: - Literal["auto", "enum", "one_hot_internal", "one_hot_explicit", "binary", "eigen", "label_encoder", "sort_by_response", "enum_limited"], defaults to- "auto".- Examples
 - >>> airlines = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/airlines/allyears2k_headers.zip") >>> airlines["Year"] = airlines["Year"].asfactor() >>> airlines["Month"] = airlines["Month"].asfactor() >>> airlines["DayOfWeek"] = airlines["DayOfWeek"].asfactor() >>> airlines["Cancelled"] = airlines["Cancelled"].asfactor() >>> airlines['FlightNum'] = airlines['FlightNum'].asfactor() >>> predictors = ["Origin", "Dest", "Year", "UniqueCarrier", ... "DayOfWeek", "Month", "Distance", "FlightNum"] >>> response = "IsDepDelayed" >>> train, valid = airlines.split_frame(ratios=[.8], seed=1234) >>> airlines_gbm = H2OGradientBoostingEstimator(categorical_encoding="label_encoder", ... seed=1234) >>> airlines_gbm.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> airlines_gbm.auc(valid=True) 
 - 
property check_constant_response¶
- Check if response column is constant. If enabled, then an exception is thrown if the response column is a constant value. If disabled, then the model will train regardless of whether the response column is constant. - Type: - bool, defaults to- True.- Examples
 - >>> train = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/iris/iris_train.csv") >>> train["constantCol"] = 1 >>> my_gbm = H2OGradientBoostingEstimator(check_constant_response=False) >>> my_gbm.train(x=list(range(1,5)), ... y="constantCol", ... training_frame=train) 
 - 
property checkpoint¶
- Model checkpoint to resume training with. - Type: - Union[None, str, H2OEstimator].- Examples
 - >>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv") >>> cars["economy_20mpg"] = cars["economy_20mpg"].asfactor() >>> predictors = ["displacement","power","weight","acceleration","year"] >>> response = "economy_20mpg" >>> train, valid = cars.split_frame(ratios=[.8], seed=1234) >>> cars_gbm = H2OGradientBoostingEstimator(ntrees=1, ... seed=1234) >>> cars_gbm.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> print(cars_gbm.auc(valid=True)) >>> print("Number of trees built for cars_gbm model:", cars_gbm.ntrees) >>> cars_gbm_continued = H2OGradientBoostingEstimator(checkpoint=cars_gbm.model_id, ... ntrees=50, ... seed=1234) >>> cars_gbm_continued.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> cars_gbm_continued.auc(valid=True) >>> print("Number of trees built for cars_gbm_continued model:", cars_gbm_continued.ntrees) 
 - 
property class_sampling_factors¶
- Desired over/under-sampling ratios per class (in lexicographic order). If not specified, sampling factors will be automatically computed to obtain class balance during training. Requires balance_classes. - Type: - List[float].- Examples
 - >>> covtype = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/covtype/covtype.20k.data") >>> covtype[54] = covtype[54].asfactor() >>> predictors = covtype.columns[0:54] >>> response = 'C55' >>> train, valid = covtype.split_frame(ratios=[.8], seed=1234) >>> sample_factors = [1., 0.5, 1., 1., 1., 1., 1.] >>> cov_gbm = H2OGradientBoostingEstimator(balance_classes=True, ... class_sampling_factors=sample_factors, ... seed=1234) >>> cov_gbm.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> cov_gbm.logloss(valid=True) 
 - 
property col_sample_rate¶
- Column sample rate (from 0.0 to 1.0) - Type: - float, defaults to- 1.0.- Examples
 - >>> airlines= h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/airlines/allyears2k_headers.zip") >>> airlines["Year"] = airlines["Year"].asfactor() >>> airlines["Month"] = airlines["Month"].asfactor() >>> airlines["DayOfWeek"] = airlines["DayOfWeek"].asfactor() >>> airlines["Cancelled"] = airlines["Cancelled"].asfactor() >>> airlines['FlightNum'] = airlines['FlightNum'].asfactor() >>> predictors = ["Origin", "Dest", "Year", "UniqueCarrier", ... "DayOfWeek", "Month", "Distance", "FlightNum"] >>> response = "IsDepDelayed" >>> train, valid = airlines.split_frame(ratios=[.8], seed=1234) >>> airlines_gbm = H2OGradientBoostingEstimator(col_sample_rate=.7, ... seed=1234) >>> airlines_gbm.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> airlines_gbm.auc(valid=True) 
 - 
property col_sample_rate_change_per_level¶
- Relative change of the column sampling rate for every level (must be > 0.0 and <= 2.0) - Type: - float, defaults to- 1.0.- Examples
 - >>> airlines= h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/airlines/allyears2k_headers.zip") >>> airlines["Year"] = airlines["Year"].asfactor() >>> airlines["Month"] = airlines["Month"].asfactor() >>> airlines["DayOfWeek"] = airlines["DayOfWeek"].asfactor() >>> airlines["Cancelled"] = airlines["Cancelled"].asfactor() >>> airlines['FlightNum'] = airlines['FlightNum'].asfactor() >>> predictors = ["Origin", "Dest", "Year", "UniqueCarrier", ... "DayOfWeek", "Month", "Distance", "FlightNum"] >>> response = "IsDepDelayed" >>> train, valid = airlines.split_frame(ratios=[.8], seed=1234) >>> airlines_gbm = H2OGradientBoostingEstimator(col_sample_rate_change_per_level=.9, ... seed=1234) >>> airlines_gbm.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> airlines_gbm.auc(valid=True) 
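To give a feel for what a per-level change means, the sketch below applies the factor once per tree level, starting from col_sample_rate at the root. This is one plausible reading of the description, not a statement of H2O's exact rule: the helper `column_rate_at_depth`, the zero-based depth convention, and the cap at 1.0 are all assumptions made for illustration.

```python
def column_rate_at_depth(col_sample_rate, change_per_level, depth):
    """Effective column sampling rate at a tree depth (root is depth 0)."""
    if not 0.0 < change_per_level <= 2.0:
        raise ValueError("change_per_level must be > 0.0 and <= 2.0")
    rate = col_sample_rate * change_per_level ** depth
    return min(1.0, rate)  # assumption: a rate can never exceed all columns


for depth in range(4):
    # With a factor below 1, fewer columns are considered at deeper levels.
    print(depth, column_rate_at_depth(0.7, 0.9, depth))
```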
 - 
property col_sample_rate_per_tree¶
- Column sample rate per tree (from 0.0 to 1.0) - Type: - float, defaults to- 1.0.- Examples
 - >>> airlines= h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/airlines/allyears2k_headers.zip") >>> airlines["Year"] = airlines["Year"].asfactor() >>> airlines["Month"] = airlines["Month"].asfactor() >>> airlines["DayOfWeek"] = airlines["DayOfWeek"].asfactor() >>> airlines["Cancelled"] = airlines["Cancelled"].asfactor() >>> airlines['FlightNum'] = airlines['FlightNum'].asfactor() >>> predictors = ["Origin", "Dest", "Year", "UniqueCarrier", ... "DayOfWeek", "Month", "Distance", "FlightNum"] >>> response = "IsDepDelayed" >>> train, valid = airlines.split_frame(ratios=[.8], seed=1234) >>> airlines_gbm = H2OGradientBoostingEstimator(col_sample_rate_per_tree=.7, ... seed=1234) >>> airlines_gbm.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> airlines_gbm.auc(valid=True) 
 - 
property custom_distribution_func¶
- Reference to custom distribution, format: language:keyName=funcName - Type: - str.- Examples
 - >>> airlines= h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/airlines/allyears2k_headers.zip") >>> airlines["Year"] = airlines["Year"].asfactor() >>> airlines["Month"] = airlines["Month"].asfactor() >>> airlines["DayOfWeek"] = airlines["DayOfWeek"].asfactor() >>> airlines["Cancelled"] = airlines["Cancelled"].asfactor() >>> airlines['FlightNum'] = airlines['FlightNum'].asfactor() >>> predictors = ["Origin", "Dest", "Year", "UniqueCarrier", ... "DayOfWeek", "Month", "Distance", "FlightNum"] >>> response = "IsDepDelayed" >>> train, valid = airlines.split_frame(ratios=[.8], seed=1234) >>> airlines_gbm = H2OGradientBoostingEstimator(ntrees=3, ... max_depth=5, ... distribution="bernoulli", ... seed=1234) >>> airlines_gbm.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> from h2o.utils.distributions import CustomDistributionBernoulli >>> custom_distribution_bernoulli = h2o.upload_custom_distribution(CustomDistributionBernoulli, ... func_name="custom_bernoulli", ... func_file="custom_bernoulli.py") >>> airlines_gbm_custom = H2OGradientBoostingEstimator(ntrees=3, ... max_depth=5, ... distribution="custom", ... custom_distribution_func=custom_distribution_bernoulli, ... seed=1235) >>> airlines_gbm_custom.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> airlines_gbm.auc() 
 - 
property custom_metric_func¶
- Reference to custom evaluation function, format: language:keyName=funcName - Type: - str.
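The reference format `language:keyName=funcName` can be illustrated by splitting such a string into its three parts. This is a minimal sketch: the helper `parse_func_reference` and the sample reference string are hypothetical and are not part of the H2O API.

```python
def parse_func_reference(reference):
    """Split a 'language:keyName=funcName' reference into its three parts."""
    language, colon, rest = reference.partition(":")
    key_name, equals, func_name = rest.partition("=")
    if not (colon and equals and language and key_name and func_name):
        raise ValueError("expected the format language:keyName=funcName")
    return language, key_name, func_name


# Hypothetical reference string, used only to show the shape of the format.
parts = parse_func_reference("python:custom_rmse.py=metrics.CustomRmseFunc")
print(parts)
```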
 - 
property distribution¶
- Distribution function - Type: - Literal["auto", "bernoulli", "quasibinomial", "multinomial", "gaussian", "poisson", "gamma", "tweedie", "laplace", "quantile", "huber", "custom"], defaults to- "auto".- Examples
 - >>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv") >>> predictors = ["displacement","power","weight","acceleration","year"] >>> response = "cylinders" >>> train, valid = cars.split_frame(ratios=[.8], seed=1234) >>> cars_gbm = H2OGradientBoostingEstimator(distribution="poisson", ... seed=1234) >>> cars_gbm.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> cars_gbm.mse(valid=True) 
 - 
property export_checkpoints_dir¶
- Automatically export generated models to this directory. - Type: - str.- Examples
 - >>> import tempfile >>> from os import listdir >>> from h2o.grid.grid_search import H2OGridSearch >>> airlines = h2o.import_file("http://s3.amazonaws.com/h2o-public-test-data/smalldata/airlines/allyears2k_headers.zip", destination_frame="air.hex") >>> predictors = ["DayofMonth", "DayOfWeek"] >>> response = "IsDepDelayed" >>> hyper_parameters = {'ntrees': [5,10]} >>> search_crit = {'strategy': "RandomDiscrete", ... 'max_models': 5, ... 'seed': 1234, ... 'stopping_rounds': 3, ... 'stopping_metric': "AUTO", ... 'stopping_tolerance': 1e-2} >>> checkpoints_dir = tempfile.mkdtemp() >>> air_grid = H2OGridSearch(H2OGradientBoostingEstimator, ... hyper_params=hyper_parameters, ... search_criteria=search_crit) >>> air_grid.train(x=predictors, ... y=response, ... training_frame=airlines, ... distribution="bernoulli", ... learn_rate=0.1, ... max_depth=3, ... export_checkpoints_dir=checkpoints_dir) >>> len(listdir(checkpoints_dir)) 
 - 
property fold_assignment¶
- Cross-validation fold assignment scheme, if fold_column is not specified. The ‘Stratified’ option will stratify the folds based on the response variable, for classification problems. - Type: - Literal["auto", "random", "modulo", "stratified"], defaults to- "auto".- Examples
 - >>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv") >>> cars["economy_20mpg"] = cars["economy_20mpg"].asfactor() >>> predictors = ["displacement","power","weight","acceleration","year"] >>> response = "economy_20mpg" >>> assignment_type = "Random" >>> cars_gbm = H2OGradientBoostingEstimator(fold_assignment=assignment_type, ... nfolds=5, ... seed=1234) >>> cars_gbm.train(x=predictors, y=response, training_frame=cars) >>> cars_gbm.auc(xval=True) 
 - 
property fold_column¶
- Column with cross-validation fold index assignment per observation. - Type: - str.- Examples
 - >>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv") >>> cars["economy_20mpg"] = cars["economy_20mpg"].asfactor() >>> predictors = ["displacement","power","weight","acceleration","year"] >>> response = "economy_20mpg" >>> fold_numbers = cars.kfold_column(n_folds=5, ... seed=1234) >>> fold_numbers.set_names(["fold_numbers"]) >>> cars = cars.cbind(fold_numbers) >>> cars_gbm = H2OGradientBoostingEstimator(seed=1234) >>> cars_gbm.train(x=predictors, ... y=response, ... training_frame=cars, ... fold_column="fold_numbers") >>> cars_gbm.auc(xval=True) 
 - 
property gainslift_bins¶
- Gains/Lift table number of bins. 0 means disabled. Default value -1 means automatic binning. - Type: - int, defaults to- -1.- Examples
 - >>> airlines= h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/testng/airlines_train.csv") >>> model = H2OGradientBoostingEstimator(ntrees=1, gainslift_bins=20) >>> model.train(x=["Origin", "Distance"], ... y="IsDepDelayed", ... training_frame=airlines) >>> model.gains_lift() 
 - 
property histogram_type¶
- What type of histogram to use for finding optimal split points - Type: - Literal["auto", "uniform_adaptive", "random", "quantiles_global", "round_robin", "uniform_robust"], defaults to- "auto".- Examples
 - >>> airlines= h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/airlines/allyears2k_headers.zip") >>> airlines["Year"] = airlines["Year"].asfactor() >>> airlines["Month"] = airlines["Month"].asfactor() >>> airlines["DayOfWeek"] = airlines["DayOfWeek"].asfactor() >>> airlines["Cancelled"] = airlines["Cancelled"].asfactor() >>> airlines['FlightNum'] = airlines['FlightNum'].asfactor() >>> predictors = ["Origin", "Dest", "Year", "UniqueCarrier", ... "DayOfWeek", "Month", "Distance", "FlightNum"] >>> response = "IsDepDelayed" >>> train, valid = airlines.split_frame(ratios=[.8], seed=1234) >>> airlines_gbm = H2OGradientBoostingEstimator(histogram_type="UniformAdaptive", ... seed=1234) >>> airlines_gbm.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> airlines_gbm.auc(valid=True) 
 - 
property huber_alpha¶
- Desired quantile for Huber/M-regression (threshold between quadratic and linear loss, must be between 0 and 1). - Type: - float, defaults to- 0.9.- Examples
 - >>> insurance = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/glm_test/insurance.csv") >>> predictors = insurance.columns[0:4] >>> response = 'Claims' >>> insurance['Group'] = insurance['Group'].asfactor() >>> insurance['Age'] = insurance['Age'].asfactor() >>> train, valid = insurance.split_frame(ratios=[.8], seed=1234) >>> insurance_gbm = H2OGradientBoostingEstimator(distribution="huber", ... huber_alpha=0.9, ... seed=1234) >>> insurance_gbm.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> insurance_gbm.mse(valid=True) 
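The role of `huber_alpha` as the threshold between quadratic and linear loss can be sketched in plain Python. The sketch assumes the threshold (here called `delta`) is the `huber_alpha` quantile of the absolute residuals, computed with a simple nearest-rank rule. The helper names and the quantile rule are illustrative assumptions, and H2O's internal computation may differ.

```python
import math


def huber_delta(abs_residuals, alpha):
    """Return the alpha-quantile of the absolute residuals (nearest-rank rule)."""
    ordered = sorted(abs_residuals)
    rank = max(1, math.ceil(alpha * len(ordered)))
    return ordered[rank - 1]


def huber_loss(residual, delta):
    """Quadratic loss up to delta, linear loss beyond it."""
    size = abs(residual)
    if size <= delta:
        return 0.5 * size * size
    return delta * (size - 0.5 * delta)


residuals = [0.5, -1.0, 2.0, -0.2, 8.0, 1.5, -0.7, 0.1, 3.0, -0.4]
delta = huber_delta([abs(r) for r in residuals], alpha=0.9)
print("delta:", delta)
print("small residual loss:", huber_loss(1.0, delta))
print("large residual loss:", huber_loss(8.0, delta))
```

With a threshold chosen this way, only the largest residuals fall into the linear regime, which limits the influence of outliers.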
 - 
property ignore_const_cols¶
- Ignore constant columns. - Type: - bool, defaults to- True.- Examples
 - >>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv") >>> cars["economy_20mpg"] = cars["economy_20mpg"].asfactor() >>> predictors = ["displacement","power","weight","acceleration","year"] >>> response = "economy_20mpg" >>> cars["const_1"] = 6 >>> cars["const_2"] = 7 >>> train, valid = cars.split_frame(ratios=[.8], seed=1234) >>> cars_gbm = H2OGradientBoostingEstimator(seed=1234, ... ignore_const_cols=True) >>> cars_gbm.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> cars_gbm.auc(valid=True) 
 - 
property ignored_columns¶
- Names of columns to ignore for training. - Type: - List[str].
 - 
property in_training_checkpoints_dir¶
- Create checkpoints into defined directory while training process is still running. In case of cluster shutdown, this checkpoint can be used to restart training. - Type: - str.
 - 
property in_training_checkpoints_tree_interval¶
- Checkpoint the model after every so many trees. This parameter is used only when in_training_checkpoints_dir is defined. - Type: - int, defaults to- 1.
 - 
property interaction_constraints¶
- A set of allowed column interactions. - Type: - List[List[str]].
 - 
property keep_cross_validation_fold_assignment¶
- Whether to keep the cross-validation fold assignment. - Type: - bool, defaults to- False.- Examples
 - >>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv") >>> cars["economy_20mpg"] = cars["economy_20mpg"].asfactor() >>> predictors = ["displacement","power","weight","acceleration","year"] >>> response = "economy_20mpg" >>> folds = 5 >>> train, valid = cars.split_frame(ratios=[.8], seed=1234) >>> cars_gbm = H2OGradientBoostingEstimator(keep_cross_validation_fold_assignment=True, ... nfolds=5, ... seed=1234) >>> cars_gbm.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> cars_gbm.auc() 
 - 
property keep_cross_validation_models¶
- Whether to keep the cross-validation models. - Type: - bool, defaults to- True.- Examples
 - >>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv") >>> cars["economy_20mpg"] = cars["economy_20mpg"].asfactor() >>> predictors = ["displacement","power","weight","acceleration","year"] >>> response = "economy_20mpg" >>> folds = 5 >>> train, valid = cars.split_frame(ratios=[.8], seed=1234) >>> cars_gbm = H2OGradientBoostingEstimator(keep_cross_validation_models=True, ... nfolds=5, ... seed=1234) >>> cars_gbm.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> cars_gbm.auc() 
 - 
property keep_cross_validation_predictions¶
- Whether to keep the predictions of the cross-validation models. - Type: - bool, defaults to- False.- Examples
 - >>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv") >>> cars["economy_20mpg"] = cars["economy_20mpg"].asfactor() >>> predictors = ["displacement","power","weight","acceleration","year"] >>> response = "economy_20mpg" >>> folds = 5 >>> train, valid = cars.split_frame(ratios=[.8], seed=1234) >>> cars_gbm = H2OGradientBoostingEstimator(keep_cross_validation_predictions=True, ... nfolds=5, ... seed=1234) >>> cars_gbm.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> cars_gbm.auc() 
 - 
property learn_rate¶
- Learning rate (from 0.0 to 1.0) - Type: - float, defaults to- 0.1.- Examples
 - >>> titanic = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/gbm_test/titanic.csv") >>> titanic['survived'] = titanic['survived'].asfactor() >>> predictors = titanic.columns >>> del predictors[1:3] >>> response = 'survived' >>> train, valid = titanic.split_frame(ratios=[.8], seed=1234) >>> titanic_gbm = H2OGradientBoostingEstimator(ntrees=10000, ... learn_rate=0.01, ... stopping_rounds=5, ... stopping_metric="AUC", ... stopping_tolerance=1e-4, ... seed=1234) >>> titanic_gbm.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> titanic_gbm.auc(valid=True) 
 - 
property learn_rate_annealing¶
- Scale the learning rate by this factor after each tree (e.g., 0.99 or 0.999) - Type: - float, defaults to- 1.0.- Examples
 - >>> titanic = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/gbm_test/titanic.csv") >>> titanic['survived'] = titanic['survived'].asfactor() >>> predictors = titanic.columns >>> del predictors[1:3] >>> response = 'survived' >>> train, valid = titanic.split_frame(ratios=[.8], seed=1234) >>> titanic_gbm = H2OGradientBoostingEstimator(ntrees=10000, ... learn_rate=0.05, ... learn_rate_annealing=.9, ... stopping_rounds=5, ... stopping_metric="AUC", ... stopping_tolerance=1e-4, ... seed=1234) >>> titanic_gbm.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> titanic_gbm.auc(valid=True) 
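The effect of annealing on the learning rate can be worked out by hand. The sketch below assumes the first tree uses the unscaled `learn_rate` and each later tree uses the previous rate multiplied by `learn_rate_annealing`; the helper name is hypothetical.

```python
def annealed_learn_rates(learn_rate, learn_rate_annealing, ntrees):
    """Return the assumed effective learning rate for each tree in order."""
    rates = []
    rate = learn_rate
    for _ in range(ntrees):
        rates.append(rate)
        # Scale the rate after each tree, as the parameter description states.
        rate *= learn_rate_annealing
    return rates


rates = annealed_learn_rates(learn_rate=0.05, learn_rate_annealing=0.9, ntrees=5)
for tree, rate in enumerate(rates, start=1):
    print(tree, round(rate, 5))
```

An aggressive factor such as 0.9 shrinks the rate quickly, which is why values close to 1.0 (for example 0.99 or 0.999) are the ones suggested in the description.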
 - 
property max_abs_leafnode_pred¶
- Maximum absolute value of a leaf node prediction - Type: - float, defaults to- ∞.- Examples
 - >>> covtype = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/covtype/covtype.20k.data") >>> covtype[54] = covtype[54].asfactor() >>> predictors = covtype.columns[0:54] >>> response = 'C55' >>> train, valid = covtype.split_frame(ratios=[.8], seed=1234) >>> cov_gbm = H2OGradientBoostingEstimator(max_abs_leafnode_pred=2, ... seed=1234) >>> cov_gbm.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> cov_gbm.logloss(valid=True) 
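One way to read this bound is as a symmetric clip applied to each leaf value. The sketch below illustrates that reading only; the helper `clip_leaf_prediction` is hypothetical and is not taken from H2O.

```python
def clip_leaf_prediction(value, max_abs_leafnode_pred):
    """Limit a leaf value to the range [-max_abs_leafnode_pred, +max_abs_leafnode_pred]."""
    return max(-max_abs_leafnode_pred, min(max_abs_leafnode_pred, value))


for raw in (3.5, -7.0, 0.3):
    print(raw, "->", clip_leaf_prediction(raw, 2))
```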
 - 
property max_after_balance_size¶
- Maximum relative size of the training data after balancing class counts (can be less than 1.0). Requires balance_classes. - Type: - float, defaults to- 5.0.- Examples
 - >>> covtype = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/covtype/covtype.20k.data") >>> covtype[54] = covtype[54].asfactor() >>> predictors = covtype.columns[0:54] >>> response = 'C55' >>> train, valid = covtype.split_frame(ratios=[.8], seed=1234) >>> max_size = .85 >>> cov_gbm = H2OGradientBoostingEstimator(balance_classes=True, ... max_after_balance_size=max_size, ... seed=1234) >>> cov_gbm.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> cov_gbm.logloss(valid=True) 
 - 
property max_confusion_matrix_size¶
- [Deprecated] Maximum size (# classes) for confusion matrices to be printed in the Logs - Type: - int, defaults to- 20.
 - 
property max_depth¶
- Maximum tree depth (0 for unlimited). - Type: - int, defaults to- 5.- Examples
 - >>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv") >>> cars["economy_20mpg"] = cars["economy_20mpg"].asfactor() >>> predictors = ["displacement","power","weight","acceleration","year"] >>> response = "economy_20mpg" >>> train, valid = cars.split_frame(ratios=[.8], seed=1234) >>> cars_gbm = H2OGradientBoostingEstimator(ntrees=100, ... max_depth=2, ... seed=1234) >>> cars_gbm.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> cars_gbm.auc(valid=True) 
 - 
property max_runtime_secs¶
- Maximum allowed runtime in seconds for model training. Use 0 to disable. - Type: - float, defaults to- 0.0.- Examples
 - >>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv") >>> cars["economy_20mpg"] = cars["economy_20mpg"].asfactor() >>> predictors = ["displacement","power","weight","acceleration","year"] >>> response = "economy_20mpg" >>> train, valid = cars.split_frame(ratios=[.8], seed=1234) >>> cars_gbm = H2OGradientBoostingEstimator(max_runtime_secs=10, ... ntrees=10000, ... max_depth=10, ... seed=1234) >>> cars_gbm.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> cars_gbm.auc(valid=True) 
 - 
property min_rows¶
- Fewest allowed (weighted) observations in a leaf. - Type: - float, defaults to- 10.0.- Examples
 - >>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv") >>> cars["economy_20mpg"] = cars["economy_20mpg"].asfactor() >>> predictors = ["displacement","power","weight","acceleration","year"] >>> response = "economy_20mpg" >>> train, valid = cars.split_frame(ratios=[.8], seed=1234) >>> cars_gbm = H2OGradientBoostingEstimator(min_rows=16, ... seed=1234) >>> cars_gbm.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> cars_gbm.auc(valid=True) 
 - 
property min_split_improvement¶
- Minimum relative improvement in squared error reduction for a split to happen - Type: - float, defaults to- 1e-05.- Examples
 - >>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv") >>> cars["economy_20mpg"] = cars["economy_20mpg"].asfactor() >>> predictors = ["displacement","power","weight","acceleration","year"] >>> response = "economy_20mpg" >>> train, valid = cars.split_frame(ratios=[.8], seed=1234) >>> cars_gbm = H2OGradientBoostingEstimator(min_split_improvement=1e-3, ... seed=1234) >>> cars_gbm.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> cars_gbm.auc(valid=True) 
 - 
property monotone_constraints¶
- A mapping representing monotonic constraints. Use +1 to enforce an increasing constraint and -1 to specify a decreasing constraint. - Type: - dict.- Examples
 - >>> prostate_hex = h2o.import_file("http://s3.amazonaws.com/h2o-public-test-data/smalldata/prostate/prostate.csv.zip") >>> prostate_hex["CAPSULE"] = prostate_hex["CAPSULE"].asfactor() >>> response = "CAPSULE" >>> seed = 42 >>> monotone_constraints = {"AGE":1} >>> gbm_model = H2OGradientBoostingEstimator(seed=seed, ... monotone_constraints=monotone_constraints) >>> gbm_model.train(y=response, ... ignored_columns=["ID"], ... training_frame=prostate_hex) >>> gbm_model.scoring_history() 
 - 
property nbins¶
- For numerical columns (real/int), build a histogram of (at least) this many bins, then split at the best point - Type: - int, defaults to- 20.- Examples
 - >>> eeg = h2o.import_file("https://h2o-public-test-data.s3.amazonaws.com/smalldata/eeg/eeg_eyestate.csv") >>> eeg['eyeDetection'] = eeg['eyeDetection'].asfactor() >>> predictors = eeg.columns[:-1] >>> response = 'eyeDetection' >>> train, valid = eeg.split_frame(ratios=[.8], seed=1234) >>> bin_num = [16, 32, 64, 128, 256, 512] >>> label = ["16", "32", "64", "128", "256", "512"] >>> for key, num in enumerate(bin_num): ... eeg_gbm = H2OGradientBoostingEstimator(nbins=num, seed=1234) ... eeg_gbm.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) ... print(label[key], 'training score', eeg_gbm.auc(train=True)) ... print(label[key], 'validation score', eeg_gbm.auc(valid=True)) 
 - 
property nbins_cats¶
- For categorical columns (factors), build a histogram of this many bins, then split at the best point. Higher values can lead to more overfitting. - Type: - int, defaults to- 1024.- Examples
 - >>> airlines= h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/airlines/allyears2k_headers.zip") >>> airlines["Year"] = airlines["Year"].asfactor() >>> airlines["Month"] = airlines["Month"].asfactor() >>> airlines["DayOfWeek"] = airlines["DayOfWeek"].asfactor() >>> airlines["Cancelled"] = airlines["Cancelled"].asfactor() >>> airlines['FlightNum'] = airlines['FlightNum'].asfactor() >>> predictors = ["Origin", "Dest", "Year", "UniqueCarrier", ... "DayOfWeek", "Month", "Distance", "FlightNum"] >>> response = "IsDepDelayed" >>> train, valid = airlines.split_frame(ratios=[.8], seed=1234) >>> bin_num = [8, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096] >>> label = ["8", "16", "32", "64", "128", "256", "512", "1024", "2048", "4096"] >>> for key, num in enumerate(bin_num): ... airlines_gbm = H2OGradientBoostingEstimator(nbins_cats=num, seed=1234) ... airlines_gbm.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) ... print(label[key], 'training score', airlines_gbm.auc(train=True)) ... print(label[key], 'validation score', airlines_gbm.auc(valid=True)) 
 - 
property nbins_top_level¶
- For numerical columns (real/int), build a histogram of (at most) this many bins at the root level, then decrease by factor of two per level - Type: - int, defaults to- 1024.- Examples
 - >>> eeg = h2o.import_file("https://h2o-public-test-data.s3.amazonaws.com/smalldata/eeg/eeg_eyestate.csv") >>> eeg['eyeDetection'] = eeg['eyeDetection'].asfactor() >>> predictors = eeg.columns[:-1] >>> response = 'eyeDetection' >>> train, valid = eeg.split_frame(ratios=[.8], seed=1234) >>> bin_num = [32, 64, 128, 256, 512, 1024, 2048, 4096] >>> label = ["32", "64", "128", "256", "512", "1024", "2048", "4096"] >>> for key, num in enumerate(bin_num): ... eeg_gbm = H2OGradientBoostingEstimator(nbins_top_level=num, seed=1234) ... eeg_gbm.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) ... print(label[key], 'training score', eeg_gbm.auc(train=True)) ... print(label[key], 'validation score', eeg_gbm.auc(valid=True)) 
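The halving of histogram bins per level can be sketched as follows. The sketch assumes that `nbins` acts as a floor, in line with its "at least this many bins" description, and that the root level is depth 0; the helper name is hypothetical.

```python
def bins_at_depth(nbins, nbins_top_level, depth):
    """Return the assumed number of histogram bins for numerical columns at a depth."""
    # Halve the top-level bin count once per level, but never go below nbins.
    return max(nbins, nbins_top_level >> depth)


# Using the documented defaults: nbins=20, nbins_top_level=1024.
for depth in range(8):
    print(depth, bins_at_depth(20, 1024, depth))
```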
 - 
property nfolds¶
- Number of folds for K-fold cross-validation (0 to disable or >= 2). - Type: - int, defaults to- 0.- Examples
 - >>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv") >>> cars["economy_20mpg"] = cars["economy_20mpg"].asfactor() >>> predictors = ["displacement","power","weight","acceleration","year"] >>> response = "economy_20mpg" >>> folds = 5 >>> cars_gbm = H2OGradientBoostingEstimator(nfolds=folds, ... seed=1234) >>> cars_gbm.train(x=predictors, ... y=response, ... training_frame=cars) >>> cars_gbm.auc() 
 - 
property ntrees¶
- Number of trees. - Type: - int, defaults to- 50.- Examples
 - >>> titanic = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/gbm_test/titanic.csv") >>> titanic['survived'] = titanic['survived'].asfactor() >>> predictors = titanic.columns >>> del predictors[1:3] >>> response = 'survived' >>> train, valid = titanic.split_frame(ratios=[.8], seed=1234) >>> tree_num = [20, 50, 80, 110, 140, 170, 200] >>> label = ["20", "50", "80", "110", "140", "170", "200"] >>> for key, num in enumerate(tree_num): ... titanic_gbm = H2OGradientBoostingEstimator(ntrees=num, ... seed=1234) ... titanic_gbm.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) ... print(label[key], 'training score', titanic_gbm.auc(train=True)) ... print(label[key], 'validation score', titanic_gbm.auc(valid=True)) 
 - 
property offset_column¶
- Offset column. This will be added to the combination of columns before applying the link function. - Type: - str.- Examples
 - >>> boston = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/gbm_test/BostonHousing.csv") >>> predictors = boston.columns[:-1] >>> response = "medv" >>> boston['chas'] = boston['chas'].asfactor() >>> boston["offset"] = boston["medv"].log() >>> train, valid = boston.split_frame(ratios=[.8], seed=1234) >>> boston_gbm = H2OGradientBoostingEstimator(offset_column="offset", ... seed=1234) >>> boston_gbm.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> boston_gbm.mse(valid=True) 
 - 
property pred_noise_bandwidth¶
- Bandwidth (sigma) of Gaussian multiplicative noise ~N(1,sigma) for tree node predictions - Type: - float, defaults to- 0.0.- Examples
 - >>> titanic = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/gbm_test/titanic.csv") >>> titanic['survived'] = titanic['survived'].asfactor() >>> predictors = titanic.columns >>> del predictors[1:3] >>> response = 'survived' >>> train, valid = titanic.split_frame(ratios=[.8], seed=1234) >>> titanic_gbm = H2OGradientBoostingEstimator(pred_noise_bandwidth=0.1, ... seed=1234) >>> titanic_gbm.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> titanic_gbm.auc(valid = True) 
 - 
property quantile_alpha¶
- Desired quantile for Quantile regression, must be between 0 and 1. - Type: - float, defaults to- 0.5.- Examples
 - >>> boston = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/gbm_test/BostonHousing.csv") >>> predictors = boston.columns[:-1] >>> response = "medv" >>> boston['chas'] = boston['chas'].asfactor() >>> train, valid = boston.split_frame(ratios=[.8], seed=1234) >>> boston_gbm = H2OGradientBoostingEstimator(distribution="quantile", ... quantile_alpha=.8, ... seed=1234) >>> boston_gbm.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> boston_gbm.mse(valid=True) 
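The effect of `quantile_alpha` can be seen in the standard quantile (pinball) loss, sketched here in plain Python. The helper is illustrative and is not H2O's internal code.

```python
def pinball_loss(actual, predicted, quantile_alpha):
    """Standard quantile loss: under-prediction costs alpha, over-prediction 1 - alpha."""
    diff = actual - predicted
    if diff >= 0:
        return quantile_alpha * diff
    return (quantile_alpha - 1.0) * diff


# With quantile_alpha=0.8, predicting too low is penalised four times
# as heavily as predicting too high by the same amount.
print(pinball_loss(actual=10.0, predicted=8.0, quantile_alpha=0.8))
print(pinball_loss(actual=8.0, predicted=10.0, quantile_alpha=0.8))
```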
 - 
property r2_stopping¶
- r2_stopping is no longer supported and will be ignored if set; please use stopping_rounds, stopping_metric and stopping_tolerance instead. Previous versions of H2O would stop making trees when the R^2 metric equaled or exceeded this value. - Type: - float, defaults to- ∞.
 - 
property response_column¶
- Response variable column. - Type: - str.
 - 
property sample_rate¶
- Row sample rate per tree (from 0.0 to 1.0) - Type: - float, defaults to- 1.0.- Examples
 - >>> airlines= h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/airlines/allyears2k_headers.zip") >>> airlines["Month"] = airlines["Month"].asfactor() >>> airlines["Year"]= airlines["Year"].asfactor() >>> airlines["DayOfWeek"] = airlines["DayOfWeek"].asfactor() >>> airlines["Cancelled"] = airlines["Cancelled"].asfactor() >>> airlines['FlightNum'] = airlines['FlightNum'].asfactor() >>> predictors = ["Origin", "Dest", "Year", "UniqueCarrier", ... "DayOfWeek", "Month", "Distance", "FlightNum"] >>> response = "IsDepDelayed" >>> train, valid = airlines.split_frame(ratios=[.8], seed=1234) >>> airlines_gbm = H2OGradientBoostingEstimator(sample_rate=.7, ... seed=1234) >>> airlines_gbm.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> airlines_gbm.auc(valid=True) 
 - 
property sample_rate_per_class¶
- A list of row sample rates per class (relative fraction for each class, from 0.0 to 1.0), for each tree - Type: - List[float].- Examples
 - >>> covtype = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/covtype/covtype.20k.data") >>> covtype[54] = covtype[54].asfactor() >>> predictors = covtype.columns[0:54] >>> response = 'C55' >>> train, valid = covtype.split_frame(ratios=[.8], seed=1234) >>> rate_per_class_list = [1, .4, 1, 1, 1, 1, 1] >>> cov_gbm = H2OGradientBoostingEstimator(sample_rate_per_class=rate_per_class_list, ... seed=1234) >>> cov_gbm.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> cov_gbm.logloss(valid=True) 
 - 
property score_each_iteration¶
- Whether to score during each iteration of model training. - Type: - bool, defaults to- False.- Examples
 - >>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv") >>> cars["economy_20mpg"] = cars["economy_20mpg"].asfactor() >>> predictors = ["displacement","power","weight","acceleration","year"] >>> response = "economy_20mpg" >>> train, valid = cars.split_frame(ratios=[.8], ... seed=1234) >>> cars_gbm = H2OGradientBoostingEstimator(score_each_iteration=True, ... ntrees=55, ... seed=1234) >>> cars_gbm.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> cars_gbm.scoring_history() 
 - 
property score_tree_interval¶
- Score the model after every so many trees. Disabled if set to 0. - Type: - int, defaults to- 0.- Examples
 - >>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv") >>> cars["economy_20mpg"] = cars["economy_20mpg"].asfactor() >>> predictors = ["displacement","power","weight","acceleration","year"] >>> response = "economy_20mpg" >>> train, valid = cars.split_frame(ratios=[.8], ... seed=1234) >>> cars_gbm = H2OGradientBoostingEstimator(score_tree_interval=5, ... ntrees=55, ... seed=1234) >>> cars_gbm.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> cars_gbm.scoring_history() 
 - 
property seed¶
- Seed for pseudo random number generator (if applicable) - Type: - int, defaults to- -1.- Examples
 - >>> airlines= h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/airlines/allyears2k_headers.zip") >>> airlines["Year"] = airlines["Year"].asfactor() >>> airlines["Month"] = airlines["Month"].asfactor() >>> airlines["DayOfWeek"] = airlines["DayOfWeek"].asfactor() >>> airlines["Cancelled"] = airlines["Cancelled"].asfactor() >>> airlines['FlightNum'] = airlines['FlightNum'].asfactor() >>> predictors = ["Origin", "Dest", "Year", "UniqueCarrier", ... "DayOfWeek", "Month", "Distance", "FlightNum"] >>> response = "IsDepDelayed" >>> train, valid = airlines.split_frame(ratios=[.8], seed=1234) >>> gbm_w_seed_1 = H2OGradientBoostingEstimator(col_sample_rate=.7, ... seed=1234) >>> gbm_w_seed_1.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> print('auc for the 1st model built with a seed:', gbm_w_seed_1.auc(valid=True)) 
 - 
property stopping_metric¶
- Metric to use for early stopping (AUTO: logloss for classification, deviance for regression and anomaly_score for Isolation Forest). Note that custom and custom_increasing can only be used in GBM and DRF with the Python client. - Type: - Literal["auto", "deviance", "logloss", "mse", "rmse", "mae", "rmsle", "auc", "aucpr", "lift_top_group", "misclassification", "mean_per_class_error", "custom", "custom_increasing"], defaults to- "auto".- Examples
 - >>> airlines= h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/airlines/allyears2k_headers.zip") >>> airlines["Year"] = airlines["Year"].asfactor() >>> airlines["Month"] = airlines["Month"].asfactor() >>> airlines["DayOfWeek"] = airlines["DayOfWeek"].asfactor() >>> airlines["Cancelled"] = airlines["Cancelled"].asfactor() >>> airlines['FlightNum'] = airlines['FlightNum'].asfactor() >>> predictors = ["Origin", "Dest", "Year", "UniqueCarrier", ... "DayOfWeek", "Month", "Distance", "FlightNum"] >>> response = "IsDepDelayed" >>> train, valid = airlines.split_frame(ratios=[.8], seed=1234) >>> airlines_gbm = H2OGradientBoostingEstimator(stopping_metric="auc", ... stopping_rounds=3, ... stopping_tolerance=1e-2, ... seed=1234) >>> airlines_gbm.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> airlines_gbm.auc(valid=True) 
 - 
property stopping_rounds¶
- Early stopping based on convergence of stopping_metric. Stop if simple moving average of length k of the stopping_metric does not improve for k:=stopping_rounds scoring events (0 to disable) - Type: - int, defaults to- 0.- Examples
 - >>> airlines= h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/airlines/allyears2k_headers.zip") >>> airlines["Year"] = airlines["Year"].asfactor() >>> airlines["Month"] = airlines["Month"].asfactor() >>> airlines["DayOfWeek"] = airlines["DayOfWeek"].asfactor() >>> airlines["Cancelled"] = airlines["Cancelled"].asfactor() >>> airlines['FlightNum'] = airlines['FlightNum'].asfactor() >>> predictors = ["Origin", "Dest", "Year", "UniqueCarrier", ... "DayOfWeek", "Month", "Distance", "FlightNum"] >>> response = "IsDepDelayed" >>> train, valid = airlines.split_frame(ratios=[.8], seed=1234) >>> airlines_gbm = H2OGradientBoostingEstimator(stopping_metric="auc", ... stopping_rounds=3, ... stopping_tolerance=1e-2, ... seed=1234) >>> airlines_gbm.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> airlines_gbm.auc(valid=True) 
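The moving-average rule described for `stopping_rounds` can be approximated in a few lines. This is a simplified reading of the parameter description, not H2O's actual implementation: it assumes positive metric values, compares the best of the most recent k moving averages against the moving average just before them, and applies `stopping_tolerance` as a relative threshold. The function name `should_stop` is hypothetical.

```python
def should_stop(scores, stopping_rounds, stopping_tolerance, more_is_better):
    """Return True when the moving average of the metric has stopped improving."""
    k = stopping_rounds
    if k == 0 or len(scores) < 2 * k:
        return False  # Early stopping disabled, or not enough scoring events yet.
    recent = scores[-2 * k:]
    # k + 1 simple moving averages of length k over the last 2k scoring events.
    averages = [sum(recent[i:i + k]) / k for i in range(k + 1)]
    reference, later = averages[0], averages[1:]
    if more_is_better:
        return max(later) <= reference * (1 + stopping_tolerance)
    return min(later) >= reference * (1 - stopping_tolerance)


flat_auc = [0.70, 0.75, 0.78, 0.780, 0.781, 0.781, 0.781, 0.781]
rising_auc = [0.60, 0.65, 0.70, 0.75, 0.80, 0.85]
print(should_stop(flat_auc, 3, 1e-2, more_is_better=True))
print(should_stop(rising_auc, 3, 1e-2, more_is_better=True))
```

Averaging over k scoring events smooths out noise in individual scores, so a single poor scoring event is less likely to end training prematurely.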
 - 
property stopping_tolerance¶
- Relative tolerance for metric-based stopping criterion (stop if relative improvement is not at least this much) - Type: - float, defaults to- 0.001.- Examples
 - >>> airlines= h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/airlines/allyears2k_headers.zip") >>> airlines["Year"] = airlines["Year"].asfactor() >>> airlines["Month"] = airlines["Month"].asfactor() >>> airlines["DayOfWeek"] = airlines["DayOfWeek"].asfactor() >>> airlines["Cancelled"] = airlines["Cancelled"].asfactor() >>> airlines['FlightNum'] = airlines['FlightNum'].asfactor() >>> predictors = ["Origin", "Dest", "Year", "UniqueCarrier", ... "DayOfWeek", "Month", "Distance", "FlightNum"] >>> response = "IsDepDelayed" >>> train, valid= airlines.split_frame(ratios=[.8], seed=1234) >>> airlines_gbm = H2OGradientBoostingEstimator(stopping_metric="auc", ... stopping_rounds=3, ... stopping_tolerance=1e-2, ... seed=1234) >>> airlines_gbm.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> airlines_gbm.auc(valid=True) 
 - 
property training_frame¶
- Id of the training data frame. - Type: - Union[None, str, H2OFrame].- Examples
 - >>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv") >>> cars["economy_20mpg"] = cars["economy_20mpg"].asfactor() >>> predictors = ["displacement","power","weight","acceleration","year"] >>> response = "economy_20mpg" >>> train, valid = cars.split_frame(ratios=[.8], seed=1234) >>> cars_gbm = H2OGradientBoostingEstimator(seed=1234) >>> cars_gbm.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> cars_gbm.auc(valid=True) 
 - 
property tweedie_power¶
- Tweedie power for Tweedie regression, must be between 1 and 2. - Type: - float, defaults to- 1.5.- Examples
 - >>> insurance = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/glm_test/insurance.csv") >>> predictors = insurance.columns[0:4] >>> response = 'Claims' >>> insurance['Group'] = insurance['Group'].asfactor() >>> insurance['Age'] = insurance['Age'].asfactor() >>> train, valid = insurance.split_frame(ratios=[.8], seed=1234) >>> insurance_gbm = H2OGradientBoostingEstimator(distribution="tweedie", ... tweedie_power=1.2, ... seed=1234) >>> insurance_gbm.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> insurance_gbm.mse(valid=True) 
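To see what the Tweedie power controls, the sketch below evaluates the Tweedie variance function, in which the variance is proportional to the mean raised to the chosen power. The function name `tweedie_variance` and the `dispersion` argument are illustrative assumptions and are not part of the H2O API. A power of 1 gives Poisson-like variance and a power of 2 gives gamma-like variance; values in between interpolate between the two.

```python
def tweedie_variance(mu, power, dispersion=1.0):
    """Variance implied by the Tweedie variance function: dispersion * mu ** power."""
    if mu <= 0:
        raise ValueError("mu must be positive")
    return dispersion * mu ** power


# Compare how quickly the variance grows with the mean for several powers.
for power in (1.0, 1.2, 1.5, 2.0):
    print(power, [round(tweedie_variance(mu, power), 3) for mu in (1.0, 2.0, 4.0)])
```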
 - 
property validation_frame¶
- Id of the validation data frame. - Type: - Union[None, str, H2OFrame].- Examples
 - >>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv") >>> cars["economy_20mpg"] = cars["economy_20mpg"].asfactor() >>> predictors = ["displacement","power","weight","acceleration","year"] >>> response = "economy_20mpg" >>> train, valid = cars.split_frame(ratios=[.8], seed=1234) >>> cars_gbm = H2OGradientBoostingEstimator(seed=1234) >>> cars_gbm.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> cars_gbm.auc(valid=True) 
 - 
property weights_column¶
- Column with observation weights. Giving some observation a weight of zero is equivalent to excluding it from the dataset; giving an observation a relative weight of 2 is equivalent to repeating that row twice. Negative weights are not allowed. Note: Weights are per-row observation weights and do not increase the size of the data frame. This is typically the number of times a row is repeated, but non-integer values are supported as well. During training, rows with higher weights matter more, due to the larger loss function pre-factor. If you set weight = 0 for a row, the returned prediction frame at that row is zero and this is incorrect. To get an accurate prediction, remove all rows with weight == 0. - Type: - str.- Examples
 - >>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv") >>> cars["economy_20mpg"] = cars["economy_20mpg"].asfactor() >>> predictors = ["displacement","power","weight","acceleration","year"] >>> response = "economy_20mpg" >>> train, valid = cars.split_frame(ratios=[.8], seed=1234) >>> cars_gbm = H2OGradientBoostingEstimator(seed=1234) >>> cars_gbm.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid, ... weights_column="weight") >>> cars_gbm.auc(valid=True) 
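The equivalence between an integer weight and a repeated row can be checked by hand with a weighted mean squared error. The sketch below is plain Python and does not call H2O; `weighted_mse` and `expand_rows` are hypothetical helper names, and the data values are made up for illustration.

```python
def weighted_mse(actual, predicted, weights):
    """Mean squared error in which each row counts in proportion to its weight."""
    total_weight = sum(weights)
    squared_errors = (w * (a - p) ** 2 for a, p, w in zip(actual, predicted, weights))
    return sum(squared_errors) / total_weight


def expand_rows(values, weights):
    """Repeat each value as many times as its integer weight (weight 0 drops the row)."""
    return [v for v, w in zip(values, weights) for _ in range(w)]


actual = [1.0, 2.0, 4.0]
predicted = [1.5, 2.0, 3.0]
weights = [2, 0, 1]  # first row counted twice, second row excluded

weighted = weighted_mse(actual, predicted, weights)

expanded_actual = expand_rows(actual, weights)
expanded_predicted = expand_rows(predicted, weights)
unweighted = weighted_mse(expanded_actual, expanded_predicted, [1] * len(expanded_actual))

print(weighted, unweighted)  # the two values agree
```

Weights avoid physically duplicating rows, which is why the data frame does not grow, and they also allow non-integer values that row repetition cannot express.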
 
H2OGeneralizedLinearEstimator¶
- 
class h2o.estimators.glm.H2OGeneralizedLinearEstimator(model_id=None, training_frame=None, validation_frame=None, nfolds=0, checkpoint=None, export_checkpoints_dir=None, seed=-1, keep_cross_validation_models=True, keep_cross_validation_predictions=False, keep_cross_validation_fold_assignment=False, fold_assignment='auto', fold_column=None, response_column=None, ignored_columns=None, random_columns=None, ignore_const_cols=True, score_each_iteration=False, score_iteration_interval=-1, offset_column=None, weights_column=None, family='auto', rand_family=None, tweedie_variance_power=0.0, tweedie_link_power=1.0, theta=1e-10, solver='auto', alpha=None, lambda_=None, lambda_search=False, early_stopping=True, nlambdas=-1, standardize=True, missing_values_handling='mean_imputation', plug_values=None, compute_p_values=False, dispersion_parameter_method='pearson', init_dispersion_parameter=1.0, remove_collinear_columns=False, intercept=True, non_negative=False, max_iterations=-1, objective_epsilon=-1.0, beta_epsilon=0.0001, gradient_epsilon=-1.0, link='family_default', rand_link=None, startval=None, calc_like=False, HGLM=False, prior=-1.0, cold_start=False, lambda_min_ratio=-1.0, beta_constraints=None, max_active_predictors=-1, interactions=None, interaction_pairs=None, obj_reg=-1.0, stopping_rounds=0, stopping_metric='auto', stopping_tolerance=0.001, balance_classes=False, class_sampling_factors=None, max_after_balance_size=5.0, max_confusion_matrix_size=20, max_runtime_secs=0.0, custom_metric_func=None, generate_scoring_history=False, auc_type='auto', dispersion_epsilon=0.0001, tweedie_epsilon=8e-17, max_iterations_dispersion=3000, build_null_model=False, fix_dispersion_parameter=False, generate_variable_inflation_factors=False, fix_tweedie_variance_power=True, dispersion_learning_rate=0.5, influence=None, gainslift_bins=-1, linear_constraints=None, init_optimal_glm=False, separate_linear_beta=False, constraint_eta0=0.1258925, constraint_tau=10.0, constraint_alpha=0.1, 
constraint_beta=0.9, constraint_c0=10.0)[source]¶
- Bases: - h2o.estimators.estimator_base.H2OEstimator- Generalized Linear Modeling - Fits a generalized linear model, specified by a response variable, a set of predictors, and a description of the error distribution. - A subclass of - ModelBase is returned. The specific subclass depends on the machine learning task at hand (if it’s binomial classification, then an H2OBinomialModel is returned; if it’s regression, then an H2ORegressionModel is returned). The default print-out of the models is shown, but further GLM-specific information can be queried out of the object. Upon completion of the GLM, the resulting object has coefficients, normalized coefficients, residual/null deviance, aic, and a host of model metrics including MSE, AUC (for logistic regression), degrees of freedom, and confusion matrices.- 
property HGLM¶
- If set to true, will return HGLM model. Otherwise, normal GLM model will be returned. - Type: - bool, defaults to- False.
 - 
property Lambda¶
- [Deprecated] Use - lambda_ instead
 - 
static allConstraintsPassed(model)[source]¶
- Given a constrained GLM model, this returns true if all beta (if present) and linear constraints are satisfied. It returns false if even one constraint is not satisfied. To see which constraints failed, use the getConstraintsInfo function. 
 - Parameters
- model – GLM model with linear and beta (if applicable) constraints 
- Returns
- boolean True or False 
- Example
 - >>> train = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/glm_test/binomial_20_cols_10KRows.csv") >>> response = "C21" >>> predictors = list(range(0,20)) >>> loose_init_const = [] # this constraint is satisfied by default coefficient initialization >>> # add loose constraints >>> name = "C19" >>> values = 0.5 >>> types = "LessThanEqual" >>> contraint_numbers = 0 >>> loose_init_const.append([name, values, types, contraint_numbers]) >>> name = "C20" >>> values = -0.8 >>> types = "LessThanEqual" >>> contraint_numbers = 0 >>> loose_init_const.append([name, values, types, contraint_numbers]) >>> name = "constant" >>> values = -1000 >>> types = "LessThanEqual" >>> contraint_numbers = 0 >>> loose_init_const.append([name, values, types, contraint_numbers]) >>> linear_constraints2 = h2o.H2OFrame(loose_init_const) >>> linear_constraints2.set_names(["names", "values", "types", "constraint_numbers"]) >>> # GLM model with GLM coefficients with default initialization >>> h2o_glm = H2OGeneralizedLinearEstimator(family="binomial", compute_p_values=True, remove_collinear_columns=True, ... lambda_=0.0, solver="irlsm", linear_constraints=linear_constraints2, ... init_optimal_glm = False, seed=12345) >>> h2o_glm.train(x=predictors, y=response, training_frame=train) >>> print(H2OGeneralizedLinearEstimator.allConstraintsPassed(h2o_glm)) 
 - 
property alpha¶
- Distribution of regularization between the L1 (Lasso) and L2 (Ridge) penalties. A value of 1 for alpha represents Lasso regression, a value of 0 produces Ridge regression, and anything in between specifies the amount of mixing between the two. Default value of alpha is 0 when SOLVER = ‘L-BFGS’; 0.5 otherwise. - Type: - List[float].- Examples
 - >>> boston = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/gbm_test/BostonHousing.csv") >>> predictors = boston.columns[:-1] >>> response = "medv" >>> boston['chas'] = boston['chas'].asfactor() >>> train, valid = boston.split_frame(ratios=[.8]) >>> boston_glm = H2OGeneralizedLinearEstimator(alpha=.25) >>> boston_glm.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> print(boston_glm.mse(valid=True)) 
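The mixing role of alpha is easiest to see in the elastic net penalty itself. The sketch below evaluates the standard elastic net form, lambda * (alpha * L1 + (1 - alpha) / 2 * L2 squared), for a coefficient vector. The function name `elastic_net_penalty` is hypothetical, and the sketch assumes the intercept is not part of `beta`; it illustrates the formula and is not H2O's internal code.

```python
def elastic_net_penalty(beta, lambda_, alpha):
    """Elastic net penalty: lambda_ * (alpha * ||beta||_1 + (1 - alpha) / 2 * ||beta||_2^2)."""
    l1_norm = sum(abs(b) for b in beta)
    l2_squared = sum(b * b for b in beta)
    return lambda_ * (alpha * l1_norm + (1.0 - alpha) / 2.0 * l2_squared)


beta = [1.0, -2.0]
for alpha in (0.0, 0.25, 1.0):
    # alpha=0 is pure Ridge, alpha=1 is pure Lasso, 0.25 mixes the two
    print(alpha, elastic_net_penalty(beta, lambda_=0.5, alpha=alpha))
```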
 - 
property auc_type¶
- Set default multinomial AUC type. - Type: - Literal["auto", "none", "macro_ovr", "weighted_ovr", "macro_ovo", "weighted_ovo"], defaults to- "auto".
 - 
property balance_classes¶
- Balance training data class counts via over/under-sampling (for imbalanced data). - Type: - bool, defaults to- False.- Examples
 - >>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv") >>> predictors = ["displacement","power","weight","year"] >>> response = "acceleration" >>> train, valid = cars.split_frame(ratios=[.8]) >>> cars_glm = H2OGeneralizedLinearEstimator(balance_classes=True, ... seed=1234) >>> cars_glm.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> cars_glm.mse() 
 - 
property beta_constraints¶
- Beta constraints - Type: - Union[None, str, H2OFrame].- Examples
 - >>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv") >>> predictors = ["displacement","power","weight","year"] >>> response = "acceleration" >>> train, valid = cars.split_frame(ratios=[.8]) >>> n = len(predictors) >>> constraints = h2o.H2OFrame({'names':predictors, ... 'lower_bounds': [-1000]*n, ... 'upper_bounds': [1000]*n, ... 'beta_given': [1]*n, ... 'rho': [0.2]*n}) >>> cars_glm = H2OGeneralizedLinearEstimator(standardize=True, ... beta_constraints=constraints) >>> cars_glm.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> cars_glm.mse() 
 - 
property beta_epsilon¶
- Converge if beta changes less (using L-infinity norm) than beta_epsilon. ONLY applies to IRLSM solver. - Type: - float, defaults to- 0.0001.- Examples
 - >>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv") >>> predictors = ["displacement","power","weight","year"] >>> response = "acceleration" >>> train, valid = cars.split_frame(ratios=[.8]) >>> cars_glm = H2OGeneralizedLinearEstimator(beta_epsilon=1e-3) >>> cars_glm.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> cars_glm.mse() 
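The L-infinity norm used in the beta_epsilon convergence test is the largest absolute change in any single coefficient between two iterations. A minimal sketch of such a check follows; `beta_converged` is a hypothetical name, and the real solver may apply further conditions that are not shown here.

```python
def beta_converged(previous, current, beta_epsilon=1e-4):
    """True when no coefficient moved by beta_epsilon or more (L-infinity norm)."""
    max_change = max(abs(c - p) for p, c in zip(previous, current))
    return max_change < beta_epsilon


print(beta_converged([1.0, 2.0], [1.00001, 2.00002]))  # largest change is about 2e-5
print(beta_converged([1.0, 2.0], [1.0, 2.1]))          # largest change is about 0.1
```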
 - 
property build_null_model¶
- If set, will build a model with only the intercept. Defaults to false. - Type: - bool, defaults to- False.
 - 
property calc_like¶
- If true, will return the likelihood function value. - Type: - bool, defaults to- False.
 - 
property checkpoint¶
- Model checkpoint to resume training with. - Type: - Union[None, str, H2OEstimator].
 - 
property class_sampling_factors¶
- Desired over/under-sampling ratios per class (in lexicographic order). If not specified, sampling factors will be automatically computed to obtain class balance during training. Requires balance_classes. - Type: - List[float].- Examples
 - >>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv") >>> predictors = ["displacement","power","weight","year"] >>> response = "acceleration" >>> train, valid = cars.split_frame(ratios=[.8]) >>> sample_factors = [1., 0.5, 1., 1., 1., 1., 1.] >>> cars_glm = H2OGeneralizedLinearEstimator(balance_classes=True, ... class_sampling_factors=sample_factors, ... seed=1234) >>> cars_glm.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> cars_glm.mse() 
 - 
property cold_start¶
- Only applicable to multiple alpha/lambda values. If false, the model for the next set of alpha/lambda values is built starting from the values provided by the current model. If true, each GLM model is started from scratch. - Type: - bool, defaults to- False.
 - 
property compute_p_values¶
- Request p-values computation, p-values work only with IRLSM solver. - Type: - bool, defaults to- False.- Examples
 - >>> airlines= h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/airlines/allyears2k_headers.zip") >>> airlines["Year"] = airlines["Year"].asfactor() >>> airlines["Month"] = airlines["Month"].asfactor() >>> airlines["DayOfWeek"] = airlines["DayOfWeek"].asfactor() >>> airlines["Cancelled"] = airlines["Cancelled"].asfactor() >>> airlines['FlightNum'] = airlines['FlightNum'].asfactor() >>> predictors = ["Origin", "Dest", "Year", "UniqueCarrier", ... "DayOfWeek", "Month", "Distance", "FlightNum"] >>> response = "IsDepDelayed" >>> train, valid= airlines.split_frame(ratios=[.8]) >>> airlines_glm = H2OGeneralizedLinearEstimator(family='binomial', ... lambda_=0, ... remove_collinear_columns=True, ... compute_p_values=True) >>> airlines_glm.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> airlines_glm.mse() 
 - 
property constraint_alpha¶
- For constrained GLM only. It affects the setting of eta_k = eta_0/pow(c_0, alpha). - Type: - float, defaults to- 0.1.
 - 
property constraint_beta¶
- For constrained GLM only. It affects the setting of eta_k+1 = eta_k/pow(c_k, beta). - Type: - float, defaults to- 0.9.
 - 
property constraint_c0¶
- For constrained GLM only. It affects the initial setting of epsilon_k = 1/c_0. - Type: - float, defaults to- 10.0.
 - 
property constraint_eta0¶
- For constrained GLM only. It affects the setting of eta_k+1 = eta_0/pow(c_k+1, alpha). - Type: - float, defaults to- 0.1258925.
 - 
property constraint_tau¶
- For constrained GLM only. It affects the setting of c_k+1=tau*c_k. - Type: - float, defaults to- 10.0.
 - 
property custom_metric_func¶
- Reference to custom evaluation function, format: language:keyName=funcName - Type: - str.
 - 
property dispersion_epsilon¶
- If the change in the dispersion parameter estimate or in the log-likelihood value is smaller than dispersion_epsilon, the maximum-likelihood dispersion parameter estimation loop exits. - Type: - float, defaults to- 0.0001.
 - 
property dispersion_learning_rate¶
- Dispersion learning rate is only valid for tweedie family dispersion parameter estimation using ml. It must be > 0. This controls how much the dispersion parameter estimate is to be changed when the calculated loglikelihood actually decreases with the new dispersion. In this case, instead of setting new dispersion = dispersion + change, we set new dispersion = dispersion + dispersion_learning_rate * change. Defaults to 0.5. - Type: - float, defaults to- 0.5.
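The damped update described for dispersion_learning_rate can be sketched as a single step. The function `damped_dispersion_update` and the toy log-likelihood are assumptions made for illustration; the actual estimation loop in H2O is more involved.

```python
def damped_dispersion_update(dispersion, change, loglikelihood, learning_rate=0.5):
    """Take the full step unless it lowers the log-likelihood; otherwise take a damped step."""
    if learning_rate <= 0:
        raise ValueError("learning_rate must be > 0")
    candidate = dispersion + change
    if loglikelihood(candidate) >= loglikelihood(dispersion):
        return candidate
    # The full step made things worse, so move only a fraction of the way.
    return dispersion + learning_rate * change


def toy_loglikelihood(d):
    """Toy concave log-likelihood with its maximum at dispersion = 2.0."""
    return -(d - 2.0) ** 2


print(damped_dispersion_update(1.0, 0.5, toy_loglikelihood))  # full step accepted
print(damped_dispersion_update(1.0, 3.0, toy_loglikelihood))  # overshoot, damped step used
```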
 - 
property dispersion_parameter_method¶
- Method used to estimate the dispersion parameter for Tweedie, Gamma and Negative Binomial only. - Type: - Literal["deviance", "pearson", "ml"], defaults to- "pearson".
 - 
property early_stopping¶
- Stop early when there is no more relative improvement on train or validation (if provided). - Type: - bool, defaults to- True.- Examples
 - >>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv") >>> cars["economy_20mpg"] = cars["economy_20mpg"].asfactor() >>> predictors = ["displacement","power","weight","acceleration","year"] >>> response = "economy_20mpg" >>> train, valid = cars.split_frame(ratios=[.8]) >>> cars_glm = H2OGeneralizedLinearEstimator(family='binomial', ... early_stopping=True) >>> cars_glm.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> cars_glm.auc(valid=True) 
 - 
property export_checkpoints_dir¶
- Automatically export generated models to this directory. - Type: - str.- Examples
 - >>> import tempfile >>> from os import listdir >>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv") >>> predictors = ["displacement","power","weight","year"] >>> response = "acceleration" >>> train, valid = cars.split_frame(ratios=[.8]) >>> checkpoints = tempfile.mkdtemp() >>> cars_glm = H2OGeneralizedLinearEstimator(export_checkpoints_dir=checkpoints, ... seed=1234) >>> cars_glm.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> cars_glm.mse() >>> len(listdir(checkpoints)) 
 - 
property family¶
- Family. Use binomial for classification with logistic regression, others are for regression problems. - Type: - Literal["auto", "gaussian", "binomial", "fractionalbinomial", "quasibinomial", "ordinal", "multinomial", "poisson", "gamma", "tweedie", "negativebinomial"], defaults to- "auto".- Examples
 - >>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv") >>> cars["economy_20mpg"] = cars["economy_20mpg"].asfactor() >>> predictors = ["displacement","power","weight","acceleration","year"] >>> response = "economy_20mpg" >>> train, valid = cars.split_frame(ratios=[.8]) >>> cars_glm = H2OGeneralizedLinearEstimator(family='binomial') >>> cars_glm.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> cars_glm.auc(valid = True) 
 - 
property fix_dispersion_parameter¶
- Only used for Tweedie, Gamma and Negative Binomial GLM. If set, will use the dispersion parameter in init_dispersion_parameter as the standard error and use it to calculate the p-values. Defaults to false. - Type: - bool, defaults to- False.
 - 
property fix_tweedie_variance_power¶
- If true, will fix tweedie variance power value to the value set in tweedie_variance_power. - Type: - bool, defaults to- True.
 - 
property fold_assignment¶
- Cross-validation fold assignment scheme, if fold_column is not specified. The ‘Stratified’ option will stratify the folds based on the response variable, for classification problems. - Type: - Literal["auto", "random", "modulo", "stratified"], defaults to- "auto".- Examples
 - >>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv") >>> cars["economy_20mpg"] = cars["economy_20mpg"].asfactor() >>> predictors = ["displacement","power","weight","acceleration","year"] >>> response = "economy_20mpg" >>> assignment_type = "Random" >>> cars_glm = H2OGeneralizedLinearEstimator(fold_assignment=assignment_type, ... nfolds=5, ... family='binomial', ... seed=1234) >>> cars_glm.train(x=predictors, ... y=response, ... training_frame=cars) >>> cars_glm.auc(train=True) 
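The difference between the modulo and random fold assignment schemes can be sketched without H2O. Both helper functions below are hypothetical; the random version uses Python's own generator and will not reproduce the fold assignments that H2O produces for the same seed.

```python
import random


def modulo_folds(n_rows, nfolds):
    """Deterministic assignment: row i goes to fold i mod nfolds."""
    return [i % nfolds for i in range(n_rows)]


def random_folds(n_rows, nfolds, seed=1234):
    """Seeded random assignment: each row is placed in a uniformly chosen fold."""
    rng = random.Random(seed)
    return [rng.randrange(nfolds) for _ in range(n_rows)]


print(modulo_folds(7, 3))
print(random_folds(7, 3))
```

Modulo assignment gives folds of nearly equal size and does not depend on the seed, whereas random assignment only balances fold sizes on average.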
 - 
property fold_column¶
- Column with cross-validation fold index assignment per observation. - Type: - str.- Examples
 - >>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv") >>> cars["economy_20mpg"] = cars["economy_20mpg"].asfactor() >>> predictors = ["displacement","power","weight","acceleration","year"] >>> response = "economy_20mpg" >>> fold_numbers = cars.kfold_column(n_folds=5, seed=1234) >>> fold_numbers.set_names(["fold_numbers"]) >>> cars = cars.cbind(fold_numbers) >>> print(cars['fold_numbers']) >>> cars_glm = H2OGeneralizedLinearEstimator(seed=1234, ... family="binomial") >>> cars_glm.train(x=predictors, ... y=response, ... training_frame=cars, ... fold_column="fold_numbers") >>> cars_glm.auc(xval=True) 
 - 
property gainslift_bins¶
- Gains/Lift table number of bins. 0 means disabled. Default value -1 means automatic binning. - Type: - int, defaults to- -1.
 - 
property generate_scoring_history¶
- If set to true, will generate scoring history for GLM. This may significantly slow down the algo. - Type: - bool, defaults to- False.
 - 
property generate_variable_inflation_factors¶
- If true, will generate variable inflation factors for numerical predictors. Defaults to false. - Type: - bool, defaults to- False.- Examples
 - >>> training_data = h2o.import_file("http://h2o-public-test-data.s3.amazonaws.com/smalldata/glm_test/gamma_dispersion_factor_9_10kRows.csv") >>> predictors = ['abs.C1.','abs.C2.','abs.C3.','abs.C4.','abs.C5.'] >>> response = 'resp' >>> vif_glm = H2OGeneralizedLinearEstimator(family="gamma", ... lambda_=0, ... generate_variable_inflation_factors=True, ... fold_assignment="modulo", ... nfolds=3, ... keep_cross_validation_models=True) >>> vif_glm.train(x=predictors, y=response, training_frame=training_data) >>> vif_glm.get_variable_inflation_factors() 
 - 
static getAlphaBest(model)[source]¶
- Extract best alpha value found from glm model. - Parameters
- model – source lambda search model 
- Examples
 - >>> d = h2o.import_file("http://s3.amazonaws.com/h2o-public-test-data/smalldata/prostate/prostate.csv") >>> m = H2OGeneralizedLinearEstimator(family = 'binomial', ... lambda_search = True, ... solver = 'COORDINATE_DESCENT') >>> m.train(training_frame = d, ... x = [2,3,4,5,6,7,8], ... y = 1) >>> bestAlpha = H2OGeneralizedLinearEstimator.getAlphaBest(m) >>> print("Best alpha found is {0}".format(bestAlpha)) 
 - 
static getConstraintsInfo(model)[source]¶
- Given a constrained GLM model, returns the constraint descriptions, constraint values, constraint conditions, and whether each constraint is satisfied (true) or not (false). - Parameters
- model – GLM model with linear and beta (if applicable) constraints 
- Returns
- H2OTwoDimTable containing the above constraints information. 
- Example
 - >>> train = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/glm_test/binomial_20_cols_10KRows.csv") >>> response = "C21" >>> predictors = list(range(0,20)) >>> loose_init_const = [] # this constraint is satisfied by default coefficient initialization >>> # add loose constraints >>> name = "C19" >>> values = 0.5 >>> types = "LessThanEqual" >>> contraint_numbers = 0 >>> loose_init_const.append([name, values, types, contraint_numbers]) >>> name = "C20" >>> values = -0.8 >>> types = "LessThanEqual" >>> contraint_numbers = 0 >>> loose_init_const.append([name, values, types, contraint_numbers]) >>> name = "constant" >>> values = -1000 >>> types = "LessThanEqual" >>> contraint_numbers = 0 >>> loose_init_const.append([name, values, types, contraint_numbers]) >>> linear_constraints2 = h2o.H2OFrame(loose_init_const) >>> linear_constraints2.set_names(["names", "values", "types", "constraint_numbers"]) >>> # GLM model with GLM coefficients with default initialization >>> h2o_glm = H2OGeneralizedLinearEstimator(family="binomial", compute_p_values=True, remove_collinear_columns=True, ... lambda_=0.0, solver="irlsm", linear_constraints=linear_constraints2, ... init_optimal_glm = False, seed=12345) >>> h2o_glm.train(x=predictors, y=response, training_frame=train) >>> print(H2OGeneralizedLinearEstimator.getConstraintsInfo(h2o_glm)) 
 - 
static getGLMRegularizationPath(model)[source]¶
- Extract full regularization path explored during lambda search from glm model. - Parameters
- model – source lambda search model 
- Examples
 - >>> d = h2o.import_file("http://s3.amazonaws.com/h2o-public-test-data/smalldata/prostate/prostate.csv") >>> m = H2OGeneralizedLinearEstimator(family = 'binomial', ... lambda_search = True, ... solver = 'COORDINATE_DESCENT') >>> m.train(training_frame = d, ... x = [2,3,4,5,6,7,8], ... y = 1) >>> r = H2OGeneralizedLinearEstimator.getGLMRegularizationPath(m) >>> m2 = H2OGeneralizedLinearEstimator.makeGLMModel(model=m, ... coefs=r['coefficients'][10]) >>> dev1 = r['explained_deviance_train'][10] >>> p = m2.model_performance(d) >>> dev2 = 1-p.residual_deviance()/p.null_deviance() >>> print(dev1, " =?= ", dev2) 
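The quantity compared in the example above, 1 - residual_deviance / null_deviance, is the fraction of deviance explained by the model. The sketch below computes it by hand for a binomial response. The helper names are hypothetical, the probabilities are made up, and the sketch assumes both classes are present and every probability lies strictly between 0 and 1.

```python
import math


def binomial_deviance(actual, probabilities):
    """Binomial deviance: -2 times the log-likelihood of 0/1 labels under the given probabilities."""
    return -2.0 * sum(
        math.log(p) if y == 1 else math.log(1.0 - p)
        for y, p in zip(actual, probabilities)
    )


def explained_deviance(actual, probabilities):
    """1 - residual/null, where the null model predicts the base rate for every row."""
    base_rate = sum(actual) / len(actual)
    null_deviance = binomial_deviance(actual, [base_rate] * len(actual))
    residual_deviance = binomial_deviance(actual, probabilities)
    return 1.0 - residual_deviance / null_deviance


labels = [0, 1, 0, 1]
print(explained_deviance(labels, [0.5, 0.5, 0.5, 0.5]))  # no better than the null model
print(explained_deviance(labels, [0.1, 0.9, 0.2, 0.8]))  # informative predictions
```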
 - 
static getLambdaBest(model)[source]¶
- Extract best lambda value found from glm model. - Parameters
- model – source lambda search model 
- Examples
 - >>> d = h2o.import_file("http://s3.amazonaws.com/h2o-public-test-data/smalldata/prostate/prostate.csv") >>> m = H2OGeneralizedLinearEstimator(family = 'binomial', ... lambda_search = True, ... solver = 'COORDINATE_DESCENT') >>> m.train(training_frame = d, ... x = [2,3,4,5,6,7,8], ... y = 1) >>> bestLambda = H2OGeneralizedLinearEstimator.getLambdaBest(m) >>> print("Best lambda found is {0}".format(bestLambda)) 
 - 
static getLambdaMax(model)[source]¶
- Extract the maximum lambda value used during lambda search. - Parameters
- model – source lambda search model 
- Examples
 - >>> d = h2o.import_file("http://s3.amazonaws.com/h2o-public-test-data/smalldata/prostate/prostate.csv") >>> m = H2OGeneralizedLinearEstimator(family = 'binomial', ... lambda_search = True, ... solver = 'COORDINATE_DESCENT') >>> m.train(training_frame = d, ... x = [2,3,4,5,6,7,8], ... y = 1) >>> maxLambda = H2OGeneralizedLinearEstimator.getLambdaMax(m) >>> print("Maximum lambda found is {0}".format(maxLambda)) 
 - 
static getLambdaMin(model)[source]¶
- Extract the minimum lambda value calculated during lambda search from glm model. Note that due to early stop, this minimum lambda value may not be used in the actual lambda search. - Parameters
- model – source lambda search model 
- Examples
 - >>> d = h2o.import_file("http://s3.amazonaws.com/h2o-public-test-data/smalldata/prostate/prostate.csv") >>> m = H2OGeneralizedLinearEstimator(family = 'binomial', ... lambda_search = True, ... solver = 'COORDINATE_DESCENT') >>> m.train(training_frame = d, ... x = [2,3,4,5,6,7,8], ... y = 1) >>> minLambda = H2OGeneralizedLinearEstimator.getLambdaMin(m) >>> print("Minimum lambda found is {0}".format(minLambda)) 
 - 
get_regression_influence_diagnostics()[source]¶
- For a GLM model, if influence is set to dfbetas, returns a frame containing the original predictors, the response, and DFBETA_ for each predictor used in building the model. - Returns
- H2OFrame containing predictors used in building the model, response and DFBETA_ for each predictor. 
- Examples
 - >>> d = h2o.import_file("http://s3.amazonaws.com/h2o-public-test-data/smalldata/prostate/prostate.csv") >>> m = H2OGeneralizedLinearEstimator(family = 'binomial', ... lambda_=0.0, ... standardize=False, ... influence="dfbetas") >>> m.train(training_frame = d, ... x = [2,3,4,5,6,7,8], ... y = 1) >>> ridFrame = m.get_regression_influence_diagnostics() >>> print("column names of regression influence diagnostics frame is {0}".format(ridFrame.names)) 
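The idea behind a DFBETA diagnostic is to refit the model with one row left out and record how much a coefficient changes. The sketch below does this for the slope of a simple least-squares line. It is only an illustration: the helper names are hypothetical, the difference is left unscaled, and the values in H2O's DFBETA_ columns may be standardized differently.

```python
def ols_slope(x, y):
    """Slope of the least-squares line through the points (x, y)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx


def slope_dfbeta(x, y):
    """For each row, the full-data slope minus the slope fitted without that row."""
    full_slope = ols_slope(x, y)
    differences = []
    for i in range(len(x)):
        x_rest = x[:i] + x[i + 1:]
        y_rest = y[:i] + y[i + 1:]
        differences.append(full_slope - ols_slope(x_rest, y_rest))
    return differences


x = [0.0, 1.0, 2.0, 3.0]
y = [0.0, 1.0, 2.0, 9.0]  # the last point is an outlier
print(slope_dfbeta(x, y))  # the outlier row has the largest absolute value
```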
 - 
property gradient_epsilon¶
- Converge if objective changes less (using L-infinity norm) than this, ONLY applies to L-BFGS solver. Default (of -1.0) indicates: If lambda_search is set to False and lambda is equal to zero, the default value of gradient_epsilon is equal to .000001, otherwise the default value is .0001. If lambda_search is set to True, the conditional values above are 1E-8 and 1E-6 respectively. - Type: - float, defaults to- -1.0.- Examples
 - >>> boston = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/gbm_test/BostonHousing.csv") >>> predictors = boston.columns[:-1] >>> response = "medv" >>> boston['chas'] = boston['chas'].asfactor() >>> train, valid = boston.split_frame(ratios=[.8]) >>> boston_glm = H2OGeneralizedLinearEstimator(gradient_epsilon=1e-3) >>> boston_glm.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> boston_glm.mse() 
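The rule for resolving the default gradient_epsilon of -1.0 can be written as a small decision function. This sketch only restates the description above; `resolve_gradient_epsilon` is a hypothetical helper, lambda is treated as a single number, and H2O's own logic may handle more cases, such as a list of lambda values.

```python
def resolve_gradient_epsilon(gradient_epsilon=-1.0, lambda_search=False, lambda_=0.0):
    """Return the effective gradient_epsilon according to the documented defaults."""
    if gradient_epsilon != -1.0:
        return gradient_epsilon  # an explicit user value is used as given
    if lambda_search:
        return 1e-8 if lambda_ == 0 else 1e-6
    return 1e-6 if lambda_ == 0 else 1e-4


print(resolve_gradient_epsilon())                                 # no search, lambda = 0
print(resolve_gradient_epsilon(lambda_=0.1))                      # no search, lambda > 0
print(resolve_gradient_epsilon(lambda_search=True))               # search, lambda = 0
print(resolve_gradient_epsilon(lambda_search=True, lambda_=0.1))  # search, lambda > 0
```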
 - 
property ignore_const_cols¶
- Ignore constant columns. - Type: - bool, defaults to- True.- Examples
 - >>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv") >>> cars["economy_20mpg"] = cars["economy_20mpg"].asfactor() >>> predictors = ["displacement","power","weight","acceleration","year"] >>> response = "economy_20mpg" >>> cars["const_1"] = 6 >>> cars["const_2"] = 7 >>> train, valid = cars.split_frame(ratios=[.8], seed=1234) >>> cars_glm = H2OGeneralizedLinearEstimator(seed=1234, ... ignore_const_cols=True, ... family="binomial") >>> cars_glm.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> cars_glm.auc(valid=True) 
 - 
property ignored_columns¶
- Names of columns to ignore for training. - Type: - List[str].
 - 
property influence¶
- If set to dfbetas, will calculate the difference in beta when a data row is included in and excluded from the dataset. - Type: - Literal["dfbetas"].
 - 
property init_dispersion_parameter¶
- Only used for Tweedie, Gamma and Negative Binomial GLM. Store the initial value of dispersion parameter. If fix_dispersion_parameter is set, this value will be used in the calculation of p-values. - Type: - float, defaults to- 1.0.
 - 
property init_optimal_glm¶
- If true, will initialize coefficients with values derived from GLM runs without linear constraints. Only available for linear constraints. - Type: - bool, defaults to- False.
 - 
property interaction_pairs¶
- A list of pairwise (first order) column interactions. - Type: - List[tuple].- Examples
 - >>> df = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/airlines/allyears2k_headers.zip") >>> XY = [df.names[i-1] for i in [1,2,3,4,6,8,9,13,17,18,19,31]] >>> interactions = [XY[i-1] for i in [5,7,9]] >>> m = H2OGeneralizedLinearEstimator(lambda_search=True, ... family="binomial", ... interactions=interactions) >>> m.train(x=XY[:len(XY)], y=XY[-1],training_frame=df) >>> m._model_json['output']['coefficients_table'] >>> coef_m = m._model_json['output']['coefficients_table'] >>> interaction_pairs = [("CRSDepTime", "UniqueCarrier"), ... ("CRSDepTime", "Origin"), ... ("UniqueCarrier", "Origin")] >>> mexp = H2OGeneralizedLinearEstimator(lambda_search=True, ... family="binomial", ... interaction_pairs=interaction_pairs) >>> mexp.train(x=XY[:len(XY)], y=XY[-1],training_frame=df) >>> mexp._model_json['output']['coefficients_table'] 
 - 
property interactions¶
- A list of predictor column indices to interact. All pairwise combinations will be computed for the list. - Type: - List[str].- Examples
 - >>> boston = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/gbm_test/BostonHousing.csv") >>> predictors = boston.columns[:-1] >>> response = "medv" >>> boston['chas'] = boston['chas'].asfactor() >>> train, valid = boston.split_frame(ratios=[.8]) >>> interactions_list = ['crim', 'dis'] >>> boston_glm = H2OGeneralizedLinearEstimator(interactions=interactions_list) >>> boston_glm.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> boston_glm.mse() 
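Passing a list to interactions requests every pairwise combination of the listed columns. The sketch below enumerates those pairs with the standard library; `pairwise_interactions` is a hypothetical helper, and 'tax' is added to the list only to show more than one pair.

```python
from itertools import combinations


def pairwise_interactions(columns):
    """All unordered pairs of the given column names, in list order."""
    return list(combinations(columns, 2))


print(pairwise_interactions(['crim', 'dis']))
print(pairwise_interactions(['crim', 'dis', 'tax']))
```

The result is a list of tuples, which is the same shape that interaction_pairs accepts when only some of the pairs are wanted.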
 - 
property intercept¶
- Include constant term in the model - Type: - bool, defaults to- True.- Examples
 - >>> iris = h2o.import_file("http://h2o-public-test-data.s3.amazonaws.com/smalldata/iris/iris_wheader.csv") >>> iris['class'] = iris['class'].asfactor() >>> predictors = iris.columns[:-1] >>> response = 'class' >>> train, valid = iris.split_frame(ratios=[.8]) >>> iris_glm = H2OGeneralizedLinearEstimator(family='multinomial', ... intercept=True) >>> iris_glm.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> iris_glm.logloss(valid=True) 
 - 
property keep_cross_validation_fold_assignment¶
- Whether to keep the cross-validation fold assignment. - Type: - bool, defaults to- False.- Examples
 - >>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv") >>> cars["economy_20mpg"] = cars["economy_20mpg"].asfactor() >>> predictors = ["displacement","power","weight","acceleration","year"] >>> response = "economy_20mpg" >>> train, valid = cars.split_frame(ratios=[.8], seed=1234) >>> cars_glm = H2OGeneralizedLinearEstimator(keep_cross_validation_fold_assignment=True, ... nfolds=5, ... seed=1234, ... family="binomial") >>> cars_glm.train(x=predictors, ... y=response, ... training_frame=train) >>> cars_glm.cross_validation_fold_assignment() 
 - 
property keep_cross_validation_models¶
- Whether to keep the cross-validation models. - Type: - bool, defaults to- True.- Examples
 - >>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv") >>> cars["economy_20mpg"] = cars["economy_20mpg"].asfactor() >>> predictors = ["displacement","power","weight","acceleration","year"] >>> response = "economy_20mpg" >>> train, valid = cars.split_frame(ratios=[.8], seed=1234) >>> cars_glm = H2OGeneralizedLinearEstimator(keep_cross_validation_models=True, ... nfolds=5, ... seed=1234, ... family="binomial") >>> cars_glm.train(x=predictors, ... y=response, ... training_frame=train) >>> cars_glm_cv_models = cars_glm.cross_validation_models() >>> print(cars_glm.cross_validation_models()) 
 - 
property keep_cross_validation_predictions¶
- Whether to keep the predictions of the cross-validation models. - Type: - bool, defaults to- False.- Examples
 - >>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv") >>> cars["economy_20mpg"] = cars["economy_20mpg"].asfactor() >>> predictors = ["displacement","power","weight","acceleration","year"] >>> response = "economy_20mpg" >>> train, valid = cars.split_frame(ratios=[.8], seed=1234) >>> cars_glm = H2OGeneralizedLinearEstimator(keep_cross_validation_predictions=True, ... nfolds=5, ... seed=1234, ... family="binomial") >>> cars_glm.train(x=predictors, ... y=response, ... training_frame=train) >>> cars_glm.cross_validation_predictions() 
 - 
property lambda_¶
- Regularization strength - Type: - List[float].- Examples
 - >>> airlines = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/airlines/allyears2k_headers.zip") >>> airlines["Year"] = airlines["Year"].asfactor() >>> airlines["Month"] = airlines["Month"].asfactor() >>> airlines["DayOfWeek"] = airlines["DayOfWeek"].asfactor() >>> airlines["Cancelled"] = airlines["Cancelled"].asfactor() >>> airlines['FlightNum'] = airlines['FlightNum'].asfactor() >>> predictors = ["Origin", "Dest", "Year", "UniqueCarrier", ... "DayOfWeek", "Month", "Distance", "FlightNum"] >>> response = "IsDepDelayed" >>> train, valid = airlines.split_frame(ratios=[.8]) >>> airlines_glm = H2OGeneralizedLinearEstimator(family='binomial', ... lambda_=.0001) >>> airlines_glm.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> print(airlines_glm.auc(valid=True)) 
 - 
property lambda_min_ratio¶
- Minimum lambda used in lambda search, specified as a ratio of lambda_max (the smallest lambda that drives all coefficients to zero). Default indicates: if the number of observations is greater than the number of variables, then lambda_min_ratio is set to 0.0001; if the number of observations is less than the number of variables, then lambda_min_ratio is set to 0.01. - Type: - float, defaults to- -1.0.- Examples
 - >>> boston = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/gbm_test/BostonHousing.csv") >>> predictors = boston.columns[:-1] >>> response = "medv" >>> boston['chas'] = boston['chas'].asfactor() >>> train, valid = boston.split_frame(ratios=[.8]) >>> boston_glm = H2OGeneralizedLinearEstimator(lambda_min_ratio=.0001) >>> boston_glm.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> boston_glm.mse() 
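The default rule described for lambda_min_ratio can be written as a small function. This is only a sketch of the documented rule, not H2O's implementation: the name `resolve_lambda_min_ratio` is hypothetical, any negative value is assumed to mean "use the default", and the tie case (observations equal to variables) is not covered by the description, so the sketch assumes the 0.01 branch.

```python
def resolve_lambda_min_ratio(lambda_min_ratio, nobs, nvars):
    """Apply the documented default when lambda_min_ratio is left at -1.0."""
    if lambda_min_ratio >= 0:
        return lambda_min_ratio      # user supplied an explicit ratio
    if nobs > nvars:
        return 0.0001                # more observations than variables
    return 0.01                      # fewer observations (tie: assumption)

print(resolve_lambda_min_ratio(-1.0, nobs=506, nvars=13))
```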
 - 
property lambda_search¶
- Use lambda search starting at lambda max, given lambda is then interpreted as lambda min. - Type: - bool, defaults to- False.- Examples
 - >>> boston = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/gbm_test/BostonHousing.csv") >>> predictors = boston.columns[:-1] >>> response = "medv" >>> boston['chas'] = boston['chas'].asfactor() >>> train, valid = boston.split_frame(ratios=[.8]) >>> boston_glm = H2OGeneralizedLinearEstimator(lambda_search=True) >>> boston_glm.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> print(boston_glm.mse(valid=True)) 
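A lambda search walks from lambda max down to lambda min. The sketch below generates such a sequence in plain Python. The log-spaced (geometric) grid is an assumption borrowed from common regularization-path practice; the description above does not state the spacing H2O uses, and `lambda_path` is a hypothetical helper.

```python
def lambda_path(lambda_max, lambda_min, nlambdas):
    """Return nlambdas values decreasing geometrically from lambda_max to lambda_min."""
    if nlambdas < 2:
        return [lambda_max]
    # Constant ratio between consecutive values (assumed spacing).
    ratio = (lambda_min / lambda_max) ** (1.0 / (nlambdas - 1))
    return [lambda_max * ratio ** i for i in range(nlambdas)]

# With lambda search on, the user-supplied lambda plays the role of lambda_min.
path = lambda_path(lambda_max=1.0, lambda_min=0.0001, nlambdas=5)
print(path)
```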
 - 
property linear_constraints¶
- Linear constraints: used to specify linear constraints involving more than one coefficient in standard form. Only supported for the IRLSM solver. The frame contains four columns: names (strings for coefficient names or constant), values, types (strings of ‘Equal’ or ‘LessThanEqual’), constraint_numbers (0 for first linear constraint, 1 for second linear constraint, …). - Type: - Union[None, str, H2OFrame].
 - 
property link¶
- Link function. - Type: - Literal["family_default", "identity", "logit", "log", "inverse", "tweedie", "ologit"], defaults to- "family_default".- Examples
 - >>> iris = h2o.import_file("http://h2o-public-test-data.s3.amazonaws.com/smalldata/iris/iris_wheader.csv") >>> iris['class'] = iris['class'].asfactor() >>> predictors = iris.columns[:-1] >>> response = 'class' >>> train, valid = iris.split_frame(ratios=[.8]) >>> iris_glm = H2OGeneralizedLinearEstimator(family='multinomial', ... link='family_default') >>> iris_glm.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> iris_glm.logloss() 
 - 
static makeGLMModel(model, coefs, threshold=0.5)[source]¶
- Create a custom GLM model using the given coefficients. - Needs to be passed source model trained on the dataset to extract the dataset information from. - Parameters
- model – source model, used for extracting dataset information 
- coefs – dictionary containing model coefficients 
- threshold – (optional, only for binomial) decision threshold used for classification 
 
- Examples
 - >>> d = h2o.import_file("http://s3.amazonaws.com/h2o-public-test-data/smalldata/prostate/prostate.csv") >>> m = H2OGeneralizedLinearEstimator(family='binomial', ... lambda_search=True, ... solver='COORDINATE_DESCENT') >>> m.train(training_frame=d, ... x=[2,3,4,5,6,7,8], ... y=1) >>> r = H2OGeneralizedLinearEstimator.getGLMRegularizationPath(m) >>> m2 = H2OGeneralizedLinearEstimator.makeGLMModel(model=m, ... coefs=r['coefficients'][10]) >>> dev1 = r['explained_deviance_train'][10] >>> p = m2.model_performance(d) >>> dev2 = 1-p.residual_deviance()/p.null_deviance() >>> print(dev1, " =?= ", dev2) 
 - 
property max_active_predictors¶
- Maximum number of active predictors during computation. Use as a stopping criterion to prevent expensive model building with many predictors. Default indicates: If the IRLSM solver is used, the value of max_active_predictors is set to 5000 otherwise it is set to 100000000. - Type: - int, defaults to- -1.- Examples
 - >>> higgs= h2o.import_file("https://h2o-public-test-data.s3.amazonaws.com/smalldata/testng/higgs_train_5k.csv") >>> predictors = higgs.names >>> predictors.remove('response') >>> response = "response" >>> train, valid = higgs.split_frame(ratios=[.8]) >>> higgs_glm = H2OGeneralizedLinearEstimator(family='binomial', ... max_active_predictors=200) >>> higgs_glm.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> higgs_glm.auc() 
 - 
property max_after_balance_size¶
- Maximum relative size of the training data after balancing class counts (can be less than 1.0). Requires balance_classes. - Type: - float, defaults to- 5.0.- Examples
 - >>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv") >>> predictors = ["displacement","power","weight","year"] >>> response = "acceleration" >>> train, valid = cars.split_frame(ratios=[.8]) >>> max_size = .85 >>> cars_glm = H2OGeneralizedLinearEstimator(balance_classes=True, ... max_after_balance_size=max_size, ... seed=1234) >>> cars_glm.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> cars_glm.mse() 
 - 
property max_confusion_matrix_size¶
- [Deprecated] Maximum size (# classes) for confusion matrices to be printed in the Logs. - Type: - int, defaults to- 20.
 - 
property max_iterations¶
- Maximum number of iterations. Value should be >= 1. A value of 0 is only set when only the model coefficient names and model coefficient dimensions are needed. - Type: - int, defaults to- -1.- Examples
 - >>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv") >>> cars["economy_20mpg"] = cars["economy_20mpg"].asfactor() >>> predictors = ["displacement","power","weight","acceleration","year"] >>> response = "economy_20mpg" >>> train, valid = cars.split_frame(ratios=[.8]) >>> cars_glm = H2OGeneralizedLinearEstimator(family='binomial', ... max_iterations=50) >>> cars_glm.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> cars_glm.mse() 
 - 
property max_iterations_dispersion¶
- Control the maximum number of iterations in the dispersion parameter estimation loop using maximum likelihood. - Type: - int, defaults to- 3000.
 - 
property max_runtime_secs¶
- Maximum allowed runtime in seconds for model training. Use 0 to disable. - Type: - float, defaults to- 0.0.- Examples
 - >>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv") >>> cars["economy_20mpg"] = cars["economy_20mpg"].asfactor() >>> predictors = ["displacement","power","weight","acceleration","year"] >>> response = "economy_20mpg" >>> train, valid = cars.split_frame(ratios=[.8]) >>> cars_glm = H2OGeneralizedLinearEstimator(max_runtime_secs=10, ... seed=1234) >>> cars_glm.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> cars_glm.mse() 
 - 
property missing_values_handling¶
- Handling of missing values. Either MeanImputation, Skip or PlugValues. - Type: - Literal["mean_imputation", "skip", "plug_values"], defaults to- "mean_imputation".- Examples
 - >>> boston = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/gbm_test/BostonHousing.csv") >>> predictors = boston.columns[:-1] >>> response = "medv" >>> boston['chas'] = boston['chas'].asfactor() >>> boston.insert_missing_values() >>> train, valid = boston.split_frame(ratios=[.8]) >>> boston_glm = H2OGeneralizedLinearEstimator(missing_values_handling="skip") >>> boston_glm.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> boston_glm.mse() 
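The three handling modes can be compared on a tiny in-memory table. The sketch below is plain Python and does not call H2O; `handle_missing`, the list-of-dicts row layout, and the use of `None` for a missing value are illustrative assumptions. It also assumes every column has at least one observed value.

```python
def handle_missing(rows, mode="mean_imputation", plug_values=None):
    """rows: list of dicts mapping column name to a number or None (missing)."""
    if mode == "skip":
        # Drop every row that has at least one missing value.
        return [dict(r) for r in rows if all(v is not None for v in r.values())]
    if mode == "mean_imputation":
        fill = {}
        for col in rows[0]:
            observed = [r[col] for r in rows if r[col] is not None]
            fill[col] = sum(observed) / len(observed)
    elif mode == "plug_values":
        fill = plug_values           # user-supplied replacement per column
    else:
        raise ValueError("unknown mode: " + mode)
    return [{c: (fill[c] if v is None else v) for c, v in r.items()} for r in rows]

rows = [{"a": 1.0, "b": None}, {"a": 3.0, "b": 4.0}, {"a": None, "b": 6.0}]
print(handle_missing(rows, "mean_imputation"))
print(handle_missing(rows, "skip"))
```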
 - 
property nfolds¶
- Number of folds for K-fold cross-validation (0 to disable or >= 2). - Type: - int, defaults to- 0.- Examples
 - >>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv") >>> cars["economy_20mpg"] = cars["economy_20mpg"].asfactor() >>> predictors = ["displacement","power","weight","acceleration","year"] >>> response = "economy_20mpg" >>> folds = 5 >>> cars_glm = H2OGeneralizedLinearEstimator(nfolds=folds, ... seed=1234, ... family='binomial') >>> cars_glm.train(x=predictors, ... y=response, ... training_frame=cars) >>> cars_glm.auc(xval=True) 
 - 
property nlambdas¶
- Number of lambdas to be used in a search. Default indicates: If alpha is zero, with lambda search set to True, the value of nlambdas is set to 30 (fewer lambdas are needed for ridge regression) otherwise it is set to 100. - Type: - int, defaults to- -1.- Examples
 - >>> boston = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/gbm_test/BostonHousing.csv") >>> predictors = boston.columns[:-1] >>> response = "medv" >>> boston['chas'] = boston['chas'].asfactor() >>> train, valid = boston.split_frame(ratios=[.8]) >>> boston_glm = H2OGeneralizedLinearEstimator(lambda_search=True, ... nlambdas=50) >>> boston_glm.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> print(boston_glm.mse(valid=True)) 
 - 
property non_negative¶
- Restrict coefficients (not intercept) to be non-negative. - Type: - bool, defaults to- False.- Examples
 - >>> airlines = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/airlines/allyears2k_headers.zip") >>> airlines["Year"] = airlines["Year"].asfactor() >>> airlines["Month"] = airlines["Month"].asfactor() >>> airlines["DayOfWeek"] = airlines["DayOfWeek"].asfactor() >>> airlines["Cancelled"] = airlines["Cancelled"].asfactor() >>> airlines['FlightNum'] = airlines['FlightNum'].asfactor() >>> predictors = ["Origin", "Dest", "Year", "UniqueCarrier", ... "DayOfWeek", "Month", "Distance", "FlightNum"] >>> response = "IsDepDelayed" >>> train, valid= airlines.split_frame(ratios=[.8]) >>> airlines_glm = H2OGeneralizedLinearEstimator(family='binomial', ... non_negative=True) >>> airlines_glm.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> airlines_glm.auc() 
 - 
property obj_reg¶
- Likelihood divider in objective value computation, default (of -1.0) will set it to 1/nobs. - Type: - float, defaults to- -1.0.- Examples
 - >>> df = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/bigdata/laptop/glm_ordinal_logit/ordinal_multinomial_training_set.csv") >>> df["C11"] = df["C11"].asfactor() >>> ordinal_fit = H2OGeneralizedLinearEstimator(family="ordinal", ... alpha=1.0, ... lambda_=0.000000001, ... obj_reg=0.00001, ... max_iterations=1000, ... beta_epsilon=1e-8, ... objective_epsilon=1e-10) >>> ordinal_fit.train(x=list(range(0,10)), ... y="C11", ... training_frame=df) >>> ordinal_fit.mse() 
 - 
property objective_epsilon¶
- Converge if objective value changes less than this. Default (of -1.0) indicates: If lambda_search is set to True the value of objective_epsilon is set to .0001. If the lambda_search is set to False and lambda is equal to zero, the value of objective_epsilon is set to .000001, for any other value of lambda the default value of objective_epsilon is set to .0001. - Type: - float, defaults to- -1.0.- Examples
 - >>> boston = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/gbm_test/BostonHousing.csv") >>> predictors = boston.columns[:-1] >>> response = "medv" >>> boston['chas'] = boston['chas'].asfactor() >>> train, valid = boston.split_frame(ratios=[.8]) >>> boston_glm = H2OGeneralizedLinearEstimator(objective_epsilon=1e-3) >>> boston_glm.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> boston_glm.mse() 
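The three-way default for objective_epsilon is easier to check as code. This sketch only restates the rule given above; `resolve_objective_epsilon` is a hypothetical helper, and treating any negative value as "use the default" is an assumption.

```python
def resolve_objective_epsilon(objective_epsilon, lambda_search, lambda_value):
    """Apply the documented default when objective_epsilon is left at -1.0."""
    if objective_epsilon >= 0:
        return objective_epsilon     # explicit user value wins
    if lambda_search:
        return 1e-4
    # No lambda search: a tighter tolerance only when lambda is exactly zero.
    return 1e-6 if lambda_value == 0 else 1e-4

print(resolve_objective_epsilon(-1.0, lambda_search=False, lambda_value=0.0))
```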
 - 
property offset_column¶
- Offset column. This will be added to the combination of columns before applying the link function. - Type: - str.- Examples
 - >>> boston = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/gbm_test/BostonHousing.csv") >>> predictors = boston.columns[:-1] >>> response = "medv" >>> boston['chas'] = boston['chas'].asfactor() >>> boston["offset"] = boston["medv"].log() >>> train, valid = boston.split_frame(ratios=[.8], seed=1234) >>> boston_glm = H2OGeneralizedLinearEstimator(offset_column="offset", ... seed=1234) >>> boston_glm.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> boston_glm.mse(valid=True) 
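To see where an offset enters a prediction, the sketch below computes a single GLM prediction by hand: the offset is added to the linear combination of the columns, and the inverse link maps the result back to the response scale. The function name and the log-link default are illustrative assumptions; nothing here calls H2O.

```python
import math

def predict_with_offset(features, coefficients, intercept, offset,
                        inverse_link=math.exp):
    """Linear predictor plus offset, mapped through the inverse link."""
    eta = intercept + sum(b * x for b, x in zip(coefficients, features)) + offset
    return inverse_link(eta)

# With a log link, an offset of log(2) doubles the predicted mean.
base = predict_with_offset([1.0, 2.0], [0.5, 0.25], 0.0, 0.0)
doubled = predict_with_offset([1.0, 2.0], [0.5, 0.25], 0.0, math.log(2.0))
print(base, doubled)
```

A typical use of this pattern is a count model where the offset is the log of an exposure, so that predictions scale with exposure.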
 - 
property plug_values¶
- Plug Values (a single row frame containing values that will be used to impute missing values of the training/validation frame, use in conjunction with missing_values_handling = PlugValues). - Type: - Union[None, str, H2OFrame].- Examples
 - >>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv") >>> cars = cars.drop(0) >>> means = cars.mean() >>> means = H2OFrame._expr(ExprNode("mean", cars, True, 0)) >>> glm_means = H2OGeneralizedLinearEstimator(seed=42) >>> glm_means.train(training_frame=cars, y="cylinders") >>> glm_plugs1 = H2OGeneralizedLinearEstimator(seed=42, ... missing_values_handling="PlugValues", ... plug_values=means) >>> glm_plugs1.train(training_frame=cars, y="cylinders") >>> glm_means.coef() == glm_plugs1.coef() >>> not_means = 0.1 + (means * 0.5) >>> glm_plugs2 = H2OGeneralizedLinearEstimator(seed=42, ... missing_values_handling="PlugValues", ... plug_values=not_means) >>> glm_plugs2.train(training_frame=cars, y="cylinders") >>> glm_means.coef() != glm_plugs2.coef() 
 - 
property prior¶
- Prior probability for y==1. To be used only for logistic regression iff the data has been sampled and the mean of response does not reflect reality. - Type: - float, defaults to- -1.0.- Examples
 - >>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv") >>> cars["economy_20mpg"] = cars["economy_20mpg"].asfactor() >>> predictors = ["displacement","power","weight","acceleration","year"] >>> response = "economy_20mpg" >>> train, valid = cars.split_frame(ratios=[.8]) >>> cars_glm1 = H2OGeneralizedLinearEstimator(family='binomial', prior=0.5) >>> cars_glm1.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> cars_glm1.mse() 
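When the training sample over- or under-represents y==1, a standard remedy for logistic regression is to shift the fitted intercept by the difference between the log-odds of the true prior and the log-odds of the sample mean. The sketch below shows that textbook correction; whether H2O applies exactly this formula is an assumption, and `prior_corrected_intercept` is a hypothetical helper.

```python
import math

def prior_corrected_intercept(intercept, sample_mean, prior):
    """Shift a logistic intercept from the sampled class rate to the true prior."""
    def logit(p):
        return math.log(p / (1.0 - p))
    return intercept + logit(prior) - logit(sample_mean)

# The sample is balanced (mean 0.5) but the true rate of y==1 is 10%.
print(prior_corrected_intercept(0.2, sample_mean=0.5, prior=0.1))
```

If the prior equals the sample mean, the intercept is unchanged, which matches the guidance to use this parameter only when the sampled response mean does not reflect reality.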
 - 
property rand_family¶
- Random Component Family array. One for each random component. Only gaussian is supported for now. - Type: - List[Literal["[gaussian]"]].
 - 
property rand_link¶
- Link function array for random component in HGLM. - Type: - List[Literal["[identity]", "[family_default]"]].
 - 
property random_columns¶
- Random column indices for HGLM. - Type: - List[int].
 - 
property remove_collinear_columns¶
- In case of linearly dependent columns, remove the dependent columns. - Type: - bool, defaults to- False.- Examples
 - >>> airlines= h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/airlines/allyears2k_headers.zip") >>> airlines["Year"] = airlines["Year"].asfactor() >>> airlines["Month"] = airlines["Month"].asfactor() >>> airlines["DayOfWeek"] = airlines["DayOfWeek"].asfactor() >>> airlines["Cancelled"] = airlines["Cancelled"].asfactor() >>> airlines['FlightNum'] = airlines['FlightNum'].asfactor() >>> predictors = ["Origin", "Dest", "Year", "UniqueCarrier", ... "DayOfWeek", "Month", "Distance", "FlightNum"] >>> response = "IsDepDelayed" >>> train, valid = airlines.split_frame(ratios=[.8]) >>> airlines_glm = H2OGeneralizedLinearEstimator(family='binomial', ... lambda_=0, ... remove_collinear_columns=True) >>> airlines_glm.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> airlines_glm.auc() 
 - 
property response_column¶
- Response variable column. - Type: - str.
 - 
property score_each_iteration¶
- Whether to score during each iteration of model training. - Type: - bool, defaults to- False.- Examples
 - >>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv") >>> cars["economy_20mpg"] = cars["economy_20mpg"].asfactor() >>> predictors = ["displacement","power","weight","acceleration","year"] >>> response = "economy_20mpg" >>> train, valid = cars.split_frame(ratios=[.8], seed=1234) >>> cars_glm = H2OGeneralizedLinearEstimator(score_each_iteration=True, ... seed=1234, ... family='binomial') >>> cars_glm.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> cars_glm.scoring_history() 
 - 
property score_iteration_interval¶
- Perform scoring for every score_iteration_interval iterations. - Type: - int, defaults to- -1.
 - 
property seed¶
- Seed for pseudo random number generator (if applicable). - Type: - int, defaults to- -1.- Examples
 - >>> airlines = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/airlines/allyears2k_headers.zip") >>> airlines["Year"] = airlines["Year"].asfactor() >>> airlines["Month"] = airlines["Month"].asfactor() >>> airlines["DayOfWeek"] = airlines["DayOfWeek"].asfactor() >>> airlines["Cancelled"] = airlines["Cancelled"].asfactor() >>> airlines['FlightNum'] = airlines['FlightNum'].asfactor() >>> predictors = ["Origin", "Dest", "Year", "UniqueCarrier", ... "DayOfWeek", "Month", "Distance", "FlightNum"] >>> response = "IsDepDelayed" >>> train, valid = airlines.split_frame(ratios=[.8], seed=1234) >>> glm_w_seed = H2OGeneralizedLinearEstimator(family='binomial', ... seed=1234) >>> glm_w_seed.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> print(glm_w_seed.auc(valid=True)) 
 - 
property separate_linear_beta¶
- If true, will keep the beta constraints and linear constraints separate. After new coefficients are found, first beta constraints will be applied followed by the application of linear constraints. Note that the beta constraints in this case will not be part of the objective function. If false, will combine the beta and linear constraints. - Type: - bool, defaults to- False.
 - 
property solver¶
- AUTO will set the solver based on the given data and the other parameters. IRLSM is fast on problems with a small number of predictors and for lambda search with L1 penalty; L_BFGS scales better for datasets with many columns. - Type: - Literal["auto", "irlsm", "l_bfgs", "coordinate_descent_naive", "coordinate_descent", "gradient_descent_lh", "gradient_descent_sqerr"], defaults to- "auto".- Examples
 - >>> boston = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/gbm_test/BostonHousing.csv") >>> predictors = boston.columns[:-1] >>> response = "medv" >>> boston['chas'] = boston['chas'].asfactor() >>> train, valid = boston.split_frame(ratios=[.8]) >>> boston_glm = H2OGeneralizedLinearEstimator(solver='irlsm') >>> boston_glm.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> print(boston_glm.mse(valid=True)) 
 - 
property standardize¶
- Standardize numeric columns to have zero mean and unit variance. - Type: - bool, defaults to- True.- Examples
 - >>> boston = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/gbm_test/BostonHousing.csv") >>> predictors = boston.columns[:-1] >>> response = "medv" >>> boston['chas'] = boston['chas'].asfactor() >>> train, valid = boston.split_frame(ratios=[.8]) >>> boston_glm = H2OGeneralizedLinearEstimator(standardize=True) >>> boston_glm.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> boston_glm.mse() 
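Standardizing a numeric column means subtracting its mean and dividing by its standard deviation. The sketch below does this for one column in plain Python. Using the sample standard deviation (n - 1 denominator) and mapping a constant column to zeros are assumptions; the description above does not say which convention H2O follows.

```python
from statistics import mean, stdev

def standardize(values):
    """Rescale a numeric column to zero mean and unit (sample) variance."""
    mu = mean(values)
    sigma = stdev(values)            # sample standard deviation (assumption)
    if sigma == 0:
        return [0.0 for _ in values] # constant column: nothing to scale
    return [(v - mu) / sigma for v in values]

print(standardize([2.0, 4.0, 6.0]))
```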
 - 
property startval¶
- Double array to initialize fixed and random coefficients for HGLM, or coefficients for GLM. If standardize is true, the standardized coefficients should be used. Otherwise, use the regular coefficients. - Type: - List[float].
 - 
property stopping_metric¶
- Metric to use for early stopping (AUTO: logloss for classification, deviance for regression and anomaly_score for Isolation Forest). Note that custom and custom_increasing can only be used in GBM and DRF with the Python client. - Type: - Literal["auto", "deviance", "logloss", "mse", "rmse", "mae", "rmsle", "auc", "aucpr", "lift_top_group", "misclassification", "mean_per_class_error", "custom", "custom_increasing"], defaults to- "auto".
 - 
property stopping_rounds¶
- Early stopping based on convergence of stopping_metric. Stop if simple moving average of length k of the stopping_metric does not improve for k:=stopping_rounds scoring events (0 to disable) - Type: - int, defaults to- 0.
 - 
property stopping_tolerance¶
- Relative tolerance for metric-based stopping criterion (stop if relative improvement is not at least this much) - Type: - float, defaults to- 0.001.
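The interaction of stopping_rounds and stopping_tolerance can be sketched as follows. This is a simplified reading of the descriptions above, not H2O's exact procedure: it compares moving averages of length k over the last 2k scoring events and stops when the best recent average has not improved on the oldest one by at least the relative tolerance. The function name and the `lower_is_better` flag are hypothetical.

```python
def should_stop(history, stopping_rounds, stopping_tolerance=0.001,
                lower_is_better=True):
    """history: metric value at each scoring event, oldest first."""
    k = stopping_rounds
    if k <= 0 or len(history) < 2 * k:
        return False                 # disabled, or not enough scoring events yet
    recent = history[-2 * k:]
    # k + 1 moving averages of length k over the last 2k events.
    averages = [sum(recent[i:i + k]) / k for i in range(k + 1)]
    reference, later = averages[0], averages[1:]
    best = min(later) if lower_is_better else max(later)
    gain = (reference - best) if lower_is_better else (best - reference)
    if reference != 0:
        gain /= abs(reference)       # relative improvement
    return gain < stopping_tolerance

print(should_stop([1.0, 0.5, 0.25, 0.125], stopping_rounds=2))  # still improving
print(should_stop([0.5, 0.5, 0.5, 0.5], stopping_rounds=2))     # plateau
```

Metrics such as logloss or deviance improve downward, while AUC improves upward; the flag selects the direction.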
 - 
property theta¶
- Theta parameter for the negative binomial family. - Type: - float, defaults to- 1e-10.- Examples
 - >>> h2o_df = h2o.import_file("http://h2o-public-test-data.s3.amazonaws.com/smalldata/glm_test/Motor_insurance_sweden.txt") >>> predictors = ["Payment", "Insured", "Kilometres", "Zone", "Bonus", "Make"] >>> response = "Claims" >>> negativebinomial_fit = H2OGeneralizedLinearEstimator(family="negativebinomial", ... link="identity", ... theta=0.5) >>> negativebinomial_fit.train(x=predictors, ... y=response, ... training_frame=h2o_df) >>> negativebinomial_fit.scoring_history() 
 - 
property training_frame¶
- Id of the training data frame. - Type: - Union[None, str, H2OFrame].- Examples
 - >>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv") >>> cars["economy_20mpg"] = cars["economy_20mpg"].asfactor() >>> predictors = ["displacement","power","weight","acceleration","year"] >>> response = "economy_20mpg" >>> train, valid = cars.split_frame(ratios=[.8], ... seed=1234) >>> cars_glm = H2OGeneralizedLinearEstimator(seed=1234, ... family='binomial') >>> cars_glm.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> cars_glm.auc(train=True) 
 - 
property tweedie_epsilon¶
- In estimating tweedie dispersion parameter using maximum likelihood, this is used to choose the lower and upper indices in the approximating of the infinite series summation. - Type: - float, defaults to- 8e-17.
 - 
property tweedie_link_power¶
- Tweedie link power. - Type: - float, defaults to- 1.0.- Examples
 - >>> auto = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/glm_test/auto.csv") >>> predictors = auto.names >>> predictors.remove('y') >>> response = "y" >>> train, valid = auto.split_frame(ratios=[.8]) >>> auto_glm = H2OGeneralizedLinearEstimator(family='tweedie', ... tweedie_link_power=1) >>> auto_glm.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> print(auto_glm.mse(valid=True)) 
 - 
property tweedie_variance_power¶
- Tweedie variance power - Type: - float, defaults to- 0.0.- Examples
 - >>> auto = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/glm_test/auto.csv") >>> predictors = auto.names >>> predictors.remove('y') >>> response = "y" >>> train, valid = auto.split_frame(ratios=[.8]) >>> auto_glm = H2OGeneralizedLinearEstimator(family='tweedie', ... tweedie_variance_power=1) >>> auto_glm.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> print(auto_glm.mse(valid=True)) 
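In the Tweedie family the variance power p ties the variance of the response to its mean through Var(Y) = phi * mu^p, where phi is the dispersion. The sketch below evaluates that standard relationship; the function name and the `dispersion` argument are illustrative and not part of the H2O API.

```python
def tweedie_variance(mu, variance_power, dispersion=1.0):
    """Variance implied by the Tweedie family: dispersion * mu ** p."""
    return dispersion * mu ** variance_power

# p = 0 gives a constant variance (Gaussian-like), p = 1 a Poisson-like
# variance equal to the mean, and p = 2 a gamma-like variance.
for p in (0, 1, 2):
    print(p, tweedie_variance(3.0, p))
```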
 - 
property validation_frame¶
- Id of the validation data frame. - Type: - Union[None, str, H2OFrame].- Examples
 - >>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv") >>> cars["economy_20mpg"] = cars["economy_20mpg"].asfactor() >>> predictors = ["displacement","power","weight","acceleration","year"] >>> response = "economy_20mpg" >>> train, valid = cars.split_frame(ratios=[.8], seed=1234) >>> cars_glm = H2OGeneralizedLinearEstimator(seed=1234, ... family='binomial') >>> cars_glm.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> cars_glm.auc(valid=True) 
 - 
property weights_column¶
- Column with observation weights. Giving some observation a weight of zero is equivalent to excluding it from the dataset; giving an observation a relative weight of 2 is equivalent to repeating that row twice. Negative weights are not allowed. Note: Weights are per-row observation weights and do not increase the size of the data frame. This is typically the number of times a row is repeated, but non-integer values are supported as well. During training, rows with higher weights matter more, due to the larger loss function pre-factor. If you set weight = 0 for a row, the returned prediction frame at that row is zero and this is incorrect. To get an accurate prediction, remove all rows with weight == 0. - Type: - str.- Examples
 - >>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv") >>> cars["economy_20mpg"] = cars["economy_20mpg"].asfactor() >>> predictors = ["displacement","power","acceleration","year"] >>> response = "economy_20mpg" >>> train, valid = cars.split_frame(ratios=[.8], seed=1234) >>> cars_glm = H2OGeneralizedLinearEstimator(seed=1234, ... family='binomial') >>> cars_glm.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid, ... weights_column="weight") >>> cars_glm.auc(valid=True) 
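The statements that a weight of 2 behaves like a repeated row, and a weight of 0 like a removed row, can be checked on a one-predictor least-squares fit. The sketch below is plain Python and stands in for the Gaussian case only; `weighted_least_squares` is a hypothetical helper, not an H2O function.

```python
def weighted_least_squares(xs, ys, weights):
    """Closed-form weighted fit of y = intercept + slope * x."""
    total = sum(weights)
    x_bar = sum(w * x for w, x in zip(weights, xs)) / total
    y_bar = sum(w * y for w, y in zip(weights, ys)) / total
    sxy = sum(w * (x - x_bar) * (y - y_bar) for w, x, y in zip(weights, xs, ys))
    sxx = sum(w * (x - x_bar) ** 2 for w, x in zip(weights, xs))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

# Weight 2 on the middle row ...
weighted = weighted_least_squares([1, 2, 3], [1, 3, 2], [1, 2, 1])
# ... matches an unweighted fit with that row written twice.
repeated = weighted_least_squares([1, 2, 2, 3], [1, 3, 3, 2], [1, 1, 1, 1])
print(weighted, repeated)
```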
 
H2OInfogram¶
- 
class h2o.estimators.infogram.H2OInfogram(model_id=None, training_frame=None, validation_frame=None, seed=-1, keep_cross_validation_models=True, keep_cross_validation_predictions=False, keep_cross_validation_fold_assignment=False, nfolds=0, fold_assignment='auto', fold_column=None, response_column=None, ignored_columns=None, ignore_const_cols=True, score_each_iteration=False, offset_column=None, weights_column=None, standardize=False, distribution='auto', plug_values=None, max_iterations=0, stopping_rounds=0, stopping_metric='auto', stopping_tolerance=0.001, balance_classes=False, class_sampling_factors=None, max_after_balance_size=5.0, max_runtime_secs=0.0, custom_metric_func=None, auc_type='auto', algorithm='auto', algorithm_params=None, protected_columns=None, total_information_threshold=-1.0, net_information_threshold=-1.0, relevance_index_threshold=-1.0, safety_index_threshold=-1.0, data_fraction=1.0, top_n_features=50)[source]¶
- Bases: - h2o.estimators.estimator_base.H2OEstimator- Information Diagram - The infogram is a graphical information-theoretic interpretability tool which allows the user to quickly spot the core, decision-making variables that uniquely and safely drive the response, in supervised classification problems. The infogram can significantly cut down the number of predictors needed to build a model by identifying only the most valuable, admissible features. When protected variables such as race or gender are present in the data, the admissibility of a variable is determined by a safety and relevancy index, and thus serves as a diagnostic tool for fairness. The safety of each feature can be quantified and variables that are unsafe will be considered inadmissible. Models built using only admissible features will naturally be more interpretable, given the reduced feature set. Admissible models are also less susceptible to overfitting and train faster, while providing similar accuracy as models built using all available features. - 
property algorithm¶
- Type of machine learning algorithm used to build the infogram. Options include ‘AUTO’ (gbm), ‘deeplearning’ (Deep Learning with default parameters), ‘drf’ (Random Forest with default parameters), ‘gbm’ (GBM with default parameters), ‘glm’ (GLM with default parameters), or ‘xgboost’ (if available, XGBoost with default parameters). - Type: - Literal["auto", "deeplearning", "drf", "gbm", "glm", "xgboost"], defaults to- "auto".
 - 
property algorithm_params¶
- Customized parameters for the machine learning algorithm specified in the algorithm parameter. - Type: - dict.
 - 
property auc_type¶
- Set default multinomial AUC type. - Type: - Literal["auto", "none", "macro_ovr", "weighted_ovr", "macro_ovo", "weighted_ovo"], defaults to- "auto".
 - 
property balance_classes¶
- Balance training data class counts via over/under-sampling (for imbalanced data). - Type: - bool, defaults to- False.
 - 
property class_sampling_factors¶
- Desired over/under-sampling ratios per class (in lexicographic order). If not specified, sampling factors will be automatically computed to obtain class balance during training. Requires balance_classes. - Type: - List[float].
 - 
property custom_metric_func¶
- Reference to custom evaluation function, format: language:keyName=funcName - Type: - str.
 - 
property data_fraction¶
- The fraction of training frame to use to build the infogram model. Defaults to 1.0, and any value greater than 0 and less than or equal to 1.0 is acceptable. - Type: - float, defaults to- 1.0.
 - 
property distribution¶
- Distribution function - Type: - Literal["auto", "bernoulli", "multinomial", "gaussian", "poisson", "gamma", "tweedie", "laplace", "quantile", "huber"], defaults to- "auto".
 - 
property fold_assignment¶
- Cross-validation fold assignment scheme, if fold_column is not specified. The ‘Stratified’ option will stratify the folds based on the response variable, for classification problems. - Type: - Literal["auto", "random", "modulo", "stratified"], defaults to- "auto".
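Two of the listed schemes are simple enough to sketch without a cluster. In the sketch below, "modulo" assigns row i to fold i mod nfolds and "random" draws a fold uniformly for each row. The helper `assign_folds` is hypothetical, the random draw will not reproduce H2O's own generator, and the "auto" and "stratified" schemes are deliberately left out.

```python
import random

def assign_folds(n_rows, nfolds, scheme="modulo", seed=None):
    """Return one fold index (0 .. nfolds - 1) per row."""
    if scheme == "modulo":
        return [i % nfolds for i in range(n_rows)]
    if scheme == "random":
        rng = random.Random(seed)    # seeded for repeatable assignments
        return [rng.randrange(nfolds) for _ in range(n_rows)]
    raise ValueError("scheme not covered by this sketch: " + scheme)

print(assign_folds(7, 3))            # [0, 1, 2, 0, 1, 2, 0]
```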
 - 
property fold_column¶
- Column with cross-validation fold index assignment per observation. - Type: - str.
 - 
get_admissible_relevance()[source]¶
- Returns
- a list of relevance (variable importance) for admissible attributes 
 
 - 
get_admissible_score_frame(valid=False, xval=False)[source]¶
- Retrieve the admissible score frame, which includes relevance and CMI information, as an H2OFrame. Uses the training dataset by default. - Parameters
- valid – return infogram info on the validation dataset if True 
- xval – return infogram info on cross-validation holdouts if True 
 
- Returns
- H2OFrame 
 
 - 
get_all_predictor_cmi()[source]¶
- Get normalized CMI of all predictors. - Returns
- two tuples, the first one is predictor names and the second one is CMI 
 - 
get_all_predictor_cmi_raw()[source]¶
- Get raw CMI of all predictors. - Returns
- two tuples, the first one is predictor names and the second one is CMI 
 - 
get_all_predictor_relevance()[source]¶
- Get relevance of all predictors. - Returns
- two tuples, the first one is predictor names and the second one is relevance 
 - 
property ignore_const_cols¶
- Ignore constant columns. - Type: - bool, defaults to- True.
 - 
property ignored_columns¶
- Names of columns to ignore for training. - Type: - List[str].
 - 
property keep_cross_validation_fold_assignment¶
- Whether to keep the cross-validation fold assignment. - Type: - bool, defaults to- False.
 - 
property keep_cross_validation_models¶
- Whether to keep the cross-validation models. - Type: - bool, defaults to- True.
 - 
property keep_cross_validation_predictions¶
- Whether to keep the predictions of the cross-validation models. - Type: - bool, defaults to- False.
 - 
property max_after_balance_size¶
- Maximum relative size of the training data after balancing class counts (can be less than 1.0). Requires balance_classes. - Type: - float, defaults to- 5.0.
 - 
property max_iterations¶
- Maximum number of iterations. - Type: - int, defaults to- 0.
 - 
property max_runtime_secs¶
- Maximum allowed runtime in seconds for model training. Use 0 to disable. - Type: - float, defaults to- 0.0.
 - 
property net_information_threshold¶
- A number between 0 and 1 representing a threshold for net information, defaulting to 0.1. For a specific feature, if the net information is higher than this threshold, and the corresponding total information is also higher than the total_information_threshold, that feature will be considered admissible. The net information is the y-axis of the Core Infogram. Default is -1 which gets set to 0.1. - Type: - float, defaults to- -1.0.
 - 
property nfolds¶
- Number of folds for K-fold cross-validation (0 to disable or >= 2). - Type: - int, defaults to- 0.
 - 
property offset_column¶
- Offset column. This will be added to the combination of columns before applying the link function. - Type: - str.
 - 
plot(train=True, valid=False, xval=False, figsize=(10, 10), title='Infogram', legend_on=False, server=False)[source]¶
- Plot the infogram. By default, it will plot the infogram calculated from the training dataset. Note that the frame rel_cmi_frame contains the following columns:
- 0: predictor names 
- 1: admissible 
- 2: admissible index 
- 3: relevance-index or total information 
- 4: safety-index or net information, normalized from 0 to 1 
- 5: safety-index or net information not normalized 
 
- Parameters
- train – True if infogram is generated from training dataset 
- valid – True if infogram is generated from validation dataset 
- xval – True if infogram is generated from cross-validation holdout dataset 
- figsize – size of infogram plot 
- title – string to denote title of the plot 
- legend_on – legend text is included if True 
- server – True will not generate plot, False will produce plot 
 
- Returns
- infogram plot if server=True or None if server=False 
 
 - 
property plug_values¶
- Plug Values (a single row frame containing values that will be used to impute missing values of the training/validation frame; use in conjunction with missing_values_handling = PlugValues). - Type: - Union[None, str, H2OFrame].
 - 
property protected_columns¶
- Columns that contain features that are sensitive and need to be protected (legally, or otherwise), if applicable. These features (e.g. race, gender, etc) should not drive the prediction of the response. - Type: - List[str].
 - 
property relevance_index_threshold¶
- A number between 0 and 1 representing a threshold for the relevance index, defaulting to 0.1. This is only used when protected_columns is set by the user. For a specific feature, if the relevance index value is higher than this threshold, and the corresponding safety index is also higher than the safety_index_threshold, that feature will be considered admissible. The relevance index is the x-axis of the Fair Infogram. Default is -1 which gets set to 0.1. - Type: - float, defaults to- -1.0.
 - 
property response_column¶
- Response variable column. - Type: - str.
 - 
property safety_index_threshold¶
- A number between 0 and 1 representing a threshold for the safety index, defaulting to 0.1. This is only used when protected_columns is set by the user. For a specific feature, if the safety index value is higher than this threshold, and the corresponding relevance index is also higher than the relevance_index_threshold, that feature will be considered admissible. The safety index is the y-axis of the Fair Infogram. Default is -1 which gets set to 0.1. - Type: - float, defaults to- -1.0.
 - 
property score_each_iteration¶
- Whether to score during each iteration of model training. - Type: - bool, defaults to- False.
 - 
property seed¶
- Seed for pseudo random number generator (if applicable). - Type: - int, defaults to- -1.
 - 
property standardize¶
- Standardize numeric columns to have zero mean and unit variance. - Type: - bool, defaults to- False.
 - 
property stopping_metric¶
- Metric to use for early stopping (AUTO: logloss for classification, deviance for regression and anomaly_score for Isolation Forest). Note that custom and custom_increasing can only be used in GBM and DRF with the Python client. - Type: - Literal["auto", "deviance", "logloss", "mse", "rmse", "mae", "rmsle", "auc", "aucpr", "lift_top_group", "misclassification", "mean_per_class_error", "custom", "custom_increasing"], defaults to- "auto".
 - 
property stopping_rounds¶
- Early stopping based on convergence of stopping_metric. Stop if simple moving average of length k of the stopping_metric does not improve for k:=stopping_rounds scoring events (0 to disable) - Type: - int, defaults to- 0.
 - 
property stopping_tolerance¶
- Relative tolerance for metric-based stopping criterion (stop if relative improvement is not at least this much) - Type: - float, defaults to- 0.001.
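The interplay of stopping_rounds and stopping_tolerance can be sketched in plain Python. This is a simplified reading of the descriptions above, not H2O's actual implementation: the function name `should_stop` is hypothetical, metric values are assumed to be positive, and the exact windows H2O compares are an assumption.

```python
def should_stop(history, stopping_rounds, stopping_tolerance, lower_is_better=True):
    """Return True if the moving average of the metric has stopped improving.

    history: metric values, one per scoring event, oldest first.
    """
    k = stopping_rounds
    n = len(history)
    # 0 disables early stopping; we also need enough events for k+1 windows.
    if k == 0 or n < 2 * k:
        return False
    # Simple moving averages of length k ending at each of the last k+1 events.
    averages = [sum(history[end - k:end]) / k for end in range(n - k, n + 1)]
    reference = averages[0]
    if lower_is_better:
        best_recent = min(averages[1:])
        return best_recent >= reference * (1 - stopping_tolerance)
    best_recent = max(averages[1:])
    return best_recent <= reference * (1 + stopping_tolerance)
```

For example, `should_stop([1.0, 0.5, 0.5, 0.5, 0.5, 0.5], 2, 0.001)` returns True because the recent moving averages no longer improve on the reference by at least the relative tolerance.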
 - 
property top_n_features¶
- An integer specifying the number of columns to evaluate in the infogram. The columns are ranked by variable importance, and the top N are evaluated. Defaults to 50. - Type: - int, defaults to- 50.
 - 
property total_information_threshold¶
- A number between 0 and 1 representing a threshold for total information, defaulting to 0.1. For a specific feature, if the total information is higher than this threshold, and the corresponding net information is also higher than the threshold net_information_threshold, that feature will be considered admissible. The total information is the x-axis of the Core Infogram. Default is -1 which gets set to 0.1. - Type: - float, defaults to- -1.0.
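The admissibility rule for the Core Infogram described above can be sketched as follows. The names `resolve_threshold` and `is_admissible` are hypothetical helpers, and the strict "higher than" comparison follows the wording of the description rather than a confirmed implementation detail.

```python
DEFAULT_THRESHOLD = 0.1  # value that the -1 sentinel is replaced with

def resolve_threshold(value):
    """Map the -1 sentinel to the documented default of 0.1."""
    return DEFAULT_THRESHOLD if value == -1 else value

def is_admissible(total_information, net_information,
                  total_information_threshold=-1.0,
                  net_information_threshold=-1.0):
    """A feature is admissible only if both values exceed their thresholds."""
    total_thr = resolve_threshold(total_information_threshold)
    net_thr = resolve_threshold(net_information_threshold)
    return total_information > total_thr and net_information > net_thr

print(is_admissible(0.5, 0.5))   # True
print(is_admissible(0.5, 0.05))  # False
```

The same two-threshold pattern applies to the Fair Infogram, with relevance_index_threshold and safety_index_threshold in place of the total and net information thresholds.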
 - 
train(x=None, y=None, training_frame=None, verbose=False, **kwargs)[source]¶
- Train the H2O model. - Parameters
- x – A list of column names or indices indicating the predictor columns. 
- y – An index or a column name indicating the response column. 
- training_frame (H2OFrame) – The H2OFrame having the columns indicated by x and y (as well as any additional columns specified by fold, offset, and weights). 
- offset_column – The name or index of the column in training_frame that holds the offsets. 
- fold_column – The name or index of the column in training_frame that holds the per-row fold assignments. 
- weights_column – The name or index of the column in training_frame that holds the per-row weights. 
- validation_frame – H2OFrame with validation data to be scored on while training. 
- max_runtime_secs (float) – Maximum allowed runtime in seconds for model training. Use 0 to disable. 
- verbose (bool) – Print scoring history to stdout. Defaults to False. 
 
 
 - 
train_subset_models(model_class, y, training_frame, test_frame, protected_columns=None, reference=None, favorable_class=None, feature_selection_metrics=None, metric='euclidean', **kwargs)[source]¶
- Train models using different feature subsets selected by infogram. - Parameters
- model_class – H2O Estimator class, H2OAutoML, or H2OGridSearch 
- y – response column 
- training_frame – training frame 
- test_frame – test frame 
- protected_columns – List of categorical columns that contain sensitive information such as race, gender, age etc. 
- reference – List of values corresponding to a reference for each protected column. If set to None, it will use the biggest group as the reference.
- favorable_class – Positive/favorable outcome class of the response. 
- feature_selection_metrics – column names from infogram’s admissible score frame that are used for the feature subset selection. Defaults to safety_index for fair infogram and admissible_index for the core infogram.
- metric – metric to combine information from the columns specified in feature_selection_metrics. Can be one of “euclidean”, “manhattan”, “maximum”, or a function that takes the admissible score frame and feature_selection_metrics and produces a single column. 
- kwargs – Arguments passed to the constructor of the model_class 
 
- Returns
- H2OFrame 
- Examples
 - >>> import h2o
   >>> from h2o.estimators import H2OGradientBoostingEstimator, H2OInfogram
   >>> h2o.init()
   >>> data = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/admissibleml_test/taiwan_credit_card_uci.csv")
   >>> x = ['LIMIT_BAL', 'AGE', 'PAY_0', 'PAY_2', 'PAY_3', 'PAY_4', 'PAY_5', 'PAY_6', 'BILL_AMT1', 'BILL_AMT2', 'BILL_AMT3',
   ...      'BILL_AMT4', 'BILL_AMT5', 'BILL_AMT6', 'PAY_AMT1', 'PAY_AMT2', 'PAY_AMT3', 'PAY_AMT4', 'PAY_AMT5', 'PAY_AMT6']
   >>> y = "default payment next month"
   >>> protected_columns = ['SEX', 'EDUCATION']
   >>>
   >>> # convert the response and the protected columns to categoricals
   >>> for c in [y] + protected_columns:
   ...     data[c] = data[c].asfactor()
   >>>
   >>> train, test = data.split_frame([0.8])
   >>>
   >>> reference = ["1", "2"]  # university educated single man
   >>> favorable_class = "0"  # no default next month
   >>>
   >>> ig = H2OInfogram(protected_columns=protected_columns)
   >>> ig.train(x, y, training_frame=train)
   >>>
   >>> ig.train_subset_models(H2OGradientBoostingEstimator, y, train, test, protected_columns, reference, favorable_class)
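To make the metric options concrete, the sketch below combines several per-feature values into one score using the three named distances. It works on plain Python lists for a single feature; the function name `combine` is hypothetical, and how H2O applies the metric across the admissible score frame is not specified here.

```python
import math

def combine(values, metric="euclidean"):
    """Combine the selected metric values of one feature into a single score."""
    if callable(metric):
        return metric(values)  # user-supplied combination function
    if metric == "euclidean":
        return math.sqrt(sum(v * v for v in values))
    if metric == "manhattan":
        return sum(abs(v) for v in values)
    if metric == "maximum":
        return max(abs(v) for v in values)
    raise ValueError("unknown metric: %r" % (metric,))

print(combine([3.0, 4.0], "euclidean"))  # 5.0
print(combine([3.0, 4.0], "manhattan"))  # 7.0
print(combine([3.0, 4.0], "maximum"))    # 4.0
```

With a single column in feature_selection_metrics all three metrics give the same ordering of features; they only differ when several columns are combined.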
 - 
property training_frame¶
- Id of the training data frame. - Type: - Union[None, str, H2OFrame].
 - 
property validation_frame¶
- Id of the validation data frame. - Type: - Union[None, str, H2OFrame].
 - 
property weights_column¶
- Column with observation weights. Giving some observation a weight of zero is equivalent to excluding it from the dataset; giving an observation a relative weight of 2 is equivalent to repeating that row twice. Negative weights are not allowed. Note: Weights are per-row observation weights and do not increase the size of the data frame. This is typically the number of times a row is repeated, but non-integer values are supported as well. During training, rows with higher weights matter more, due to the larger loss function pre-factor. If you set weight = 0 for a row, the returned prediction frame at that row is zero and this is incorrect. To get an accurate prediction, remove all rows with weight == 0. - Type: - str.
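The equivalence between weights and row repetition can be checked on a simple statistic. The sketch below uses a weighted mean as a stand-in for a training loss; `weighted_mean` is a hypothetical helper and not part of the H2O API.

```python
def weighted_mean(values, weights):
    """Mean of values where each row counts in proportion to its weight."""
    total_weight = sum(weights)
    return sum(v * w for v, w in zip(values, weights)) / total_weight

# A weight of 2 behaves like repeating the row twice.
print(weighted_mean([1.0, 2.0, 4.0], [1, 2, 1]))          # 2.25
print(weighted_mean([1.0, 2.0, 2.0, 4.0], [1, 1, 1, 1]))  # 2.25

# A weight of 0 behaves like dropping the row.
print(weighted_mean([1.0, 100.0, 3.0], [1, 0, 1]))        # 2.0
```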
 
- 
H2OIsotonicRegressionEstimator¶
- 
class h2o.estimators.isotonicregression.H2OIsotonicRegressionEstimator(model_id=None, training_frame=None, validation_frame=None, response_column=None, ignored_columns=None, weights_column=None, out_of_bounds='na', custom_metric_func=None, nfolds=0, keep_cross_validation_models=True, keep_cross_validation_predictions=False, keep_cross_validation_fold_assignment=False, fold_assignment='auto', fold_column=None)[source]¶
- Bases: - h2o.estimators.estimator_base.H2OEstimator- Isotonic Regression - 
property custom_metric_func¶
- Reference to custom evaluation function, format: language:keyName=funcName - Type: - str.
 - 
property fold_assignment¶
- Cross-validation fold assignment scheme, if fold_column is not specified. The ‘Stratified’ option will stratify the folds based on the response variable, for classification problems. - Type: - Literal["auto", "random", "modulo", "stratified"], defaults to- "auto".
 - 
property fold_column¶
- Column with cross-validation fold index assignment per observation. - Type: - str.
 - 
property ignored_columns¶
- Names of columns to ignore for training. - Type: - List[str].
 - 
property keep_cross_validation_fold_assignment¶
- Whether to keep the cross-validation fold assignment. - Type: - bool, defaults to- False.
 - 
property keep_cross_validation_models¶
- Whether to keep the cross-validation models. - Type: - bool, defaults to- True.
 - 
property keep_cross_validation_predictions¶
- Whether to keep the predictions of the cross-validation models. - Type: - bool, defaults to- False.
 - 
property nfolds¶
- Number of folds for K-fold cross-validation (0 to disable or >= 2). - Type: - int, defaults to- 0.
 - 
property out_of_bounds¶
- Method of handling values of X predictor that are outside of the bounds seen in training. - Type: - Literal["na", "clip"], defaults to- "na".- Examples
 - >>> import h2o
   >>> from h2o import H2OFrame
   >>> from h2o.estimators.isotonicregression import H2OIsotonicRegressionEstimator
   >>> from sklearn.datasets import make_regression
   >>> import numpy as np
   >>> h2o.init()
   >>> X, y = make_regression(n_samples=10000, n_features=1, random_state=41, noise=0.8)
   >>> X = X.reshape(-1)
   >>> train = H2OFrame(np.column_stack((y, X)), column_names=["y", "X"])
   >>> w_values = np.random.rand(train.shape[0])
   >>> w_frame = H2OFrame(w_values.reshape(-1, 1), column_names=["w"])
   >>> train = train.cbind(w_frame)
   >>> h2o_iso_reg = H2OIsotonicRegressionEstimator(out_of_bounds="clip")
   >>> h2o_iso_reg.train(training_frame=train, x="X", y="y")
   >>> h2o_iso_reg.predict(train)
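The difference between the "na" and "clip" options can be sketched with a small stand-alone predictor. This is an illustration only: the function `predict` is hypothetical, the fitted model is represented by sorted thresholds `xs` with fitted values `ys`, and linear interpolation between thresholds is an assumption about how predictions are formed.

```python
import bisect
import math

def predict(x, xs, ys, out_of_bounds="na"):
    """Predict at x from a fitted monotone curve given by points (xs, ys).

    xs must be strictly increasing; ys are the fitted values at xs.
    """
    if x < xs[0] or x > xs[-1]:
        if out_of_bounds == "na":
            return math.nan              # outside the training range: no prediction
        x = min(max(x, xs[0]), xs[-1])   # "clip": snap to the nearest bound
    i = bisect.bisect_right(xs, x)
    if i >= len(xs):                     # x equals the upper bound
        return ys[-1]
    x0, x1 = xs[i - 1], xs[i]
    y0, y1 = ys[i - 1], ys[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)

xs, ys = [0.0, 1.0, 2.0], [0.0, 1.0, 3.0]
print(predict(1.5, xs, ys))          # 2.0
print(predict(3.0, xs, ys, "clip"))  # 3.0
print(predict(3.0, xs, ys, "na"))    # nan
```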
 - 
property response_column¶
- Response variable column. - Type: - str.
 - 
property training_frame¶
- Id of the training data frame. - Type: - Union[None, str, H2OFrame].
 - 
property validation_frame¶
- Id of the validation data frame. - Type: - Union[None, str, H2OFrame].
 - 
property weights_column¶
- Column with observation weights. Giving some observation a weight of zero is equivalent to excluding it from the dataset; giving an observation a relative weight of 2 is equivalent to repeating that row twice. Negative weights are not allowed. Note: Weights are per-row observation weights and do not increase the size of the data frame. This is typically the number of times a row is repeated, but non-integer values are supported as well. During training, rows with higher weights matter more, due to the larger loss function pre-factor. If you set weight = 0 for a row, the returned prediction frame at that row is zero and this is incorrect. To get an accurate prediction, remove all rows with weight == 0. - Type: - str.
 
- 
H2OModelSelectionEstimator¶
- 
class h2o.estimators.model_selection.H2OModelSelectionEstimator(model_id=None, training_frame=None, validation_frame=None, nfolds=0, seed=-1, fold_assignment='auto', fold_column=None, response_column=None, ignored_columns=None, ignore_const_cols=True, score_each_iteration=False, score_iteration_interval=0, offset_column=None, weights_column=None, family='auto', link='family_default', tweedie_variance_power=0.0, tweedie_link_power=0.0, theta=0.0, solver='irlsm', alpha=None, lambda_=[0.0], lambda_search=False, early_stopping=False, nlambdas=0, standardize=True, missing_values_handling='mean_imputation', plug_values=None, compute_p_values=False, remove_collinear_columns=False, intercept=True, non_negative=False, max_iterations=0, objective_epsilon=-1.0, beta_epsilon=0.0001, gradient_epsilon=-1.0, startval=None, prior=0.0, cold_start=False, lambda_min_ratio=0.0, beta_constraints=None, max_active_predictors=-1, obj_reg=-1.0, stopping_rounds=0, stopping_metric='auto', stopping_tolerance=0.001, balance_classes=False, class_sampling_factors=None, max_after_balance_size=5.0, max_confusion_matrix_size=20, max_runtime_secs=0.0, nparallelism=0, max_predictor_number=1, min_predictor_number=1, mode='maxr', build_glm_model=False, p_values_threshold=0.0, influence=None, multinode_mode=False)[source]¶
- Bases: - h2o.estimators.estimator_base.H2OEstimator- Model Selection - H2O ModelSelection is used to build the best model with one predictor, two predictors, … up to max_predictor_number specified in the algorithm parameters when mode=allsubsets. The best model is the one with the highest R2 value. When mode=maxr, the model returned is no longer guaranteed to have the best R2 value. - 
property alpha¶
- Distribution of regularization between the L1 (Lasso) and L2 (Ridge) penalties. A value of 1 for alpha represents Lasso regression, a value of 0 produces Ridge regression, and anything in between specifies the amount of mixing between the two. Default value of alpha is 0 when SOLVER = ‘L-BFGS’; 0.5 otherwise. - Type: - List[float].
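The role of alpha can be illustrated with the usual elastic net penalty, in which alpha mixes the L1 and L2 terms and lambda scales the total. The sketch below is a plain-Python illustration; `elastic_net_penalty` is a hypothetical helper, and the factor of 1/2 on the L2 term is the common convention, assumed here rather than taken from the text above.

```python
def elastic_net_penalty(beta, alpha, lambda_):
    """lambda * (alpha * ||beta||_1 + (1 - alpha) / 2 * ||beta||_2^2)."""
    l1 = sum(abs(b) for b in beta)
    l2_squared = sum(b * b for b in beta)
    return lambda_ * (alpha * l1 + 0.5 * (1 - alpha) * l2_squared)

beta = [1.0, -2.0]
print(elastic_net_penalty(beta, alpha=1.0, lambda_=0.5))  # 1.5   (pure Lasso)
print(elastic_net_penalty(beta, alpha=0.0, lambda_=0.5))  # 1.25  (pure Ridge)
print(elastic_net_penalty(beta, alpha=0.5, lambda_=0.5))  # 1.375 (mix)
```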
 - 
property balance_classes¶
- Balance training data class counts via over/under-sampling (for imbalanced data). - Type: - bool, defaults to- False.
 - 
property beta_constraints¶
- Beta constraints - Type: - Union[None, str, H2OFrame].
 - 
property beta_epsilon¶
- Converge if beta changes less (using L-infinity norm) than beta epsilon. ONLY applies to IRLSM solver. - Type: - float, defaults to- 0.0001.
 - 
property build_glm_model¶
- For maxrsweep mode only. If true, will return full blown GLM models with the desired predictor subsets. If false, only the predictor subsets and predictor coefficients are returned. This is for speeding up the model selection process. The users can choose to build the GLM models themselves by using the predictor subsets. Defaults to false. - Type: - bool, defaults to- False.- Examples
 - >>> import h2o
   >>> from h2o.estimators import H2OModelSelectionEstimator
   >>> h2o.init()
   >>> prostate = h2o.import_file("http://s3.amazonaws.com/h2o-public-test-data/smalldata/logreg/prostate.csv")
   >>> predictors = ["AGE", "RACE", "CAPSULE", "DCAPS", "PSA", "VOL", "DPROS"]
   >>> response = "GLEASON"
   >>> maxrModel = H2OModelSelectionEstimator(max_predictor_number=5,
   ...                                        seed=12345,
   ...                                        mode="maxrsweep",
   ...                                        build_glm_model=True)
   >>> maxrModel.train(x=predictors, y=response, training_frame=prostate)
   >>> result = maxrModel.result()
   >>> # get the GLM model with the best performance for a fixed predictor size:
   >>> one_model = h2o.get_model(result["model_id"][1, 0])
   >>> predict = one_model.predict(prostate)
   >>> # print a version of the predict frame:
   >>> print(predict)
 - 
property class_sampling_factors¶
- Desired over/under-sampling ratios per class (in lexicographic order). If not specified, sampling factors will be automatically computed to obtain class balance during training. Requires balance_classes. - Type: - List[float].
 - 
coef(predictor_size=None)[source]¶
- Get the coefficients for all models built with different number of predictors. - Parameters
- predictor_size – predictor subset size, will only return model coefficients of that subset size. 
- Returns
- list of Python Dicts of coefficients for all models built with different predictor numbers 
- Examples
 - >>> import h2o
   >>> from h2o.estimators import H2OModelSelectionEstimator
   >>> h2o.init()
   >>> prostate = h2o.import_file("http://s3.amazonaws.com/h2o-public-test-data/smalldata/logreg/prostate.csv")
   >>> predictors = ["AGE", "RACE", "CAPSULE", "DCAPS", "PSA", "VOL", "DPROS"]
   >>> response = "GLEASON"
   >>> maxrModel = H2OModelSelectionEstimator(max_predictor_number=5,
   ...                                        seed=12345,
   ...                                        mode="maxr")
   >>> maxrModel.train(x=predictors, y=response, training_frame=prostate)
   >>> coeff = maxrModel.coef()
   >>> print(coeff)
   >>> coeff_3 = maxrModel.coef(predictor_size=3)
   >>> print(coeff_3)
 - 
coef_norm(predictor_size=None)[source]¶
- Get the normalized coefficients for all models built with different number of predictors. - Parameters
- predictor_size – predictor subset size, will only return model coefficients of that subset size. 
- Returns
- list of Python Dicts of coefficients for all models built with different predictor numbers 
- Examples
 - >>> import h2o
   >>> from h2o.estimators import H2OModelSelectionEstimator
   >>> h2o.init()
   >>> prostate = h2o.import_file("http://s3.amazonaws.com/h2o-public-test-data/smalldata/logreg/prostate.csv")
   >>> predictors = ["AGE", "RACE", "CAPSULE", "DCAPS", "PSA", "VOL", "DPROS"]
   >>> response = "GLEASON"
   >>> maxrModel = H2OModelSelectionEstimator(max_predictor_number=5,
   ...                                        seed=12345,
   ...                                        mode="maxr")
   >>> maxrModel.train(x=predictors, y=response, training_frame=prostate)
   >>> coeff_norm = maxrModel.coef_norm()
   >>> print(coeff_norm)
   >>> # normalized coefficients of the model with 3 predictors:
   >>> coeff_norm_3 = maxrModel.coef_norm(predictor_size=3)
   >>> print(coeff_norm_3)
 - 
property cold_start¶
- Only applicable to multiple alpha/lambda values. If false, build the next model for next set of alpha/lambda values starting from the values provided by current model. If true will start GLM model from scratch. - Type: - bool, defaults to- False.
 - 
property compute_p_values¶
- Request p-values computation, p-values work only with IRLSM solver and no regularization - Type: - bool, defaults to- False.
 - 
property early_stopping¶
- Stop early when there is no more relative improvement on train or validation (if provided) - Type: - bool, defaults to- False.
 - 
property family¶
- Family. For maxr/maxrsweep, only gaussian. For backward, ordinal and multinomial families are not supported - Type: - Literal["auto", "gaussian", "binomial", "fractionalbinomial", "quasibinomial", "poisson", "gamma", "tweedie", "negativebinomial"], defaults to- "auto".
 - 
property fold_assignment¶
- Cross-validation fold assignment scheme, if fold_column is not specified. The ‘Stratified’ option will stratify the folds based on the response variable, for classification problems. - Type: - Literal["auto", "random", "modulo", "stratified"], defaults to- "auto".
 - 
property fold_column¶
- Column with cross-validation fold index assignment per observation. - Type: - str.
 - 
get_best_R2_values()[source]¶
- Get list of best R2 values of models with 1 predictor, 2 predictors, …, max_predictor_number of predictors - Returns
- a list of best r2 values 
 
 - 
get_best_model_predictors()[source]¶
- Get list of best models with 1 predictor, 2 predictors, …, max_predictor_number of predictors that have the highest r2 values - Returns
- a list of best predictors subset 
 
 - 
get_predictors_added_per_step()[source]¶
- Get list of predictors added at each step of the model building process - Returns
- a list of predictors added at each step 
 
 - 
get_predictors_removed_per_step()[source]¶
- Get list of predictors removed at each step of the model building process - Returns
- a list of predictors removed at each step 
 
 - 
get_regression_influence_diagnostics(predictor_size=None)[source]¶
- Get the regression influence diagnostics frames for all models with different number of predictors. If a predictor size is specified, only one frame is returned for that predictor size. - Parameters
- predictor_size – predictor subset size, will return regression influence diagnostics frame of that size 
- Returns
- list of H2OFrames or just one frame that contains predictors, response and DFBETA_ predictors 
 
 - 
property gradient_epsilon¶
- Converge if objective changes less (using L-infinity norm) than this, ONLY applies to L-BFGS solver. Default (of -1.0) indicates: If lambda_search is set to False and lambda is equal to zero, the default value of gradient_epsilon is equal to .000001, otherwise the default value is .0001. If lambda_search is set to True, the conditional values above are 1E-8 and 1E-6 respectively. - Type: - float, defaults to- -1.0.
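The four default cases described above can be written as a small lookup. This sketch only restates the rule in code; `default_gradient_epsilon` is a hypothetical helper, not an H2O function.

```python
def default_gradient_epsilon(lambda_search, lambda_value):
    """Resolve the -1.0 sentinel for gradient_epsilon as described above."""
    if lambda_search:
        # With lambda search the two conditional values are 1e-8 and 1e-6.
        return 1e-8 if lambda_value == 0 else 1e-6
    # Without lambda search they are 1e-6 and 1e-4.
    return 1e-6 if lambda_value == 0 else 1e-4

print(default_gradient_epsilon(False, 0.0))  # 1e-06
print(default_gradient_epsilon(True, 0.5))   # 1e-06
```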
 - 
property ignore_const_cols¶
- Ignore constant columns. - Type: - bool, defaults to- True.
 - 
property ignored_columns¶
- Names of columns to ignore for training. - Type: - List[str].
 - 
property influence¶
- If set to dfbetas will calculate the difference in beta when a datarow is included and excluded in the dataset. - Type: - Literal["dfbetas"].- Examples
 - >>> import h2o
   >>> from h2o.estimators import H2OModelSelectionEstimator
   >>> h2o.init()
   >>> prostate = h2o.import_file("http://s3.amazonaws.com/h2o-public-test-data/smalldata/logreg/prostate.csv")
   >>> predictors = ["AGE", "RACE", "CAPSULE", "DCAPS", "PSA", "VOL", "DPROS"]
   >>> response = "GLEASON"
   >>> maxrModel = H2OModelSelectionEstimator(max_predictor_number=5,
   ...                                        seed=12345,
   ...                                        mode="maxr",
   ...                                        influence="dfbetas")
   >>> maxrModel.train(x=predictors, y=response, training_frame=prostate)
   >>> glm_rid = maxrModel.get_regression_influence_diagnostics()
   >>> print(glm_rid)
 - 
property intercept¶
- Include constant term in the model - Type: - bool, defaults to- True.
 - 
property lambda_¶
- Regularization strength - Type: - List[float], defaults to- [0.0].
 - 
property lambda_min_ratio¶
- Minimum lambda used in lambda search, specified as a ratio of lambda_max (the smallest lambda that drives all coefficients to zero). Default indicates: if the number of observations is greater than the number of variables, then lambda_min_ratio is set to 0.0001; if the number of observations is less than the number of variables, then lambda_min_ratio is set to 0.01. - Type: - float, defaults to- 0.0.
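A minimal sketch of how the default ratio and the resulting minimum lambda could be derived from the description above. Both function names are hypothetical. The description does not say what happens when the number of observations equals the number of variables; the sketch treats that case like the "fewer observations" branch, which is an assumption.

```python
def default_lambda_min_ratio(n_obs, n_vars):
    """Default ratio: 0.0001 when n_obs > n_vars, otherwise 0.01 (tie is assumed)."""
    return 1e-4 if n_obs > n_vars else 1e-2

def lambda_min(lambda_max, n_obs, n_vars, lambda_min_ratio=0.0):
    """Smallest lambda visited by the search, as a fraction of lambda_max."""
    ratio = lambda_min_ratio if lambda_min_ratio > 0 else default_lambda_min_ratio(n_obs, n_vars)
    return ratio * lambda_max

print(default_lambda_min_ratio(1000, 20))  # 0.0001
print(default_lambda_min_ratio(20, 1000))  # 0.01
```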
 - 
property lambda_search¶
- Use lambda search starting at lambda max; the given lambda is then interpreted as lambda min. - Type: - bool, defaults to- False.
 - 
property link¶
- Link function. - Type: - Literal["family_default", "identity", "logit", "log", "inverse", "tweedie", "ologit"], defaults to- "family_default".
 - 
property max_active_predictors¶
- Maximum number of active predictors during computation. Use as a stopping criterion to prevent expensive model building with many predictors. Default indicates: If the IRLSM solver is used, the value of max_active_predictors is set to 5000 otherwise it is set to 100000000. - Type: - int, defaults to- -1.
 - 
property max_after_balance_size¶
- Maximum relative size of the training data after balancing class counts (can be less than 1.0). Requires balance_classes. - Type: - float, defaults to- 5.0.
 - 
property max_confusion_matrix_size¶
- [Deprecated] Maximum size (# classes) for confusion matrices to be printed in the Logs - Type: - int, defaults to- 20.
 - 
property max_iterations¶
- Maximum number of iterations - Type: - int, defaults to- 0.
 - 
property max_predictor_number¶
- Maximum number of predictors to be considered when building GLM models. Defaults to 1. - Type: - int, defaults to- 1.
 - 
property max_runtime_secs¶
- Maximum allowed runtime in seconds for model training. Use 0 to disable. - Type: - float, defaults to- 0.0.
 - 
property min_predictor_number¶
- For mode = ‘backward’ only. Minimum number of predictors to be considered when building GLM models starting with all predictors to be included. Defaults to 1. - Type: - int, defaults to- 1.
 - 
property missing_values_handling¶
- Handling of missing values. Either MeanImputation, Skip or PlugValues. - Type: - Literal["mean_imputation", "skip", "plug_values"], defaults to- "mean_imputation".
 - 
property mode¶
- Mode: selects the model selection algorithm to use. Options are ‘allsubsets’ for all subsets; ‘maxr’, which uses sequential replacement and GLM to build all models (slow, but works with cross-validation and validation frames for more robust results); ‘maxrsweep’, which uses sequential replacement and the sweeping action and is much faster than ‘maxr’; and ‘backward’ for backward selection. - Type: - Literal["allsubsets", "maxr", "maxrsweep", "backward"], defaults to- "maxr".- Examples
 - >>> import h2o
   >>> from h2o.estimators import H2OModelSelectionEstimator
   >>> h2o.init()
   >>> prostate = h2o.import_file("http://s3.amazonaws.com/h2o-public-test-data/smalldata/logreg/prostate.csv")
   >>> predictors = ["AGE", "RACE", "CAPSULE", "DCAPS", "PSA", "VOL", "DPROS"]
   >>> response = "GLEASON"
   >>> maxrModel = H2OModelSelectionEstimator(max_predictor_number=5,
   ...                                        seed=12345,
   ...                                        mode="maxr")
   >>> maxrModel.train(x=predictors, y=response, training_frame=prostate)
   >>> results = maxrModel.result()
   >>> print(results)
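To get a feel for why ‘allsubsets’ becomes expensive, the sketch below counts the candidate predictor subsets for the prostate example, assuming one candidate model per subset of each size from 1 to max_predictor_number. The helper names are hypothetical and the code does not call H2O.

```python
from itertools import combinations
from math import comb

predictors = ["AGE", "RACE", "CAPSULE", "DCAPS", "PSA", "VOL", "DPROS"]

def candidate_subsets(names, size):
    """All predictor subsets of a given size."""
    return list(combinations(names, size))

def n_candidate_models(n_predictors, max_predictor_number):
    """Number of subsets with 1..max_predictor_number predictors."""
    return sum(comb(n_predictors, k) for k in range(1, max_predictor_number + 1))

print(len(candidate_subsets(predictors, 2)))   # 21
print(n_candidate_models(len(predictors), 5))  # 119
```

The count grows combinatorially with the number of predictors, which is the motivation for the sequential replacement modes.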
 - 
property multinode_mode¶
- For maxrsweep only. If enabled, will attempt to perform sweeping action using multiple nodes in the cluster. Defaults to false. - Type: - bool, defaults to- False.
 - 
property nfolds¶
- Number of folds for K-fold cross-validation (0 to disable or >= 2). - Type: - int, defaults to- 0.
 - 
property nlambdas¶
- Number of lambdas to be used in a search. Default indicates: If alpha is zero, with lambda search set to True, the value of nlambdas is set to 30 (fewer lambdas are needed for ridge regression); otherwise it is set to 100. - Type: - int, defaults to- 0.
 - 
property non_negative¶
- Restrict coefficients (not intercept) to be non-negative - Type: - bool, defaults to- False.
 - 
property nparallelism¶
- Number of models to build in parallel. Defaults to 0, which adapts to the system capability. - Type: - int, defaults to- 0.
 - 
property obj_reg¶
- Likelihood divider in objective value computation, default (of -1.0) will set it to 1/nobs - Type: - float, defaults to- -1.0.
 - 
property objective_epsilon¶
- Converge if objective value changes less than this. Default (of -1.0) indicates: If lambda_search is set to True the value of objective_epsilon is set to .0001. If the lambda_search is set to False and lambda is equal to zero, the value of objective_epsilon is set to .000001, for any other value of lambda the default value of objective_epsilon is set to .0001. - Type: - float, defaults to- -1.0.
 - 
property offset_column¶
- Offset column. This will be added to the combination of columns before applying the link function. - Type: - str.
 - 
property p_values_threshold¶
- For mode=’backward’ only. If specified, will stop the model building process when all coefficients' p-values drop below this threshold. - Type: - float, defaults to- 0.0.- Examples
 - >>> import h2o
   >>> from h2o.estimators import H2OModelSelectionEstimator
   >>> h2o.init()
   >>> prostate = h2o.import_file("http://s3.amazonaws.com/h2o-public-test-data/smalldata/logreg/prostate.csv")
   >>> predictors = ["AGE", "RACE", "CAPSULE", "DCAPS", "PSA", "VOL", "DPROS"]
   >>> response = "GLEASON"
   >>> backwardModel = H2OModelSelectionEstimator(min_predictor_number=2,
   ...                                            seed=12345,
   ...                                            mode="backward",
   ...                                            p_values_threshold=0.001)
   >>> backwardModel.train(x=predictors, y=response, training_frame=prostate)
   >>> result = backwardModel.result()
   >>> print(result)
 - 
property plug_values¶
- Plug Values (a single row frame containing values that will be used to impute missing values of the training/validation frame; use in conjunction with missing_values_handling = PlugValues) - Type: - Union[None, str, H2OFrame].
 - 
property prior¶
- Prior probability for y==1. To be used only for logistic regression iff the data has been sampled and the mean of response does not reflect reality. - Type: - float, defaults to- 0.0.
 - 
property remove_collinear_columns¶
- In case of linearly dependent columns, remove some of the dependent columns - Type: - bool, defaults to- False.
 - 
property response_column¶
- Response variable column. - Type: - str.
 - 
result()[source]¶
- Get the result frame that contains information about the model building process, as provided for modelselection and anovaglm. - Returns
- the H2OFrame that contains information about the model building process, as provided for modelselection and anovaglm. 
 
 - 
property score_each_iteration¶
- Whether to score during each iteration of model training. - Type: - bool, defaults to- False.
 - 
property score_iteration_interval¶
- Perform scoring for every score_iteration_interval iterations - Type: - int, defaults to- 0.
 - 
property seed¶
- Seed for pseudo random number generator (if applicable) - Type: - int, defaults to- -1.
 - 
property solver¶
- AUTO will set the solver based on given data and the other parameters. IRLSM is fast on problems with a small number of predictors and for lambda-search with L1 penalty; L_BFGS scales better for datasets with many columns. - Type: - Literal["auto", "irlsm", "l_bfgs", "coordinate_descent_naive", "coordinate_descent", "gradient_descent_lh", "gradient_descent_sqerr"], defaults to- "irlsm".
 - 
property standardize¶
- Standardize numeric columns to have zero mean and unit variance - Type: - bool, defaults to- True.
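The transformation described here can be sketched in a few lines of plain Python. The helper below is hypothetical; using the sample standard deviation (n - 1 denominator) is an assumption, and a constant column would need separate handling because its standard deviation is zero.

```python
import statistics


def standardize(values):
    """Rescale a numeric column to zero mean and unit variance."""
    mean = statistics.fmean(values)
    sdev = statistics.stdev(values)  # sample standard deviation; an assumption
    return [(v - mean) / sdev for v in values]


scaled = standardize([2.0, 4.0, 6.0, 8.0])
print(scaled)
```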
 - 
property startval¶
- double array to initialize fixed and random coefficients for HGLM, coefficients for GLM. - Type: - List[float].
 - 
property stopping_metric¶
- Metric to use for early stopping (AUTO: logloss for classification, deviance for regression and anomaly_score for Isolation Forest). Note that custom and custom_increasing can only be used in GBM and DRF with the Python client. - Type: - Literal["auto", "deviance", "logloss", "mse", "rmse", "mae", "rmsle", "auc", "aucpr", "lift_top_group", "misclassification", "mean_per_class_error", "custom", "custom_increasing"], defaults to- "auto".
 - 
property stopping_rounds¶
- Early stopping based on convergence of stopping_metric. Stop if simple moving average of length k of the stopping_metric does not improve for k:=stopping_rounds scoring events (0 to disable) - Type: - int, defaults to- 0.
 - 
property stopping_tolerance¶
- Relative tolerance for metric-based stopping criterion (stop if relative improvement is not at least this much) - Type: - float, defaults to- 0.001.
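The interplay of stopping_rounds, stopping_metric and stopping_tolerance can be sketched as follows. This is a simplified reading of the descriptions above, not H2O's actual code: the `should_stop` helper, the choice of reference window, and the `lower_is_better` flag are all assumptions made for illustration.

```python
def should_stop(history, stopping_rounds, stopping_tolerance, lower_is_better=True):
    """Decide whether to stop, given the metric value at each scoring event."""
    k = stopping_rounds
    if k == 0 or len(history) < 2 * k:
        return False  # disabled, or not enough scoring events yet

    # Simple moving averages of length k over the scoring history.
    moving_avgs = [sum(history[i:i + k]) / k for i in range(len(history) - k + 1)]
    reference = moving_avgs[-k - 1]  # the moving average from k events ago
    recent = moving_avgs[-k:]        # the k most recent moving averages
    if reference == 0:
        return False  # relative improvement is undefined; keep training

    if lower_is_better:
        improvement = (reference - min(recent)) / abs(reference)
    else:
        improvement = (max(recent) - reference) / abs(reference)
    # Stop when the relative improvement is smaller than the tolerance.
    return improvement < stopping_tolerance


improving = [1.0, 0.8, 0.6, 0.5, 0.4, 0.3]  # e.g. logloss still falling
plateau = [1.0, 0.5, 0.5, 0.5, 0.5, 0.5]    # metric has flattened out
print(should_stop(improving, stopping_rounds=2, stopping_tolerance=0.001))
print(should_stop(plateau, stopping_rounds=2, stopping_tolerance=0.001))
```

Averaging over k scoring events makes the rule less sensitive to a single noisy score than comparing consecutive values directly.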
 - 
property theta¶
- Theta - Type: - float, defaults to- 0.0.
 - 
property training_frame¶
- Id of the training data frame. - Type: - Union[None, str, H2OFrame].
 - 
property tweedie_link_power¶
- Tweedie link power - Type: - float, defaults to- 0.0.
 - 
property tweedie_variance_power¶
- Tweedie variance power - Type: - float, defaults to- 0.0.
 - 
property validation_frame¶
- Id of the validation data frame. - Type: - Union[None, str, H2OFrame].
 - 
property weights_column¶
- Column with observation weights. Giving some observation a weight of zero is equivalent to excluding it from the dataset; giving an observation a relative weight of 2 is equivalent to repeating that row twice. Negative weights are not allowed. Note: Weights are per-row observation weights and do not increase the size of the data frame. This is typically the number of times a row is repeated, but non-integer values are supported as well. During training, rows with higher weights matter more, due to the larger loss function pre-factor. If you set weight = 0 for a row, the returned prediction frame at that row is zero and this is incorrect. To get an accurate prediction, remove all rows with weight == 0. - Type: - str.
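The equivalence between weights and repeated rows can be checked on a small statistic. The sketch below uses a weighted mean in plain Python (the helper name and data are hypothetical); the same reasoning applies to a weighted loss, where each row's term is multiplied by its weight.

```python
def weighted_mean(values, weights):
    """Mean in which each value counts in proportion to its weight."""
    return sum(w * v for v, w in zip(values, weights)) / sum(weights)


values = [1.0, 2.0, 4.0]
weights = [1.0, 2.0, 0.0]  # second row counts twice, third row is excluded
weighted = weighted_mean(values, weights)

# The same data with the second row physically repeated and the third row dropped.
expanded = [1.0, 2.0, 2.0]
unweighted = sum(expanded) / len(expanded)

print(weighted, unweighted)
```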
 
H2ONaiveBayesEstimator¶
- 
class h2o.estimators.naive_bayes.H2ONaiveBayesEstimator(model_id=None, nfolds=0, seed=-1, fold_assignment='auto', fold_column=None, keep_cross_validation_models=True, keep_cross_validation_predictions=False, keep_cross_validation_fold_assignment=False, training_frame=None, validation_frame=None, response_column=None, ignored_columns=None, ignore_const_cols=True, score_each_iteration=False, balance_classes=False, class_sampling_factors=None, max_after_balance_size=5.0, max_confusion_matrix_size=20, laplace=0.0, min_sdev=0.001, eps_sdev=0.0, min_prob=0.001, eps_prob=0.0, compute_metrics=True, max_runtime_secs=0.0, export_checkpoints_dir=None, gainslift_bins=-1, auc_type='auto')[source]¶
- Bases: - h2o.estimators.estimator_base.H2OEstimator- Naive Bayes - The naive Bayes classifier assumes independence between predictor variables conditional on the response, and a Gaussian distribution of numeric predictors with mean and standard deviation computed from the training dataset. When building a naive Bayes classifier, every row in the training dataset that contains at least one NA will be skipped completely. If the test dataset has missing values, then those predictors are omitted in the probability calculation during prediction. - 
property auc_type¶
- Set default multinomial AUC type. - Type: - Literal["auto", "none", "macro_ovr", "weighted_ovr", "macro_ovo", "weighted_ovo"], defaults to- "auto".
 - 
property balance_classes¶
- Balance training data class counts via over/under-sampling (for imbalanced data). - Type: - bool, defaults to- False.- Examples
 - >>> iris = h2o.import_file("http://h2o-public-test-data.s3.amazonaws.com/smalldata/iris/iris_wheader.csv") >>> iris_nb = H2ONaiveBayesEstimator(balance_classes=False, ... nfolds=3, ... seed=1234) >>> iris_nb.train(x=list(range(4)), ... y=4, ... training_frame=iris) >>> iris_nb.mse() 
 - 
property class_sampling_factors¶
- Desired over/under-sampling ratios per class (in lexicographic order). If not specified, sampling factors will be automatically computed to obtain class balance during training. Requires balance_classes. - Type: - List[float].- Examples
 - >>> covtype = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/covtype/covtype.20k.data") >>> covtype[54] = covtype[54].asfactor() >>> sample_factors = [1., 0.5, 1., 1., 1., 1., 1.] >>> cov_nb = H2ONaiveBayesEstimator(balance_classes=True, ... class_sampling_factors=sample_factors, ... seed=1234) >>> predictors = covtype.columns[0:54] >>> response = 'C55' >>> cov_nb.train(x=predictors, y=response, training_frame=covtype) >>> cov_nb.logloss() 
 - 
property compute_metrics¶
- Compute metrics on training data - Type: - bool, defaults to- True.- Examples
 - >>> prostate = h2o.import_file("http://s3.amazonaws.com/h2o-public-test-data/smalldata/prostate/prostate.csv.zip") >>> prostate['CAPSULE'] = prostate['CAPSULE'].asfactor() >>> prostate['RACE'] = prostate['RACE'].asfactor() >>> prostate['DCAPS'] = prostate['DCAPS'].asfactor() >>> prostate['DPROS'] = prostate['DPROS'].asfactor() >>> response_col = 'CAPSULE' >>> prostate_nb = H2ONaiveBayesEstimator(laplace=0, ... compute_metrics=False) >>> prostate_nb.train(x=list(range(3,9)), ... y=response_col, ... training_frame=prostate) >>> prostate_nb.show() 
 - 
property eps_prob¶
- Cutoff below which probability is replaced with min_prob - Type: - float, defaults to- 0.0.- Examples
 - >>> import random >>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv") >>> problem = random.choice(["binomial","multinomial"]) >>> predictors = ["displacement","power","weight","acceleration","year"] >>> if problem == "binomial": ... response_col = "economy_20mpg" ... else: ... response_col = "cylinders" >>> cars[response_col] = cars[response_col].asfactor() >>> cars_nb = H2ONaiveBayesEstimator(min_prob=0.1, ... eps_prob=0.5, ... seed=1234) >>> cars_nb.train(x=predictors, y=response_col, training_frame=cars) >>> cars_nb.mse() 
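The relationship between eps_prob and min_prob amounts to a floor on conditional probabilities. A minimal sketch, assuming the cutoff is applied as "at or below" so that a zero probability is replaced under the default eps_prob=0.0 (the helper name and the boundary handling are assumptions, not H2O's documented behavior):

```python
def floor_probability(prob, eps_prob=0.0, min_prob=0.001):
    """Replace a conditional probability at or below eps_prob with min_prob."""
    return min_prob if prob <= eps_prob else prob


# Same settings as the example above: min_prob=0.1, eps_prob=0.5.
print(floor_probability(0.2, eps_prob=0.5, min_prob=0.1))  # replaced by min_prob
print(floor_probability(0.7, eps_prob=0.5, min_prob=0.1))  # kept as is
```

Flooring matters because naive Bayes multiplies conditional probabilities, so a single zero would wipe out the score for an entire class.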
 - 
property eps_sdev¶
- Cutoff below which standard deviation is replaced with min_sdev - Type: - float, defaults to- 0.0.- Examples
 - >>> import random >>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv") >>> problem = random.choice(["binomial","multinomial"]) >>> predictors = ["displacement","power","weight","acceleration","year"] >>> if problem == "binomial": ... response_col = "economy_20mpg" ... else: ... response_col = "cylinders" >>> cars[response_col] = cars[response_col].asfactor() >>> cars_nb = H2ONaiveBayesEstimator(min_sdev=0.1, ... eps_sdev=0.5, ... seed=1234) >>> cars_nb.train(x=predictors, y=response_col, training_frame=cars) >>> cars_nb.mse() 
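For numeric predictors, the standard deviation enters the Gaussian likelihood as a divisor, which is why a floor is useful. The sketch below shows a Gaussian density with that floor applied; the function is hypothetical, and treating the cutoff as "at or below" eps_sdev is an assumption chosen so that a zero standard deviation is replaced under the defaults.

```python
import math


def gaussian_density(x, mean, sdev, eps_sdev=0.0, min_sdev=0.001):
    """Gaussian density of x, with sdev replaced by min_sdev when it is too small."""
    if sdev <= eps_sdev:
        sdev = min_sdev  # avoids dividing by a (near-)zero standard deviation
    z = (x - mean) / sdev
    return math.exp(-0.5 * z * z) / (sdev * math.sqrt(2.0 * math.pi))


# Same settings as the example above: min_sdev=0.1, eps_sdev=0.5.
print(gaussian_density(3.0, mean=3.0, sdev=0.01, eps_sdev=0.5, min_sdev=0.1))
print(gaussian_density(3.0, mean=3.0, sdev=1.0, eps_sdev=0.5, min_sdev=0.1))
```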
 - 
property export_checkpoints_dir¶
- Automatically export generated models to this directory. - Type: - str.- Examples
 - >>> import tempfile >>> from os import listdir >>> airlines = h2o.import_file("http://s3.amazonaws.com/h2o-public-test-data/smalldata/airlines/allyears2k_headers.zip", destination_frame="air.hex") >>> predictors = ["DayofMonth", "DayOfWeek"] >>> response = "IsDepDelayed" >>> checkpoints_dir = tempfile.mkdtemp() >>> air_nb = H2ONaiveBayesEstimator(export_checkpoints_dir=checkpoints_dir) >>> air_nb.train(x=predictors, y=response, training_frame=airlines) >>> len(listdir(checkpoints_dir)) 
 - 
property fold_assignment¶
- Cross-validation fold assignment scheme, if fold_column is not specified. The ‘Stratified’ option will stratify the folds based on the response variable, for classification problems. - Type: - Literal["auto", "random", "modulo", "stratified"], defaults to- "auto".- Examples
 - >>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv") >>> cars["economy_20mpg"] = cars["economy_20mpg"].asfactor() >>> predictors = ["displacement","power","weight","acceleration","year"] >>> response = "economy_20mpg" >>> cars_nb = H2ONaiveBayesEstimator(fold_assignment="Random", ... nfolds=5, ... seed=1234) >>> cars_nb.train(x=predictors, y=response, training_frame=cars) >>> cars_nb.auc() 
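To show how the fold assignment schemes differ, the sketch below assigns rows to folds in plain Python. It covers only "modulo" and "random"; the helper is hypothetical, and the random assignments will not match H2O's because the random number generators differ.

```python
import random


def assign_folds(n_rows, nfolds, scheme="modulo", seed=1234):
    """Return a fold index in [0, nfolds) for each row."""
    if scheme == "modulo":
        # Deterministic round-robin: row i goes to fold i mod nfolds.
        return [i % nfolds for i in range(n_rows)]
    if scheme == "random":
        # Each row is drawn independently; the seed makes the draw reproducible.
        rng = random.Random(seed)
        return [rng.randrange(nfolds) for _ in range(n_rows)]
    raise ValueError("unsupported scheme: " + scheme)


modulo_folds = assign_folds(7, nfolds=3, scheme="modulo")
random_folds = assign_folds(7, nfolds=3, scheme="random", seed=1234)
print(modulo_folds)
print(random_folds)
```

Modulo assignment gives folds of nearly equal size, while random assignment can leave folds unbalanced on small data sets.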
 - 
property fold_column¶
- Column with cross-validation fold index assignment per observation. - Type: - str.- Examples
 - >>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv") >>> cars["economy_20mpg"] = cars["economy_20mpg"].asfactor() >>> predictors = ["displacement","power","weight","acceleration","year"] >>> response = "economy_20mpg" >>> fold_numbers = cars.kfold_column(n_folds=5, seed=1234) >>> fold_numbers.set_names(["fold_numbers"]) >>> cars = cars.cbind(fold_numbers) >>> cars_nb = H2ONaiveBayesEstimator(seed=1234) >>> cars_nb.train(x=predictors, ... y=response, ... training_frame=cars, ... fold_column="fold_numbers") >>> cars_nb.auc() 
 - 
property gainslift_bins¶
- Gains/Lift table number of bins. 0 means disabled. Default value -1 means automatic binning. - Type: - int, defaults to- -1.- Examples
 - >>> airlines= h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/testng/airlines_train.csv") >>> model = H2ONaiveBayesEstimator(gainslift_bins=20) >>> model.train(x=["Origin", "Distance"], ... y="IsDepDelayed", ... training_frame=airlines) >>> model.gains_lift() 
 - 
property ignore_const_cols¶
- Ignore constant columns. - Type: - bool, defaults to- True.- Examples
 - >>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv") >>> cars["economy_20mpg"] = cars["economy_20mpg"].asfactor() >>> predictors = ["displacement","power","weight","acceleration","year"] >>> response = "economy_20mpg" >>> cars["const_1"] = 6 >>> cars["const_2"] = 7 >>> train, valid = cars.split_frame(ratios=[.8], seed=1234) >>> cars_nb = H2ONaiveBayesEstimator(seed=1234, ... ignore_const_cols=True) >>> cars_nb.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> cars_nb.auc() 
 - 
property ignored_columns¶
- Names of columns to ignore for training. - Type: - List[str].
 - 
property keep_cross_validation_fold_assignment¶
- Whether to keep the cross-validation fold assignment. - Type: - bool, defaults to- False.- Examples
 - >>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv") >>> cars["economy_20mpg"] = cars["economy_20mpg"].asfactor() >>> predictors = ["displacement","power","weight","acceleration","year"] >>> response = "economy_20mpg" >>> train, valid = cars.split_frame(ratios=[.8], seed=1234) >>> cars_nb = H2ONaiveBayesEstimator(keep_cross_validation_fold_assignment=True, ... nfolds=5, ... seed=1234) >>> cars_nb.train(x=predictors, ... y=response, ... training_frame=train) >>> cars_nb.cross_validation_fold_assignment() 
 - 
property keep_cross_validation_models¶
- Whether to keep the cross-validation models. - Type: - bool, defaults to- True.- Examples
 - >>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv") >>> cars["economy_20mpg"] = cars["economy_20mpg"].asfactor() >>> predictors = ["displacement","power","weight","acceleration","year"] >>> response = "economy_20mpg" >>> train, valid = cars.split_frame(ratios=[.8], seed=1234) >>> cars_nb = H2ONaiveBayesEstimator(keep_cross_validation_models=True, ... nfolds=5, ... seed=1234) >>> cars_nb.train(x=predictors, ... y=response, ... training_frame=train) >>> cars_nb.cross_validation_models() 
 - 
property keep_cross_validation_predictions¶
- Whether to keep the predictions of the cross-validation models. - Type: - bool, defaults to- False.- Examples
 - >>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv") >>> cars["economy_20mpg"] = cars["economy_20mpg"].asfactor() >>> predictors = ["displacement","power","weight","acceleration","year"] >>> response = "economy_20mpg" >>> train, valid = cars.split_frame(ratios=[.8], seed=1234) >>> cars_nb = H2ONaiveBayesEstimator(keep_cross_validation_predictions=True, ... nfolds=5, ... seed=1234) >>> cars_nb.train(x=predictors, ... y=response, ... training_frame=train) >>> cars_nb.cross_validation_predictions() 
 - 
property laplace¶
- Laplace smoothing parameter - Type: - float, defaults to- 0.0.- Examples
 - >>> prostate = h2o.import_file("http://s3.amazonaws.com/h2o-public-test-data/smalldata/prostate/prostate.csv.zip") >>> prostate['CAPSULE'] = prostate['CAPSULE'].asfactor() >>> prostate['RACE'] = prostate['RACE'].asfactor() >>> prostate['DCAPS'] = prostate['DCAPS'].asfactor() >>> prostate['DPROS'] = prostate['DPROS'].asfactor() >>> response_col = 'CAPSULE' >>> prostate_nb = H2ONaiveBayesEstimator(laplace=1) >>> prostate_nb.train(x=list(range(3,9)), ... y=response_col, ... training_frame=prostate) >>> prostate_nb.mse() 
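Laplace smoothing is usually written as additive smoothing of the per-class level counts. The sketch below uses that textbook formula in plain Python; the helper and the counts are hypothetical, and whether H2O's internal computation matches it term for term is an assumption.

```python
def smoothed_conditional(count, class_total, n_levels, laplace=0.0):
    """P(level | class) with additive (Laplace) smoothing."""
    return (count + laplace) / (class_total + laplace * n_levels)


# Hypothetical counts of a 3-level categorical predictor within one class.
counts = [0, 4, 6]
total = sum(counts)
unsmoothed = [smoothed_conditional(c, total, len(counts)) for c in counts]
smoothed = [smoothed_conditional(c, total, len(counts), laplace=1.0) for c in counts]
print(unsmoothed)  # the unseen level gets probability zero
print(smoothed)    # the unseen level now gets a small positive probability
```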
 - 
property max_after_balance_size¶
- Maximum relative size of the training data after balancing class counts (can be less than 1.0). Requires balance_classes. - Type: - float, defaults to- 5.0.- Examples
 - >>> covtype = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/covtype/covtype.20k.data") >>> covtype[54] = covtype[54].asfactor() >>> predictors = covtype.columns[0:54] >>> response = 'C55' >>> train, valid = covtype.split_frame(ratios=[.8], seed=1234) >>> max_size = .85 >>> cov_nb = H2ONaiveBayesEstimator(balance_classes=True, ... max_after_balance_size=max_size, ... seed=1234) >>> cov_nb.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> cov_nb.logloss() 
 - 
property max_confusion_matrix_size¶
- [Deprecated] Maximum size (# classes) for confusion matrices to be printed in the Logs - Type: - int, defaults to- 20.
 - 
property max_runtime_secs¶
- Maximum allowed runtime in seconds for model training. Use 0 to disable. - Type: - float, defaults to- 0.0.- Examples
 - >>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv") >>> cars["economy_20mpg"] = cars["economy_20mpg"].asfactor() >>> predictors = ["displacement","power","weight","acceleration","year"] >>> response = "economy_20mpg" >>> train, valid = cars.split_frame(ratios=[.8], seed=1234) >>> cars_nb = H2ONaiveBayesEstimator(max_runtime_secs=10, ... seed=1234) >>> cars_nb.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> cars_nb.auc() 
 - 
property min_prob¶
- Min. probability to use for observations with not enough data - Type: - float, defaults to- 0.001.- Examples
 - >>> import random >>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv") >>> problem = random.choice(["binomial","multinomial"]) >>> predictors = ["displacement","power","weight","acceleration","year"] >>> if problem == "binomial": ... response_col = "economy_20mpg" ... else: ... response_col = "cylinders" >>> cars[response_col] = cars[response_col].asfactor() >>> cars_nb = H2ONaiveBayesEstimator(min_prob=0.1, ... eps_prob=0.5, ... seed=1234) >>> cars_nb.train(x=predictors, ... y=response_col, ... training_frame=cars) >>> cars_nb.show() 
 - 
property min_sdev¶
- Min. standard deviation to use for observations with not enough data - Type: - float, defaults to- 0.001.- Examples
 - >>> import random >>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv") >>> problem = random.choice(["binomial","multinomial"]) >>> predictors = ["displacement","power","weight","acceleration","year"] >>> if problem == "binomial": ... response_col = "economy_20mpg" ... else: ... response_col = "cylinders" >>> cars[response_col] = cars[response_col].asfactor() >>> cars_nb = H2ONaiveBayesEstimator(min_sdev=0.1, ... eps_sdev=0.5, ... seed=1234) >>> cars_nb.train(x=predictors, ... y=response_col, ... training_frame=cars) >>> cars_nb.show() 
 - 
property nfolds¶
- Number of folds for K-fold cross-validation (0 to disable or >= 2). - Type: - int, defaults to- 0.- Examples
 - >>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv") >>> cars["economy_20mpg"] = cars["economy_20mpg"].asfactor() >>> predictors = ["displacement","power","weight","acceleration","year"] >>> response = "economy_20mpg" >>> cars_nb = H2ONaiveBayesEstimator(nfolds=5, ... seed=1234) >>> cars_nb.train(x=predictors, ... y=response, ... training_frame=cars) >>> cars_nb.auc() 
 - 
property response_column¶
- Response variable column. - Type: - str.
 - 
property score_each_iteration¶
- Whether to score during each iteration of model training. - Type: - bool, defaults to- False.- Examples
 - >>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv") >>> cars["economy_20mpg"] = cars["economy_20mpg"].asfactor() >>> predictors = ["displacement","power","weight","acceleration","year"] >>> response = "economy_20mpg" >>> train, valid = cars.split_frame(ratios=[.8], seed=1234) >>> cars_nb = H2ONaiveBayesEstimator(score_each_iteration=True, ... seed=1234) >>> cars_nb.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> cars_nb.auc() 
 - 
property seed¶
- Seed for pseudo random number generator (only used for cross-validation and fold_assignment="Random" or "AUTO") - Type: - int, defaults to- -1.- Examples
 - >>> airlines= h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/airlines/allyears2k_headers.zip") >>> airlines["Year"] = airlines["Year"].asfactor() >>> airlines["Month"] = airlines["Month"].asfactor() >>> airlines["DayOfWeek"] = airlines["DayOfWeek"].asfactor() >>> airlines["Cancelled"] = airlines["Cancelled"].asfactor() >>> airlines['FlightNum'] = airlines['FlightNum'].asfactor() >>> predictors = ["Origin", "Dest", "Year", "UniqueCarrier", ... "DayOfWeek", "Month", "Distance", "FlightNum"] >>> response = "IsDepDelayed" >>> train, valid= airlines.split_frame(ratios=[.8], seed=1234) >>> nb_w_seed = H2ONaiveBayesEstimator(seed=1234) >>> nb_w_seed.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> nb_wo_seed = H2ONaiveBayesEstimator() >>> nb_wo_seed.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> nb_w_seed.auc() >>> nb_wo_seed.auc() 
 - 
property training_frame¶
- Id of the training data frame. - Type: - Union[None, str, H2OFrame].- Examples
 - >>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv") >>> cars["economy_20mpg"] = cars["economy_20mpg"].asfactor() >>> predictors = ["displacement","power","weight","acceleration","year"] >>> response = "economy_20mpg" >>> train, valid = cars.split_frame(ratios=[.8], seed=1234) >>> cars_nb = H2ONaiveBayesEstimator() >>> cars_nb.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> cars_nb.auc() 
 - 
property validation_frame¶
- Id of the validation data frame. - Type: - Union[None, str, H2OFrame].- Examples
 - >>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv") >>> cars["economy_20mpg"] = cars["economy_20mpg"].asfactor() >>> predictors = ["displacement","power","weight","acceleration","year"] >>> response = "economy_20mpg" >>> train, valid = cars.split_frame(ratios=[.8], seed=1234) >>> cars_nb = H2ONaiveBayesEstimator() >>> cars_nb.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> cars_nb.auc() 
 
H2OSupportVectorMachineEstimator¶
- 
class h2o.estimators.psvm.H2OSupportVectorMachineEstimator(model_id=None, training_frame=None, validation_frame=None, response_column=None, ignored_columns=None, ignore_const_cols=True, hyper_param=1.0, kernel_type='gaussian', gamma=-1.0, rank_ratio=-1.0, positive_weight=1.0, negative_weight=1.0, disable_training_metrics=True, sv_threshold=0.0001, fact_threshold=1e-05, feasible_threshold=0.001, surrogate_gap_threshold=0.001, mu_factor=10.0, max_iterations=200, seed=-1)[source]¶
- Bases: - h2o.estimators.estimator_base.H2OEstimator- PSVM - 
property disable_training_metrics¶
- Disable calculating training metrics (expensive on large datasets) - Type: - bool, defaults to- True.- Examples
 - >>> from h2o.estimators import H2OSupportVectorMachineEstimator >>> splice = h2o.import_file("http://h2o-public-test-data.s3.amazonaws.com/smalldata/splice/splice.svm") >>> svm = H2OSupportVectorMachineEstimator(gamma=0.01, ... rank_ratio=0.1, ... disable_training_metrics=False) >>> svm.train(y="C1", training_frame=splice) >>> svm.mse() 
 - 
property fact_threshold¶
- Convergence threshold of the Incomplete Cholesky Factorization (ICF) - Type: - float, defaults to- 1e-05.- Examples
 - >>> splice = h2o.import_file("http://h2o-public-test-data.s3.amazonaws.com/smalldata/splice/splice.svm") >>> svm = H2OSupportVectorMachineEstimator(disable_training_metrics=False, ... fact_threshold=1e-7) >>> svm.train(y="C1", training_frame=splice) >>> svm.mse() 
 - 
property feasible_threshold¶
- Convergence threshold for primal-dual residuals in the IPM iteration - Type: - float, defaults to- 0.001.- Examples
 - >>> splice = h2o.import_file("http://h2o-public-test-data.s3.amazonaws.com/smalldata/splice/splice.svm") >>> svm = H2OSupportVectorMachineEstimator(disable_training_metrics=False, ... feasible_threshold=1e-7) >>> svm.train(y="C1", training_frame=splice) >>> svm.mse() 
 - 
property gamma¶
- Coefficient of the kernel (currently RBF gamma for gaussian kernel, -1 means 1/#features) - Type: - float, defaults to- -1.0.- Examples
 - >>> splice = h2o.import_file("http://h2o-public-test-data.s3.amazonaws.com/smalldata/splice/splice.svm") >>> svm = H2OSupportVectorMachineEstimator(gamma=0.01, ... rank_ratio=0.1, ... disable_training_metrics=False) >>> svm.train(y="C1", training_frame=splice) >>> svm.mse() 
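The effect of gamma on a Gaussian (RBF) kernel can be seen in a few lines of plain Python. This is a sketch of the standard kernel formula with the documented "-1 means 1/#features" default; the helper itself is hypothetical and is not H2O's implementation.

```python
import math


def rbf_kernel(x, y, gamma=-1.0):
    """Gaussian kernel exp(-gamma * ||x - y||^2) between two feature vectors."""
    if gamma == -1.0:
        gamma = 1.0 / len(x)  # documented default: 1 / number of features
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


print(rbf_kernel([1.0, 2.0], [1.0, 2.0]))              # identical points
print(rbf_kernel([0.0, 0.0], [1.0, 1.0]))              # default gamma = 1/2
print(rbf_kernel([0.0, 0.0], [1.0, 1.0], gamma=0.01))  # smaller gamma, wider kernel
```

A smaller gamma makes distant points look more similar, which tends to give a smoother decision boundary.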
 - 
property hyper_param¶
- Penalty parameter C of the error term - Type: - float, defaults to- 1.0.- Examples
 - >>> splice = h2o.import_file("http://h2o-public-test-data.s3.amazonaws.com/smalldata/splice/splice.svm") >>> svm = H2OSupportVectorMachineEstimator(gamma=0.01, ... rank_ratio=0.1, ... hyper_param=0.01, ... disable_training_metrics=False) >>> svm.train(y="C1", training_frame=splice) >>> svm.mse() 
 - 
property ignore_const_cols¶
- Ignore constant columns. - Type: - bool, defaults to- True.- Examples
 - >>> splice = h2o.import_file("http://h2o-public-test-data.s3.amazonaws.com/smalldata/splice/splice.svm") >>> svm = H2OSupportVectorMachineEstimator(gamma=0.01, ... rank_ratio=0.1, ... ignore_const_cols=False, ... disable_training_metrics=False) >>> svm.train(y="C1", training_frame=splice) >>> svm.mse() 
 - 
property ignored_columns¶
- Names of columns to ignore for training. - Type: - List[str].
 - 
property kernel_type¶
- Type of kernel used - Type: - Literal["gaussian"], defaults to- "gaussian".- Examples
 - >>> splice = h2o.import_file("http://h2o-public-test-data.s3.amazonaws.com/smalldata/splice/splice.svm") >>> svm = H2OSupportVectorMachineEstimator(gamma=0.1, ... rank_ratio=0.1, ... hyper_param=0.01, ... kernel_type="gaussian", ... disable_training_metrics=False) >>> svm.train(y="C1", training_frame=splice) >>> svm.mse() 
 - 
property max_iterations¶
- Maximum number of iterations of the algorithm - Type: - int, defaults to- 200.- Examples
 - >>> splice = h2o.import_file("http://h2o-public-test-data.s3.amazonaws.com/smalldata/splice/splice.svm") >>> svm = H2OSupportVectorMachineEstimator(gamma=0.1, ... rank_ratio=0.1, ... hyper_param=0.01, ... max_iterations=20, ... disable_training_metrics=False) >>> svm.train(y="C1", training_frame=splice) >>> svm.mse() 
 - 
property mu_factor¶
- Increasing factor mu - Type: - float, defaults to- 10.0.- Examples
 - >>> splice = h2o.import_file("http://h2o-public-test-data.s3.amazonaws.com/smalldata/splice/splice.svm") >>> svm = H2OSupportVectorMachineEstimator(gamma=0.1, ... mu_factor=100.5, ... disable_training_metrics=False) >>> svm.train(y="C1", training_frame=splice) >>> svm.mse() 
 - 
property negative_weight¶
- Weight of negative (-1) class of observations - Type: - float, defaults to- 1.0.- Examples
 - >>> splice = h2o.import_file("http://h2o-public-test-data.s3.amazonaws.com/smalldata/splice/splice.svm") >>> svm = H2OSupportVectorMachineEstimator(gamma=0.1, ... rank_ratio=0.1, ... negative_weight=10, ... disable_training_metrics=False) >>> svm.train(y="C1", training_frame=splice) >>> svm.mse() 
 - 
property positive_weight¶
- Weight of positive (+1) class of observations - Type: - float, defaults to- 1.0.- Examples
 - >>> splice = h2o.import_file("http://h2o-public-test-data.s3.amazonaws.com/smalldata/splice/splice.svm") >>> svm = H2OSupportVectorMachineEstimator(gamma=0.1, ... rank_ratio=0.1, ... positive_weight=0.1, ... disable_training_metrics=False) >>> svm.train(y="C1", training_frame=splice) >>> svm.mse() 
 - 
property rank_ratio¶
- Desired rank of the ICF matrix expressed as a ratio of the number of input rows (-1 means use sqrt(#rows)). - Type: - float, defaults to- -1.0.- Examples
 - >>> splice = h2o.import_file("http://h2o-public-test-data.s3.amazonaws.com/smalldata/splice/splice.svm") >>> svm = H2OSupportVectorMachineEstimator(gamma=0.01, ... rank_ratio=0.1, ... disable_training_metrics=False) >>> svm.train(y="C1", training_frame=splice) >>> svm.mse() 
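As a quick illustration of how rank_ratio translates into a rank, the sketch below computes the target rank from the number of rows. The helper is hypothetical, and truncating to an integer is an assumption; the exact rounding H2O uses is not stated here.

```python
import math


def icf_rank(n_rows, rank_ratio=-1.0):
    """Target rank of the ICF matrix for a data set with n_rows rows."""
    if rank_ratio == -1.0:
        return int(math.sqrt(n_rows))  # documented default: sqrt(#rows)
    return int(rank_ratio * n_rows)    # truncation is an assumption


print(icf_rank(10000))                  # default rule
print(icf_rank(1000, rank_ratio=0.5))   # half as many columns as rows
```

A lower rank makes the factorization cheaper in time and memory, at the cost of a coarser approximation of the kernel matrix.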
 - 
property response_column¶
- Response variable column. - Type: - str.
 - 
property seed¶
- Seed for pseudo random number generator (if applicable) - Type: - int, defaults to- -1.- Examples
 - >>> splice = h2o.import_file("http://h2o-public-test-data.s3.amazonaws.com/smalldata/splice/splice.svm") >>> svm = H2OSupportVectorMachineEstimator(gamma=0.1, ... rank_ratio=0.1, ... seed=1234, ... disable_training_metrics=False) >>> svm.train(y="C1", training_frame=splice) >>> svm.model_performance() 
 - 
property surrogate_gap_threshold¶
- Feasibility criterion of the surrogate duality gap (eta) - Type: - float, defaults to- 0.001.- Examples
 - >>> splice = h2o.import_file("http://h2o-public-test-data.s3.amazonaws.com/smalldata/splice/splice.svm") >>> svm = H2OSupportVectorMachineEstimator(gamma=0.01, ... rank_ratio=0.1, ... surrogate_gap_threshold=0.1, ... disable_training_metrics=False) >>> svm.train(y="C1", training_frame=splice) >>> svm.mse() 
 - 
property sv_threshold¶
- Threshold for accepting a candidate observation into the set of support vectors - Type: - float, defaults to- 0.0001.- Examples
 - >>> splice = h2o.import_file("http://h2o-public-test-data.s3.amazonaws.com/smalldata/splice/splice.svm") >>> svm = H2OSupportVectorMachineEstimator(gamma=0.01, ... rank_ratio=0.1, ... sv_threshold=0.01, ... disable_training_metrics=False) >>> svm.train(y="C1", training_frame=splice) >>> svm.mse() 
 - 
property training_frame¶
- Id of the training data frame. - Type: - Union[None, str, H2OFrame].- Examples
 - >>> splice = h2o.import_file("http://h2o-public-test-data.s3.amazonaws.com/smalldata/splice/splice.svm") >>> train, valid = splice.split_frame(ratios=[0.8]) >>> svm = H2OSupportVectorMachineEstimator(disable_training_metrics=False) >>> svm.train(y="C1", training_frame=train) >>> svm.mse() 
 - 
property validation_frame¶
- Id of the validation data frame. - Type: - Union[None, str, H2OFrame].- Examples
 - >>> splice = h2o.import_file("http://h2o-public-test-data.s3.amazonaws.com/smalldata/splice/splice.svm") >>> train, valid = splice.split_frame(ratios=[0.8]) >>> svm = H2OSupportVectorMachineEstimator(disable_training_metrics=False) >>> svm.train(y="C1", training_frame=train, validation_frame=valid) >>> svm.mse() 
 
H2ORandomForestEstimator¶
- 
class h2o.estimators.random_forest.H2ORandomForestEstimator(model_id=None, training_frame=None, validation_frame=None, nfolds=0, keep_cross_validation_models=True, keep_cross_validation_predictions=False, keep_cross_validation_fold_assignment=False, score_each_iteration=False, score_tree_interval=0, fold_assignment='auto', fold_column=None, response_column=None, ignored_columns=None, ignore_const_cols=True, weights_column=None, balance_classes=False, class_sampling_factors=None, max_after_balance_size=5.0, max_confusion_matrix_size=20, ntrees=50, max_depth=20, min_rows=1.0, nbins=20, nbins_top_level=1024, nbins_cats=1024, r2_stopping=None, stopping_rounds=0, stopping_metric='auto', stopping_tolerance=0.001, max_runtime_secs=0.0, seed=-1, build_tree_one_node=False, mtries=-1, sample_rate=0.632, sample_rate_per_class=None, binomial_double_trees=False, checkpoint=None, col_sample_rate_change_per_level=1.0, col_sample_rate_per_tree=1.0, min_split_improvement=1e-05, histogram_type='auto', categorical_encoding='auto', calibrate_model=False, calibration_frame=None, calibration_method='auto', distribution='auto', custom_metric_func=None, export_checkpoints_dir=None, check_constant_response=True, gainslift_bins=-1, auc_type='auto')[source]¶
- Bases: - h2o.estimators.estimator_base.H2OEstimator- Distributed Random Forest - Builds a Distributed Random Forest (DRF) on a parsed dataset, for regression or classification. - 
property auc_type¶
- Set default multinomial AUC type. - Type: - Literal["auto", "none", "macro_ovr", "weighted_ovr", "macro_ovo", "weighted_ovo"], defaults to- "auto".
 - 
property balance_classes¶
- Balance training data class counts via over/under-sampling (for imbalanced data). - Type: - bool, defaults to- False.- Examples
 - >>> covtype = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/covtype/covtype.20k.data") >>> covtype[54] = covtype[54].asfactor() >>> predictors = covtype.columns[0:54] >>> response = 'C55' >>> train, valid = covtype.split_frame(ratios=[.8], seed=1234) >>> cov_drf = H2ORandomForestEstimator(balance_classes=True, ... seed=1234) >>> cov_drf.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> print('logloss', cov_drf.logloss(valid=True)) 
 - 
property binomial_double_trees¶
- For binary classification: Build 2x as many trees (one per class) - can lead to higher accuracy. - Type: - bool, defaults to- False.- Examples
 - >>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv") >>> cars["economy_20mpg"] = cars["economy_20mpg"].asfactor() >>> predictors = ["displacement","power","weight","acceleration","year"] >>> response = "economy_20mpg" >>> train, valid = cars.split_frame(ratios=[.8], seed=1234) >>> cars_drf = H2ORandomForestEstimator(binomial_double_trees=False, ... seed=1234) >>> cars_drf.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> print('without binomial_double_trees:', ... cars_drf.auc(valid=True)) >>> cars_drf_2 = H2ORandomForestEstimator(binomial_double_trees=True, ... seed=1234) >>> cars_drf_2.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> print('with binomial_double_trees:', cars_drf_2.auc(valid=True)) 
 - 
property build_tree_one_node¶
- Run on one node only; no network overhead but fewer cpus used. Suitable for small datasets. - Type: - bool, defaults to- False.- Examples
 - >>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv") >>> cars["economy_20mpg"] = cars["economy_20mpg"].asfactor() >>> predictors = ["displacement","power","weight","acceleration","year"] >>> response = "economy_20mpg" >>> train, valid = cars.split_frame(ratios=[.8], seed=1234) >>> cars_drf = H2ORandomForestEstimator(build_tree_one_node=True, ... seed=1234) >>> cars_drf.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> cars_drf.auc(valid=True) 
 - 
property calibrate_model¶
- Use Platt Scaling (default) or Isotonic Regression to calculate calibrated class probabilities. Calibration can provide more accurate estimates of class probabilities. - Type: - bool, defaults to- False.- Examples
 - >>> ecology = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/gbm_test/ecology_model.csv") >>> ecology['Angaus'] = ecology['Angaus'].asfactor() >>> from h2o.estimators.random_forest import H2ORandomForestEstimator >>> response = 'Angaus' >>> predictors = ecology.columns[3:13] >>> train, calib = ecology.split_frame(seed=12354) >>> w = h2o.create_frame(binary_fraction=1, ... binary_ones_fraction=0.5, ... missing_fraction=0, ... rows=744, cols=1) >>> w.set_names(["weight"]) >>> train = train.cbind(w) >>> ecology_drf = H2ORandomForestEstimator(ntrees=10, ... max_depth=5, ... min_rows=10, ... distribution="multinomial", ... weights_column="weight", ... calibrate_model=True, ... calibration_frame=calib) >>> ecology_drf.train(x=predictors, ... y="Angaus", ... training_frame=train) >>> predicted = ecology_drf.predict(calib) 
 - 
property calibration_frame¶
- Data for model calibration - Type: - Union[None, str, H2OFrame].- Examples
 - >>> ecology = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/gbm_test/ecology_model.csv") >>> ecology['Angaus'] = ecology['Angaus'].asfactor() >>> response = 'Angaus' >>> predictors = ecology.columns[3:13] >>> train, calib = ecology.split_frame(seed = 12354) >>> w = h2o.create_frame(binary_fraction=1, ... binary_ones_fraction=0.5, ... missing_fraction=0, ... rows=744, cols=1) >>> w.set_names(["weight"]) >>> train = train.cbind(w) >>> ecology_drf = H2ORandomForestEstimator(ntrees=10, ... max_depth=5, ... min_rows=10, ... distribution="multinomial", ... calibrate_model=True, ... calibration_frame=calib) >>> ecology_drf.train(x=predictors, ... y="Angaus", ... training_frame=train, ... weights_column="weight") >>> predicted = ecology_drf.predict(train) 
 - 
property calibration_method¶
- Calibration method to use - Type: - Literal["auto", "platt_scaling", "isotonic_regression"], defaults to- "auto".
 - 
property categorical_encoding¶
- Encoding scheme for categorical features - Type: - Literal["auto", "enum", "one_hot_internal", "one_hot_explicit", "binary", "eigen", "label_encoder", "sort_by_response", "enum_limited"], defaults to- "auto".- Examples
 - >>> airlines= h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/airlines/allyears2k_headers.zip") >>> airlines["Year"] = airlines["Year"].asfactor() >>> airlines["Month"] = airlines["Month"].asfactor() >>> airlines["DayOfWeek"] = airlines["DayOfWeek"].asfactor() >>> airlines["Cancelled"] = airlines["Cancelled"].asfactor() >>> airlines['FlightNum'] = airlines['FlightNum'].asfactor() >>> predictors = ["Origin", "Dest", "Year", "UniqueCarrier", ... "DayOfWeek", "Month", "Distance", "FlightNum"] >>> response = "IsDepDelayed" >>> train, valid= airlines.split_frame(ratios=[.8], seed=1234) >>> encoding = "one_hot_explicit" >>> airlines_drf = H2ORandomForestEstimator(categorical_encoding=encoding, ... seed=1234) >>> airlines_drf.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> airlines_drf.auc(valid=True) 
 - 
property check_constant_response¶
- Check if response column is constant. If enabled, an exception is thrown when the response column is a constant value. If disabled, the model trains regardless of whether the response column is constant. - Type: - bool, defaults to- True.- Examples
 - >>> train = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/iris/iris_train.csv") >>> train["constantCol"] = 1 >>> my_drf = H2ORandomForestEstimator(check_constant_response=False) >>> my_drf.train(x=list(range(1,5)), ... y="constantCol", ... training_frame=train) 
 - 
property checkpoint¶
- Model checkpoint to resume training with. - Type: - Union[None, str, H2OEstimator].- Examples
 - >>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv") >>> cars["economy_20mpg"] = cars["economy_20mpg"].asfactor() >>> predictors = ["displacement","power","weight","acceleration","year"] >>> response = "economy_20mpg" >>> train, valid = cars.split_frame(ratios=[.8], ... seed=1234) >>> cars_drf = H2ORandomForestEstimator(ntrees=1, ... seed=1234) >>> cars_drf.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> print(cars_drf.auc(valid=True)) 
 - 
property class_sampling_factors¶
- Desired over/under-sampling ratios per class (in lexicographic order). If not specified, sampling factors will be automatically computed to obtain class balance during training. Requires balance_classes. - Type: - List[float].- Examples
 - >>> covtype = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/covtype/covtype.20k.data") >>> covtype[54] = covtype[54].asfactor() >>> predictors = covtype.columns[0:54] >>> response = 'C55' >>> train, valid = covtype.split_frame(ratios=[.8], seed=1234) >>> print(covtype[54].table()) >>> sample_factors = [1., 0.5, 1., 1., 1., 1., 1.] >>> cov_drf = H2ORandomForestEstimator(balance_classes=True, ... class_sampling_factors=sample_factors, ... seed=1234) >>> cov_drf.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> print('logloss', cov_drf.logloss(valid=True)) 
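The "lexicographic order" requirement is easy to get wrong, so here is a minimal sketch of how a list of factors could be paired with class labels. The helper names `pair_factors_with_classes` and `resampled_counts` are hypothetical, and rounding the target counts is an assumption made for illustration; this is not H2O's implementation.

```python
def pair_factors_with_classes(class_labels, factors):
    """Hypothetical helper: pair each factor with a class, classes sorted as strings."""
    ordered = sorted({str(label) for label in class_labels})
    if len(ordered) != len(factors):
        raise ValueError("need exactly one sampling factor per class")
    return dict(zip(ordered, factors))


def resampled_counts(class_counts, factor_by_class):
    """Target row count per class after over/under-sampling (rounding is an assumption)."""
    return {label: round(count * factor_by_class[label])
            for label, count in class_counts.items()}


counts = {"yes": 10, "no": 300, "maybe": 40}
# Sorted lexicographically the classes are: maybe, no, yes
mapping = pair_factors_with_classes(counts.keys(), [1.0, 0.5, 2.0])
print(mapping)
print(resampled_counts(counts, mapping))
```

Note that the factor list follows the sorted class order, not the order in which classes first appear in the data.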
 - 
property col_sample_rate_change_per_level¶
- Relative change of the column sampling rate for every level (must be > 0.0 and <= 2.0) - Type: - float, defaults to- 1.0.- Examples
 - >>> airlines= h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/airlines/allyears2k_headers.zip") >>> airlines["Year"] = airlines["Year"].asfactor() >>> airlines["Month"] = airlines["Month"].asfactor() >>> airlines["DayOfWeek"] = airlines["DayOfWeek"].asfactor() >>> airlines["Cancelled"] = airlines["Cancelled"].asfactor() >>> airlines['FlightNum'] = airlines['FlightNum'].asfactor() >>> predictors = ["Origin", "Dest", "Year", "UniqueCarrier", ... "DayOfWeek", "Month", "Distance", "FlightNum"] >>> response = "IsDepDelayed" >>> train, valid= airlines.split_frame(ratios=[.8], seed=1234) >>> airlines_drf = H2ORandomForestEstimator(col_sample_rate_change_per_level=.9, ... seed=1234) >>> airlines_drf.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> print(airlines_drf.auc(valid=True)) 
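To give a feel for what a "relative change per level" means, the sketch below applies a multiplicative factor once per tree level and caps the result at 1.0. The function name `rate_at_depth`, the choice to leave the root (depth 0) unchanged, and the cap are all assumptions for illustration; the exact quantity H2O scales in DRF is not stated in this section.

```python
def rate_at_depth(base_rate: float, change_per_level: float, depth: int) -> float:
    """Hypothetical helper: column sampling rate after `depth` multiplicative changes."""
    if not 0.0 < change_per_level <= 2.0:
        raise ValueError("change_per_level must be > 0.0 and <= 2.0")
    # Assumption: the root uses base_rate as-is and each deeper level multiplies once more.
    return min(1.0, base_rate * change_per_level ** depth)


for depth in range(4):
    print(depth, rate_at_depth(1.0, 0.5, depth))
```

A factor below 1.0 shrinks the sampled share of columns as the tree deepens, a factor above 1.0 grows it, and 1.0 (the default) keeps it constant.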
 - 
property col_sample_rate_per_tree¶
- Column sample rate per tree (from 0.0 to 1.0) - Type: - float, defaults to- 1.0.- Examples
 - >>> airlines= h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/airlines/allyears2k_headers.zip") >>> airlines["Year"] = airlines["Year"].asfactor() >>> airlines["Month"] = airlines["Month"].asfactor() >>> airlines["DayOfWeek"] = airlines["DayOfWeek"].asfactor() >>> airlines["Cancelled"] = airlines["Cancelled"].asfactor() >>> airlines['FlightNum'] = airlines['FlightNum'].asfactor() >>> predictors = ["Origin", "Dest", "Year", "UniqueCarrier", ... "DayOfWeek", "Month", "Distance", "FlightNum"] >>> response = "IsDepDelayed" >>> train, valid= airlines.split_frame(ratios=[.8], seed=1234) >>> airlines_drf = H2ORandomForestEstimator(col_sample_rate_per_tree=.7, ... seed=1234) >>> airlines_drf.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> print(airlines_drf.auc(valid=True)) 
 - 
property custom_metric_func¶
- Reference to custom evaluation function, format: language:keyName=funcName - Type: - str.
 - 
property distribution¶
- Distribution function - Type: - Literal["auto", "bernoulli", "multinomial", "gaussian", "poisson", "gamma", "tweedie", "laplace", "quantile", "huber"], defaults to- "auto".- Examples
 - >>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv") >>> predictors = ["displacement","power","weight","acceleration","year"] >>> response = "cylinders" >>> train, valid = cars.split_frame(ratios=[.8], seed=1234) >>> cars_drf = H2ORandomForestEstimator(distribution="poisson", ... seed=1234) >>> cars_drf.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> cars_drf.mse(valid=True) 
 - 
property export_checkpoints_dir¶
- Automatically export generated models to this directory. - Type: - str.- Examples
 - >>> import tempfile >>> from os import listdir >>> from h2o.grid.grid_search import H2OGridSearch >>> airlines = h2o.import_file("http://s3.amazonaws.com/h2o-public-test-data/smalldata/airlines/allyears2k_headers.zip", destination_frame="air.hex") >>> predictors = ["DayofMonth", "DayOfWeek"] >>> response = "IsDepDelayed" >>> hyper_parameters = {'ntrees': [5,10]} >>> search_crit = {'strategy': "RandomDiscrete", ... 'max_models': 5, ... 'seed': 1234, ... 'stopping_rounds': 3, ... 'stopping_metric': "AUTO", ... 'stopping_tolerance': 1e-2} >>> checkpoints_dir = tempfile.mkdtemp() >>> air_grid = H2OGridSearch(H2ORandomForestEstimator, ... hyper_params=hyper_parameters, ... search_criteria=search_crit) >>> air_grid.train(x=predictors, ... y=response, ... training_frame=airlines, ... distribution="bernoulli", ... max_depth=3, ... export_checkpoints_dir=checkpoints_dir) >>> num_files = len(listdir(checkpoints_dir)) >>> num_files 
 - 
property fold_assignment¶
- Cross-validation fold assignment scheme, if fold_column is not specified. The ‘Stratified’ option will stratify the folds based on the response variable, for classification problems. - Type: - Literal["auto", "random", "modulo", "stratified"], defaults to- "auto".- Examples
 - >>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv") >>> cars["economy_20mpg"] = cars["economy_20mpg"].asfactor() >>> predictors = ["displacement","power","weight","acceleration","year"] >>> response = "economy_20mpg" >>> assignment_type = "Random" >>> cars_drf = H2ORandomForestEstimator(fold_assignment=assignment_type, ... nfolds=5, ... seed=1234) >>> cars_drf.train(x=predictors, ... y=response, ... training_frame=cars) >>> cars_drf.auc(xval=True) 
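The difference between the "modulo" and "random" schemes can be sketched in plain Python. The function names below are hypothetical and the code only illustrates the general idea of each scheme; it does not reproduce the fold ids H2O would generate, and the stratified scheme is not shown.

```python
import random


def modulo_folds(n_rows: int, nfolds: int) -> list:
    """Deterministic assignment: row i goes to fold i mod nfolds."""
    return [i % nfolds for i in range(n_rows)]


def random_folds(n_rows: int, nfolds: int, seed: int) -> list:
    """Seeded random assignment: each row draws a fold id uniformly at random."""
    rng = random.Random(seed)
    return [rng.randrange(nfolds) for _ in range(n_rows)]


print(modulo_folds(7, 3))
print(random_folds(7, 3, seed=1234))
```

Modulo assignment yields folds whose sizes differ by at most one row, while random assignment only balances fold sizes on average.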
 - 
property fold_column¶
- Column with cross-validation fold index assignment per observation. - Type: - str.- Examples
 - >>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv") >>> cars["economy_20mpg"] = cars["economy_20mpg"].asfactor() >>> predictors = ["displacement","power","weight","acceleration","year"] >>> response = "economy_20mpg" >>> fold_numbers = cars.kfold_column(n_folds=5, seed=1234) >>> fold_numbers.set_names(["fold_numbers"]) >>> cars = cars.cbind(fold_numbers) >>> print(cars['fold_numbers']) >>> cars_drf = H2ORandomForestEstimator(seed=1234) >>> cars_drf.train(x=predictors, ... y=response, ... training_frame=cars, ... fold_column="fold_numbers") >>> cars_drf.auc(xval=True) 
 - 
property gainslift_bins¶
- Gains/Lift table number of bins. 0 means disabled. The default value of -1 means automatic binning. - Type: - int, defaults to- -1.- Examples
 - >>> airlines= h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/testng/airlines_train.csv") >>> model = H2ORandomForestEstimator(ntrees=1, gainslift_bins=20) >>> model.train(x=["Origin", "Distance"], ... y="IsDepDelayed", ... training_frame=airlines) >>> model.gains_lift() 
 - 
property histogram_type¶
- What type of histogram to use for finding optimal split points - Type: - Literal["auto", "uniform_adaptive", "random", "quantiles_global", "round_robin", "uniform_robust"], defaults to- "auto".- Examples
 - >>> airlines= h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/airlines/allyears2k_headers.zip") >>> airlines["Year"] = airlines["Year"].asfactor() >>> airlines["Month"] = airlines["Month"].asfactor() >>> airlines["DayOfWeek"] = airlines["DayOfWeek"].asfactor() >>> airlines["Cancelled"] = airlines["Cancelled"].asfactor() >>> airlines['FlightNum'] = airlines['FlightNum'].asfactor() >>> predictors = ["Origin", "Dest", "Year", "UniqueCarrier", ... "DayOfWeek", "Month", "Distance", "FlightNum"] >>> response = "IsDepDelayed" >>> train, valid= airlines.split_frame(ratios=[.8], seed=1234) >>> airlines_drf = H2ORandomForestEstimator(histogram_type="UniformAdaptive", ... seed=1234) >>> airlines_drf.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> print(airlines_drf.auc(valid=True)) 
 - 
property ignore_const_cols¶
- Ignore constant columns. - Type: - bool, defaults to- True.- Examples
 - >>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv") >>> cars["economy_20mpg"] = cars["economy_20mpg"].asfactor() >>> predictors = ["displacement","power","weight","acceleration","year"] >>> response = "economy_20mpg" >>> cars["const_1"] = 6 >>> cars["const_2"] = 7 >>> train, valid = cars.split_frame(ratios=[.8], seed=1234) >>> cars_drf = H2ORandomForestEstimator(seed=1234, ... ignore_const_cols=True) >>> cars_drf.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> cars_drf.auc(valid=True) 
 - 
property ignored_columns¶
- Names of columns to ignore for training. - Type: - List[str].
 - 
property keep_cross_validation_fold_assignment¶
- Whether to keep the cross-validation fold assignment. - Type: - bool, defaults to- False.- Examples
 - >>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv") >>> cars["economy_20mpg"] = cars["economy_20mpg"].asfactor() >>> predictors = ["displacement","power","weight","acceleration","year"] >>> response = "economy_20mpg" >>> train, valid = cars.split_frame(ratios=[.8], seed=1234) >>> cars_drf = H2ORandomForestEstimator(keep_cross_validation_fold_assignment=True, ... nfolds=5, ... seed=1234) >>> cars_drf.train(x=predictors, ... y=response, ... training_frame=train) >>> cars_drf.cross_validation_fold_assignment() 
 - 
property keep_cross_validation_models¶
- Whether to keep the cross-validation models. - Type: - bool, defaults to- True.- Examples
 - >>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv") >>> cars["economy_20mpg"] = cars["economy_20mpg"].asfactor() >>> predictors = ["displacement","power","weight","acceleration","year"] >>> response = "economy_20mpg" >>> train, valid = cars.split_frame(ratios=[.8], seed=1234) >>> cars_drf = H2ORandomForestEstimator(keep_cross_validation_models=True, ... nfolds=5, ... seed=1234) >>> cars_drf.train(x=predictors, ... y=response, ... training_frame=train) >>> cars_drf.auc() 
 - 
property keep_cross_validation_predictions¶
- Whether to keep the predictions of the cross-validation models. - Type: - bool, defaults to- False.- Examples
 - >>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv") >>> cars["economy_20mpg"] = cars["economy_20mpg"].asfactor() >>> predictors = ["displacement","power","weight","acceleration","year"] >>> response = "economy_20mpg" >>> train, valid = cars.split_frame(ratios=[.8], seed=1234) >>> cars_drf = H2ORandomForestEstimator(keep_cross_validation_predictions=True, ... nfolds=5, ... seed=1234) >>> cars_drf.train(x=predictors, ... y=response, ... training_frame=train) >>> cars_drf.cross_validation_predictions() 
 - 
property max_after_balance_size¶
- Maximum relative size of the training data after balancing class counts (can be less than 1.0). Requires balance_classes. - Type: - float, defaults to- 5.0.- Examples
 - >>> covtype = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/covtype/covtype.20k.data") >>> covtype[54] = covtype[54].asfactor() >>> predictors = covtype.columns[0:54] >>> response = 'C55' >>> train, valid = covtype.split_frame(ratios=[.8], seed=1234) >>> print(covtype[54].table()) >>> max_size = .85 >>> cov_drf = H2ORandomForestEstimator(balance_classes=True, ... max_after_balance_size=max_size, ... seed=1234) >>> cov_drf.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> print('logloss', cov_drf.logloss(valid=True)) 
 - 
property max_confusion_matrix_size¶
- [Deprecated] Maximum size (# classes) for confusion matrices to be printed in the Logs - Type: - int, defaults to- 20.
 - 
property max_depth¶
- Maximum tree depth (0 for unlimited). - Type: - int, defaults to- 20.- Examples
 - >>> df = h2o.import_file(path = "http://s3.amazonaws.com/h2o-public-test-data/smalldata/gbm_test/titanic.csv") >>> response = "survived" >>> df[response] = df[response].asfactor() >>> predictors = df.columns >>> del predictors[1:3] >>> train, valid, test = df.split_frame(ratios=[0.6,0.2], ... seed=1234, ... destination_frames= ... ['train.hex','valid.hex','test.hex']) >>> drf = H2ORandomForestEstimator() >>> drf.train(x=predictors, ... y=response, ... training_frame=train) >>> perf = drf.model_performance(valid) >>> print(perf.auc()) 
 - 
property max_runtime_secs¶
- Maximum allowed runtime in seconds for model training. Use 0 to disable. - Type: - float, defaults to- 0.0.- Examples
 - >>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv") >>> cars["economy_20mpg"] = cars["economy_20mpg"].asfactor() >>> predictors = ["displacement","power","weight","acceleration","year"] >>> response = "economy_20mpg" >>> train, valid = cars.split_frame(ratios=[.8], seed=1234) >>> cars_drf = H2ORandomForestEstimator(max_runtime_secs=10, ... ntrees=10000, ... max_depth=10, ... seed=1234) >>> cars_drf.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> cars_drf.auc(valid = True) 
 - 
property min_rows¶
- Fewest allowed (weighted) observations in a leaf. - Type: - float, defaults to- 1.0.- Examples
 - >>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv") >>> cars["economy_20mpg"] = cars["economy_20mpg"].asfactor() >>> predictors = ["displacement","power","weight","acceleration","year"] >>> response = "economy_20mpg" >>> train, valid = cars.split_frame(ratios=[.8], seed=1234) >>> cars_drf = H2ORandomForestEstimator(min_rows=16, ... seed=1234) >>> cars_drf.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> print(cars_drf.auc(valid=True)) 
 - 
property min_split_improvement¶
- Minimum relative improvement in squared error reduction for a split to happen - Type: - float, defaults to- 1e-05.- Examples
 - >>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv") >>> cars["economy_20mpg"] = cars["economy_20mpg"].asfactor() >>> predictors = ["displacement","power","weight","acceleration","year"] >>> response = "economy_20mpg" >>> train, valid = cars.split_frame(ratios=[.8], seed=1234) >>> cars_drf = H2ORandomForestEstimator(min_split_improvement=1e-3, ... seed=1234) >>> cars_drf.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> print(cars_drf.auc(valid=True)) 
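A small worked example may help make "relative improvement in squared error reduction" concrete. The sketch below normalizes the reduction by the parent node's squared error and compares it with the threshold. The function names, the normalization, and the use of `>=` are assumptions for illustration, not H2O's exact rule.

```python
def squared_error(values):
    """Sum of squared deviations from the mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def relative_improvement(parent, left, right):
    """Fraction of the parent's squared error removed by splitting into left/right."""
    parent_se = squared_error(parent)
    if parent_se == 0.0:
        return 0.0  # a pure node cannot be improved
    return (parent_se - squared_error(left) - squared_error(right)) / parent_se


def split_allowed(parent, left, right, min_split_improvement=1e-5):
    """Assumed rule: keep the split only if it improves by at least the threshold."""
    return relative_improvement(parent, left, right) >= min_split_improvement


print(relative_improvement([1, 1, 3, 3], [1, 1], [3, 3]))  # perfect split
print(split_allowed([1, 1, 3, 3], [1, 3], [1, 3]))         # uninformative split
```

Raising the threshold, as in the `1e-3` example above, prunes splits that barely reduce the error.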
 - 
property mtries¶
- Number of variables randomly sampled as candidates at each split. If set to -1, defaults to sqrt(p) for classification and p/3 for regression (where p is the number of predictors). - Type: - int, defaults to- -1.- Examples
 - >>> covtype = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/covtype/covtype.20k.data") >>> covtype[54] = covtype[54].asfactor() >>> predictors = covtype.columns[0:54] >>> response = 'C55' >>> train, valid = covtype.split_frame(ratios=[.8], seed=1234) >>> cov_drf = H2ORandomForestEstimator(mtries=30, seed=1234) >>> cov_drf.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> print('logloss', cov_drf.logloss(valid=True)) 
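As a rough illustration of how the `-1` default could resolve to a concrete number of candidate columns, consider the sketch below. The helper name `default_mtries` is hypothetical, and rounding down with a minimum of 1 is an assumption made here, not something this section states.

```python
import math


def default_mtries(num_predictors: int, classification: bool) -> int:
    """Hypothetical helper: resolve mtries=-1 to a column count."""
    if num_predictors < 1:
        raise ValueError("need at least one predictor")
    if classification:
        candidates = math.isqrt(num_predictors)  # floor(sqrt(p)); rounding is an assumption
    else:
        candidates = num_predictors // 3         # floor(p / 3); rounding is an assumption
    return max(1, candidates)


print(default_mtries(54, classification=True))
print(default_mtries(54, classification=False))
```

Under these assumptions, a classification problem with 54 predictors would consider 7 columns per split by default, so an explicit `mtries=30` is a much wider search at each split.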
 - 
property nbins¶
- For numerical columns (real/int), build a histogram of (at least) this many bins, then split at the best point - Type: - int, defaults to- 20.- Examples
 - >>> eeg = h2o.import_file("https://h2o-public-test-data.s3.amazonaws.com/smalldata/eeg/eeg_eyestate.csv") >>> eeg['eyeDetection'] = eeg['eyeDetection'].asfactor() >>> predictors = eeg.columns[:-1] >>> response = 'eyeDetection' >>> train, valid = eeg.split_frame(ratios=[.8], seed=1234) >>> bin_num = [16, 32, 64, 128, 256, 512] >>> label = ["16", "32", "64", "128", "256", "512"] >>> for key, num in enumerate(bin_num): # Insert integer for 'num' and 'key' >>> eeg_drf = H2ORandomForestEstimator(nbins=num, seed=1234) >>> eeg_drf.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> print(label[key], 'training score', ... eeg_drf.auc(train=True)) >>> print(label[key], 'validation score', ... eeg_drf.auc(valid=True)) 
 - 
property nbins_cats¶
- For categorical columns (factors), build a histogram of this many bins, then split at the best point. Higher values can lead to more overfitting. - Type: - int, defaults to- 1024.- Examples
 - >>> airlines= h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/airlines/allyears2k_headers.zip") >>> airlines["Year"] = airlines["Year"].asfactor() >>> airlines["Month"] = airlines["Month"].asfactor() >>> airlines["DayOfWeek"] = airlines["DayOfWeek"].asfactor() >>> airlines["Cancelled"] = airlines["Cancelled"].asfactor() >>> airlines['FlightNum'] = airlines['FlightNum'].asfactor() >>> predictors = ["Origin", "Dest", "Year", "UniqueCarrier", ... "DayOfWeek", "Month", "Distance", "FlightNum"] >>> response = "IsDepDelayed" >>> train, valid= airlines.split_frame(ratios=[.8], seed=1234) >>> bin_num = [8, 16, 32, 64, 128, 256, ... 512, 1024, 2048, 4096] >>> label = ["8", "16", "32", "64", "128", ... "256", "512", "1024", "2048", "4096"] >>> for key, num in enumerate(bin_num): # Insert integer for 'num' and 'key' >>> airlines_drf = H2ORandomForestEstimator(nbins_cats=num, ... seed=1234) >>> airlines_drf.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> print(label[key], 'training score', ... airlines_drf.auc(train=True)) >>> print(label[key], 'validation score', ... airlines_drf.auc(valid=True)) 
 - 
property nbins_top_level¶
- For numerical columns (real/int), build a histogram of (at most) this many bins at the root level, then decrease by factor of two per level - Type: - int, defaults to- 1024.- Examples
 - >>> eeg = h2o.import_file("https://h2o-public-test-data.s3.amazonaws.com/smalldata/eeg/eeg_eyestate.csv") >>> eeg['eyeDetection'] = eeg['eyeDetection'].asfactor() >>> predictors = eeg.columns[:-1] >>> response = 'eyeDetection' >>> train, valid = eeg.split_frame(ratios=[.8], ... seed=1234) >>> bin_num = [32, 64, 128, 256, 512, ... 1024, 2048, 4096] >>> label = ["32", "64", "128", "256", ... "512", "1024", "2048", "4096"] >>> for key, num in enumerate(bin_num): # Insert integer for 'num' and 'key' >>> eeg_drf = H2ORandomForestEstimator(nbins_top_level=num, ... seed=1234) >>> eeg_drf.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> print(label[key], 'training score', ... eeg_drf.auc(train=True)) >>> print(label[key], 'validation score', ... eeg_drf.auc(valid=True)) 
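The interaction between `nbins_top_level` ("at most" this many bins at the root, halved per level) and `nbins` ("at least" this many bins) can be sketched as a per-depth bin budget. The helper `bins_at_depth` is hypothetical, and combining the two parameters with `max` is an assumption inferred from the two descriptions rather than confirmed H2O behaviour.

```python
def bins_at_depth(depth: int, nbins: int = 20, nbins_top_level: int = 1024) -> int:
    """Hypothetical helper: histogram bins for numeric columns at a given tree depth."""
    # Halve the top-level budget once per level, but never drop below nbins (assumed).
    return max(nbins, nbins_top_level >> depth)


for depth in range(8):
    print(depth, bins_at_depth(depth))
```

With the defaults, the budget under this sketch falls from 1024 at the root to 32 at depth 5 and then stays at the `nbins` floor of 20.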
 - 
property nfolds¶
- Number of folds for K-fold cross-validation (0 to disable or >= 2). - Type: - int, defaults to- 0.- Examples
 - >>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv") >>> cars["economy_20mpg"] = cars["economy_20mpg"].asfactor() >>> predictors = ["displacement","power","weight","acceleration","year"] >>> response = "economy_20mpg" >>> folds = 5 >>> cars_drf = H2ORandomForestEstimator(nfolds=folds, ... seed=1234) >>> cars_drf.train(x=predictors, ... y=response, ... training_frame=cars) >>> cars_drf.auc(xval=True) 
 - 
property ntrees¶
- Number of trees. - Type: - int, defaults to- 50.- Examples
 - >>> titanic = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/gbm_test/titanic.csv") >>> titanic['survived'] = titanic['survived'].asfactor() >>> predictors = titanic.columns >>> del predictors[1:3] >>> response = 'survived' >>> train, valid = titanic.split_frame(ratios=[.8], ... seed=1234) >>> tree_num = [20, 50, 80, 110, ... 140, 170, 200] >>> label = ["20", "50", "80", "110", ... "140", "170", "200"] >>> for key, num in enumerate(tree_num): # Input an integer for 'num' and 'key' >>> titanic_drf = H2ORandomForestEstimator(ntrees=num, ... seed=1234) >>> titanic_drf.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> print(label[key], 'training score', ... titanic_drf.auc(train=True)) >>> print(label[key], 'validation score', ... titanic_drf.auc(valid=True)) 
 - 
property offset_column¶
- [Deprecated] The property was removed and will be ignored. 
 - 
property r2_stopping¶
- r2_stopping is no longer supported and will be ignored if set - please use stopping_rounds, stopping_metric and stopping_tolerance instead. Previous versions of H2O would stop building trees once the R^2 metric equaled or exceeded this value. - Type: - float, defaults to- ∞.
 - 
property response_column¶
- Response variable column. - Type: - str.
 - 
property sample_rate¶
- Row sample rate per tree (from 0.0 to 1.0) - Type: - float, defaults to- 0.632.- Examples
 - >>> airlines= h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/airlines/allyears2k_headers.zip") >>> airlines["Year"] = airlines["Year"].asfactor() >>> airlines["Month"] = airlines["Month"].asfactor() >>> airlines["DayOfWeek"] = airlines["DayOfWeek"].asfactor() >>> airlines["Cancelled"] = airlines["Cancelled"].asfactor() >>> airlines['FlightNum'] = airlines['FlightNum'].asfactor() >>> predictors = ["Origin", "Dest", "Year", "UniqueCarrier", ... "DayOfWeek", "Month", "Distance", "FlightNum"] >>> response = "IsDepDelayed" >>> train, valid= airlines.split_frame(ratios=[.8], ... seed=1234) >>> airlines_drf = H2ORandomForestEstimator(sample_rate=.7, ... seed=1234) >>> airlines_drf.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> print(airlines_drf.auc(valid=True)) 
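The default of 0.632 is numerically close to 1 - 1/e, the expected fraction of distinct rows in a classic bootstrap sample drawn with replacement. That this is the reason for the default is an assumption; the section does not say so. The sketch below only verifies the arithmetic.

```python
import math


def expected_unique_fraction(n_rows: int) -> float:
    """Expected share of distinct rows when drawing n_rows times with replacement."""
    return 1.0 - (1.0 - 1.0 / n_rows) ** n_rows


for n in (10, 100, 10_000):
    print(n, round(expected_unique_fraction(n), 4))
print("limit", round(1.0 - math.exp(-1.0), 4))
```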
 - 
property sample_rate_per_class¶
- A list of row sample rates per class (relative fraction for each class, from 0.0 to 1.0), for each tree - Type: - List[float].- Examples
 - >>> covtype = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/covtype/covtype.20k.data") >>> covtype[54] = covtype[54].asfactor() >>> predictors = covtype.columns[0:54] >>> response = 'C55' >>> train, valid = covtype.split_frame(ratios=[.8], ... seed=1234) >>> print(train[response].table()) >>> rate_per_class_list = [1, .4, 1, 1, 1, 1, 1] >>> cov_drf = H2ORandomForestEstimator(sample_rate_per_class=rate_per_class_list, ... seed=1234) >>> cov_drf.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> print('logloss', cov_drf.logloss(valid=True)) 
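Per-class row sampling can be pictured as an independent keep-or-drop draw for each row, using the rate of that row's class. The function name `sample_rows_per_class` and the Bernoulli-style draw are assumptions for illustration; H2O's actual sampler may differ.

```python
import random


def sample_rows_per_class(labels, rate_by_class, seed=1234):
    """Return indices of rows kept for one tree, sampling each class at its own rate."""
    rng = random.Random(seed)
    # rng.random() is in [0.0, 1.0), so rate 1.0 keeps every row and rate 0.0 keeps none.
    return [i for i, label in enumerate(labels) if rng.random() < rate_by_class[label]]


labels = ["a", "b", "a", "b", "a"]
print(sample_rows_per_class(labels, {"a": 1.0, "b": 0.4}))
```

In the covtype example above, the list `[1, .4, 1, 1, 1, 1, 1]` plays the role of `rate_by_class`, down-sampling only the second class.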
 - 
property score_each_iteration¶
- Whether to score during each iteration of model training. - Type: - bool, defaults to- False.- Examples
 - >>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv") >>> cars["economy_20mpg"] = cars["economy_20mpg"].asfactor() >>> predictors = ["displacement","power","weight","acceleration","year"] >>> response = "economy_20mpg" >>> train, valid = cars.split_frame(ratios=[.8], seed=1234) >>> cars_drf = H2ORandomForestEstimator(score_each_iteration=True, ... ntrees=55, ... seed=1234) >>> cars_drf.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame = valid) >>> cars_drf.scoring_history() 
 - 
property score_tree_interval¶
- Score the model after every so many trees. Disabled if set to 0. - Type: - int, defaults to- 0.- Examples
 - >>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv") >>> cars["economy_20mpg"] = cars["economy_20mpg"].asfactor() >>> predictors = ["displacement","power","weight","acceleration","year"] >>> response = "economy_20mpg" >>> train, valid = cars.split_frame(ratios=[.8], seed=1234) >>> cars_drf = H2ORandomForestEstimator(score_tree_interval=5, ... seed=1234) >>> cars_drf.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> cars_drf.scoring_history() 
 - 
property seed¶
- Seed for pseudo random number generator (if applicable) - Type: - int, defaults to- -1.- Examples
 - >>> airlines= h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/airlines/allyears2k_headers.zip") >>> airlines["Year"] = airlines["Year"].asfactor() >>> airlines["Month"] = airlines["Month"].asfactor() >>> airlines["DayOfWeek"] = airlines["DayOfWeek"].asfactor() >>> airlines["Cancelled"] = airlines["Cancelled"].asfactor() >>> airlines['FlightNum'] = airlines['FlightNum'].asfactor() >>> predictors = ["Origin", "Dest", "Year", "UniqueCarrier", ... "DayOfWeek", "Month", "Distance", "FlightNum"] >>> response = "IsDepDelayed" >>> train, valid= airlines.split_frame(ratios=[.8], seed=1234) >>> drf_w_seed_1 = H2ORandomForestEstimator(seed=1234) >>> drf_w_seed_1.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> print('auc for the 1st model build with a seed:', ... drf_w_seed_1.auc(valid=True)) 
 - 
property stopping_metric¶
- Metric to use for early stopping (AUTO: logloss for classification, deviance for regression and anomaly_score for Isolation Forest). Note that custom and custom_increasing can only be used in GBM and DRF with the Python client. - Type: - Literal["auto", "deviance", "logloss", "mse", "rmse", "mae", "rmsle", "auc", "aucpr", "lift_top_group", "misclassification", "mean_per_class_error", "custom", "custom_increasing"], defaults to- "auto".- Examples
 - >>> airlines= h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/airlines/allyears2k_headers.zip") >>> airlines["Year"] = airlines["Year"].asfactor() >>> airlines["Month"] = airlines["Month"].asfactor() >>> airlines["DayOfWeek"] = airlines["DayOfWeek"].asfactor() >>> airlines["Cancelled"] = airlines["Cancelled"].asfactor() >>> airlines['FlightNum'] = airlines['FlightNum'].asfactor() >>> predictors = ["Origin", "Dest", "Year", "UniqueCarrier", ... "DayOfWeek", "Month", "Distance", "FlightNum"] >>> response = "IsDepDelayed" >>> train, valid= airlines.split_frame(ratios=[.8], ... seed=1234) >>> airlines_drf = H2ORandomForestEstimator(stopping_metric="auc", ... stopping_rounds=3, ... stopping_tolerance=1e-2, ... seed=1234) >>> airlines_drf.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> airlines_drf.auc(valid=True) 
 - 
property stopping_rounds¶
- Early stopping based on convergence of stopping_metric. Stop if simple moving average of length k of the stopping_metric does not improve for k:=stopping_rounds scoring events (0 to disable) - Type: - int, defaults to- 0.- Examples
 - >>> airlines= h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/airlines/allyears2k_headers.zip") >>> airlines["Year"] = airlines["Year"].asfactor() >>> airlines["Month"] = airlines["Month"].asfactor() >>> airlines["DayOfWeek"] = airlines["DayOfWeek"].asfactor() >>> airlines["Cancelled"] = airlines["Cancelled"].asfactor() >>> airlines['FlightNum'] = airlines['FlightNum'].asfactor() >>> predictors = ["Origin", "Dest", "Year", "UniqueCarrier", ... "DayOfWeek", "Month", "Distance", "FlightNum"] >>> response = "IsDepDelayed" >>> train, valid= airlines.split_frame(ratios=[.8], ... seed=1234) >>> airlines_drf = H2ORandomForestEstimator(stopping_metric="auc", ... stopping_rounds=3, ... stopping_tolerance=1e-2, ... seed=1234) >>> airlines_drf.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> airlines_drf.auc(valid=True) 
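The moving-average rule described for `stopping_rounds` can be sketched as follows. This is a simplified reading of the description, not H2O's code: the function name `should_stop_early`, the requirement of at least 2k scoring events, and the ratio-based use of `stopping_tolerance` are assumptions, and the sketch assumes strictly positive metric values.

```python
from typing import Sequence


def should_stop_early(history: Sequence[float], k: int, tolerance: float,
                      larger_is_better: bool = False) -> bool:
    """Simplified early-stopping check over a list of scoring-event metrics."""
    if k <= 0 or len(history) < 2 * k:
        return False  # disabled, or not enough scoring events yet (assumed minimum)
    recent = list(history[-2 * k:])
    # k + 1 overlapping simple moving averages, each over k scoring events
    averages = [sum(recent[i:i + k]) / k for i in range(k + 1)]
    reference, later = averages[0], averages[1:]
    best = max(later) if larger_is_better else min(later)
    ratio = best / reference
    if larger_is_better:
        return ratio <= 1.0 + tolerance  # e.g. AUC failed to grow enough
    return ratio >= 1.0 - tolerance      # e.g. logloss failed to shrink enough


print(should_stop_early([0.5] * 6, k=3, tolerance=1e-2))
print(should_stop_early([1.0, 0.9, 0.8, 0.7, 0.6, 0.5], k=3, tolerance=1e-2))
```

A flat logloss history triggers stopping, while a steadily improving one does not; for metrics such as `"auc"`, pass `larger_is_better=True`.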
 - 
property stopping_tolerance¶
- Relative tolerance for metric-based stopping criterion (stop if relative improvement is not at least this much) - Type: - float, defaults to- 0.001.- Examples
 - >>> airlines= h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/airlines/allyears2k_headers.zip") >>> airlines["Year"] = airlines["Year"].asfactor() >>> airlines["Month"] = airlines["Month"].asfactor() >>> airlines["DayOfWeek"] = airlines["DayOfWeek"].asfactor() >>> airlines["Cancelled"] = airlines["Cancelled"].asfactor() >>> airlines['FlightNum'] = airlines['FlightNum'].asfactor() >>> predictors = ["Origin", "Dest", "Year", "UniqueCarrier", ... "DayOfWeek", "Month", "Distance", "FlightNum"] >>> response = "IsDepDelayed" >>> train, valid= airlines.split_frame(ratios=[.8], ... seed=1234) >>> airlines_drf = H2ORandomForestEstimator(stopping_metric="auc", ... stopping_rounds=3, ... stopping_tolerance=1e-2, ... seed=1234) >>> airlines_drf.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> airlines_drf.auc(valid=True) 
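The interaction between stopping_rounds and stopping_tolerance can be sketched in plain Python. This is a simplified illustration of the rule as described above, not H2O's actual implementation: the function name `should_stop`, the `larger_is_better` flag, and the choice to wait for 2*k scoring events are all assumptions made for the example.

```python
def should_stop(history, stopping_rounds, stopping_tolerance, larger_is_better=True):
    """Simplified moving-average early-stopping check (illustrative only)."""
    k = stopping_rounds
    # k == 0 disables early stopping; otherwise wait for 2*k scoring events
    # so that two non-overlapping windows of length k can be compared.
    if k == 0 or len(history) < 2 * k:
        return False

    def moving_avg(end):
        # Mean of the k metric values that end just before index `end`.
        return sum(history[end - k:end]) / k

    n = len(history)
    reference = moving_avg(n - k)  # moving average from k scoring events ago
    recent = [moving_avg(end) for end in range(n - k + 1, n + 1)]
    best = max(recent) if larger_is_better else min(recent)

    # Relative improvement of the best recent average over the reference.
    gain = (best - reference) if larger_is_better else (reference - best)
    if reference == 0:
        return gain <= 0
    return gain / abs(reference) < stopping_tolerance


# Validation AUC per scoring event (made-up numbers for illustration).
auc_history = [0.70, 0.75, 0.80, 0.80, 0.80, 0.80]
print(should_stop(auc_history, stopping_rounds=3, stopping_tolerance=1e-2))
print(should_stop(auc_history + [0.80] * 3, stopping_rounds=3, stopping_tolerance=1e-2))
```

With stopping_rounds=3 and stopping_tolerance=1e-2, as in the examples above, training in this sketch continues while the best recent 3-event average beats the earlier one by at least 1%, and stops once the metric plateaus.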
 - 
property training_frame¶
- Id of the training data frame. - Type: - Union[None, str, H2OFrame].- Examples
 - >>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv") >>> cars["economy_20mpg"] = cars["economy_20mpg"].asfactor() >>> predictors = ["displacement","power","weight","acceleration","year"] >>> response = "economy_20mpg" >>> train, valid = cars.split_frame(ratios=[.8], ... seed=1234) >>> cars_drf = H2ORandomForestEstimator(seed=1234) >>> cars_drf.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> cars_drf.auc(valid=True) 
 - 
property validation_frame¶
- Id of the validation data frame. - Type: - Union[None, str, H2OFrame].- Examples
 - >>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv") >>> cars["economy_20mpg"] = cars["economy_20mpg"].asfactor() >>> predictors = ["displacement","power","weight","acceleration","year"] >>> response = "economy_20mpg" >>> train, valid = cars.split_frame(ratios=[.8], ... seed=1234) >>> cars_drf = H2ORandomForestEstimator(seed=1234) >>> cars_drf.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> cars_drf.auc(valid=True) 
 - 
property weights_column¶
- Column with observation weights. Giving some observation a weight of zero is equivalent to excluding it from the dataset; giving an observation a relative weight of 2 is equivalent to repeating that row twice. Negative weights are not allowed. Note: Weights are per-row observation weights and do not increase the size of the data frame. This is typically the number of times a row is repeated, but non-integer values are supported as well. During training, rows with higher weights matter more, due to the larger loss function pre-factor. If you set weight = 0 for a row, the returned prediction frame at that row is zero and this is incorrect. To get an accurate prediction, remove all rows with weight == 0. - Type: - str.- Examples
 - >>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv") >>> cars["economy_20mpg"] = cars["economy_20mpg"].asfactor() >>> predictors = ["displacement","power","acceleration","year"] >>> response = "economy_20mpg" >>> train, valid = cars.split_frame(ratios=[.8], ... seed=1234) >>> cars_drf = H2ORandomForestEstimator(seed=1234) >>> cars_drf.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid, ... weights_column="weight") >>> cars_drf.auc(valid=True) 
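The weight semantics described above (a weight of 0 excludes a row, a weight of 2 counts it twice) can be checked on a toy statistic. The sketch below uses a weighted mean purely as a stand-in; it does not reproduce how H2O applies weights inside tree building, and `weighted_mean` is a hypothetical helper.

```python
def weighted_mean(values, weights):
    """Mean of `values` where each row counts `weights[i]` times."""
    if any(w < 0 for w in weights):
        raise ValueError("Negative weights are not allowed")
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


values = [10.0, 20.0, 30.0]
weights = [0, 2, 1]  # drop row 0, count row 1 twice, count row 2 once

# The same dataset written out explicitly, without a weights column.
expanded = [20.0, 20.0, 30.0]

print(weighted_mean(values, weights))
print(sum(expanded) / len(expanded))
```

Both expressions give the same value, while the weighted version keeps the frame at its original three rows.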
 
H2ORuleFitEstimator¶
- 
class h2o.estimators.rulefit.H2ORuleFitEstimator(model_id=None, training_frame=None, validation_frame=None, seed=-1, response_column=None, ignored_columns=None, algorithm='auto', min_rule_length=3, max_rule_length=3, max_num_rules=-1, model_type='rules_and_linear', weights_column=None, distribution='auto', rule_generation_ntrees=50, auc_type='auto', remove_duplicates=True, lambda_=None, max_categorical_levels=10)[source]¶
- Bases: - h2o.estimators.estimator_base.H2OEstimator- RuleFit - Builds a RuleFit on a parsed dataset, for regression or classification. - 
property Lambda¶
- [Deprecated] Use - lambda_ instead
 - 
property algorithm¶
- The algorithm to use to generate rules. - Type: - Literal["auto", "drf", "gbm"], defaults to- "auto".
 - 
property auc_type¶
- Set default multinomial AUC type. - Type: - Literal["auto", "none", "macro_ovr", "weighted_ovr", "macro_ovo", "weighted_ovo"], defaults to- "auto".
 - 
property distribution¶
- Distribution function - Type: - Literal["auto", "bernoulli", "multinomial", "gaussian", "poisson", "gamma", "tweedie", "laplace", "quantile", "huber"], defaults to- "auto".
 - 
property ignored_columns¶
- Names of columns to ignore for training. - Type: - List[str].
 - 
property lambda_¶
- Lambda for LASSO regressor. - Type: - List[float].
 - 
property max_categorical_levels¶
- For every categorical feature, only use this many most frequent categorical levels for model training. Only used for categorical_encoding == EnumLimited. - Type: - int, defaults to- 10.
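As a rough illustration of what "most frequent categorical levels" means, the sketch below picks the top levels of a column by frequency. It only shows the selection step; how H2O treats the remaining levels is not described here, and `top_levels` is a hypothetical helper.

```python
from collections import Counter


def top_levels(column, max_levels):
    """Return the `max_levels` most frequent values in `column`, most frequent first."""
    return [level for level, _ in Counter(column).most_common(max_levels)]


origin = ["SFO", "JFK", "SFO", "ORD", "SFO", "JFK"]
print(top_levels(origin, max_levels=2))
```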
 - 
property max_num_rules¶
- The maximum number of rules to return. Defaults to -1, which means the number of rules is selected by diminishing returns in model deviance. - Type: - int, defaults to- -1.
 - 
property max_rule_length¶
- Maximum length of rules. Defaults to 3. - Type: - int, defaults to- 3.
 - 
property min_rule_length¶
- Minimum length of rules. Defaults to 3. - Type: - int, defaults to- 3.
 - 
property model_type¶
- Specifies type of base learners in the ensemble. - Type: - Literal["rules_and_linear", "rules", "linear"], defaults to- "rules_and_linear".
 - 
predict_rules(frame, rule_ids)[source]¶
- Evaluates validity of the given rules on the given data. - Parameters
- frame – H2OFrame on which rule validity is to be evaluated 
- rule_ids – string array of rule ids to be evaluated against the frame 
 
- Returns
- H2OFrame with one column per input rule ID; each column flags whether that rule applies to the observation. 
 
 - 
property remove_duplicates¶
- Whether to remove rules which are identical to an earlier rule. Defaults to true. - Type: - bool, defaults to- True.
 - 
property response_column¶
- Response variable column. - Type: - str.
 - 
property rule_generation_ntrees¶
- Specifies the number of trees to build in the tree model. Defaults to 50. - Type: - int, defaults to- 50.
 - 
property seed¶
- Seed for pseudo random number generator (if applicable). - Type: - int, defaults to- -1.
 - 
property training_frame¶
- Id of the training data frame. - Type: - Union[None, str, H2OFrame].
 - 
property validation_frame¶
- Id of the validation data frame. - Type: - Union[None, str, H2OFrame].
 - 
property weights_column¶
- Column with observation weights. Giving some observation a weight of zero is equivalent to excluding it from the dataset; giving an observation a relative weight of 2 is equivalent to repeating that row twice. Negative weights are not allowed. Note: Weights are per-row observation weights and do not increase the size of the data frame. This is typically the number of times a row is repeated, but non-integer values are supported as well. During training, rows with higher weights matter more, due to the larger loss function pre-factor. If you set weight = 0 for a row, the returned prediction frame at that row is zero and this is incorrect. To get an accurate prediction, remove all rows with weight == 0. - Type: - str.
 
H2OStackedEnsembleEstimator¶
- 
class h2o.estimators.stackedensemble.H2OStackedEnsembleEstimator(model_id=None, training_frame=None, response_column=None, validation_frame=None, blending_frame=None, base_models=[], metalearner_algorithm='auto', metalearner_nfolds=0, metalearner_fold_assignment=None, metalearner_fold_column=None, metalearner_params=None, metalearner_transform='none', max_runtime_secs=0.0, weights_column=None, offset_column=None, custom_metric_func=None, seed=-1, score_training_samples=10000, keep_levelone_frame=False, export_checkpoints_dir=None, auc_type='auto', gainslift_bins=-1)[source]¶
- Bases: - h2o.estimators.estimator_base.H2OEstimator- Stacked Ensemble - Builds a stacked ensemble (aka “super learner”) machine learning method that uses two or more H2O learning algorithms to improve predictive performance. It is a loss-based supervised learning method that finds the optimal combination of a collection of prediction algorithms. This method supports regression and binary classification. - Examples
 - >>> import h2o >>> h2o.init() >>> from h2o.estimators.random_forest import H2ORandomForestEstimator >>> from h2o.estimators.gbm import H2OGradientBoostingEstimator >>> from h2o.estimators.stackedensemble import H2OStackedEnsembleEstimator >>> col_types = ["numeric", "numeric", "numeric", "enum", ... "enum", "numeric", "numeric", "numeric", "numeric"] >>> data = h2o.import_file("http://h2o-public-test-data.s3.amazonaws.com/smalldata/prostate/prostate.csv", col_types=col_types) >>> train, test = data.split_frame(ratios=[.8], seed=1) >>> x = ["CAPSULE","GLEASON","RACE","DPROS","DCAPS","PSA","VOL"] >>> y = "AGE" >>> nfolds = 5 >>> gbm = H2OGradientBoostingEstimator(nfolds=nfolds, ... fold_assignment="Modulo", ... keep_cross_validation_predictions=True) >>> gbm.train(x=x, y=y, training_frame=train) >>> rf = H2ORandomForestEstimator(nfolds=nfolds, ... fold_assignment="Modulo", ... keep_cross_validation_predictions=True) >>> rf.train(x=x, y=y, training_frame=train) >>> stack = H2OStackedEnsembleEstimator(model_id="ensemble", ... training_frame=train, ... validation_frame=test, ... base_models=[gbm.model_id, rf.model_id]) >>> stack.train(x=x, y=y, training_frame=train, validation_frame=test) >>> stack.model_performance() - 
property auc_type¶
- Set default multinomial AUC type. - Type: - Literal["auto", "none", "macro_ovr", "weighted_ovr", "macro_ovo", "weighted_ovo"], defaults to- "auto".
 - 
property base_models¶
- List of models or grids (or their ids) to ensemble/stack together. Grids are expanded to individual models. If not using blending frame, then models must have been cross-validated using nfolds > 1, and folds must be identical across models. - Type: - List[str], defaults to- [].- Examples
 - >>> from h2o.estimators.random_forest import H2ORandomForestEstimator >>> from h2o.estimators.gbm import H2OGradientBoostingEstimator >>> from h2o.estimators.stackedensemble import H2OStackedEnsembleEstimator >>> col_types = ["numeric", "numeric", "numeric", "enum", ... "enum", "numeric", "numeric", "numeric", "numeric"] >>> data = h2o.import_file("http://h2o-public-test-data.s3.amazonaws.com/smalldata/prostate/prostate.csv", col_types=col_types) >>> train, test = data.split_frame(ratios=[.8], seed=1) >>> x = ["CAPSULE","GLEASON","RACE","DPROS","DCAPS","PSA","VOL"] >>> y = "AGE" >>> nfolds = 5 >>> gbm = H2OGradientBoostingEstimator(nfolds=nfolds, ... fold_assignment="Modulo", ... keep_cross_validation_predictions=True) >>> gbm.train(x=x, y=y, training_frame=train) >>> rf = H2ORandomForestEstimator(nfolds=nfolds, ... fold_assignment="Modulo", ... keep_cross_validation_predictions=True) >>> rf.train(x=x, y=y, training_frame=train) >>> stack = H2OStackedEnsembleEstimator(model_id="ensemble", ... training_frame=train, ... validation_frame=test, ... base_models=[gbm.model_id, rf.model_id]) >>> stack.train(x=x, y=y, training_frame=train, validation_frame=test) >>> stack.model_performance() 
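The examples pass fold_assignment="Modulo" to every base model because the folds must be identical across models. A minimal sketch of why that works, assuming "Modulo" assigns row i to fold i mod nfolds (an assumption about the scheme, with hypothetical helper names):

```python
import random


def modulo_folds(n_rows, nfolds):
    """Deterministic assignment: row i goes to fold i % nfolds."""
    return [i % nfolds for i in range(n_rows)]


def random_folds(n_rows, nfolds, seed):
    """Seed-dependent assignment, shown only for contrast."""
    rng = random.Random(seed)
    return [rng.randrange(nfolds) for _ in range(n_rows)]


# Two models trained on the same frame get the same Modulo folds,
# regardless of their own seeds.
print(modulo_folds(7, 3))
print(modulo_folds(7, 3) == modulo_folds(7, 3))

# Random folds only match when both models share the same seed.
print(random_folds(7, 3, seed=1) == random_folds(7, 3, seed=1))
```

Because the Modulo assignment depends only on row order and nfolds, the cross-validated predictions of each base model line up row by row when they are combined for the metalearner.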
 - 
property blending_frame¶
- Frame used to compute the predictions that serve as the training frame for the metalearner (triggers blending mode if provided) - Type: - Union[None, str, H2OFrame].- Examples
 - >>> from h2o.estimators.random_forest import H2ORandomForestEstimator >>> from h2o.estimators.gbm import H2OGradientBoostingEstimator >>> from h2o.estimators.stackedensemble import H2OStackedEnsembleEstimator >>> higgs = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/testng/higgs_train_5k.csv") >>> train, blend = higgs.split_frame(ratios = [.8], seed = 1234) >>> x = train.columns >>> y = "response" >>> x.remove(y) >>> train[y] = train[y].asfactor() >>> blend[y] = blend[y].asfactor() >>> nfolds = 3 >>> my_gbm = H2OGradientBoostingEstimator(distribution="bernoulli", ... ntrees=10, ... nfolds=nfolds, ... fold_assignment="Modulo", ... keep_cross_validation_predictions=True, ... seed=1) >>> my_gbm.train(x=x, y=y, training_frame=train) >>> my_rf = H2ORandomForestEstimator(ntrees=50, ... nfolds=nfolds, ... fold_assignment="Modulo", ... keep_cross_validation_predictions=True, ... seed=1) >>> my_rf.train(x=x, y=y, training_frame=train) >>> stack_blend = H2OStackedEnsembleEstimator(base_models=[my_gbm, my_rf], ... seed=1) >>> stack_blend.train(x=x, y=y, training_frame=train, blending_frame=blend) >>> stack_blend.model_performance(blend).auc() 
 - 
property custom_metric_func¶
- Reference to custom evaluation function, format: language:keyName=funcName - Type: - str.
 - 
property export_checkpoints_dir¶
- Automatically export generated models to this directory. - Type: - str.- Examples
 - >>> from h2o.estimators.random_forest import H2ORandomForestEstimator >>> from h2o.estimators.gbm import H2OGradientBoostingEstimator >>> from h2o.estimators.stackedensemble import H2OStackedEnsembleEstimator >>> import tempfile >>> from os import listdir >>> higgs = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/testng/higgs_train_5k.csv") >>> train, blend = higgs.split_frame(ratios = [.8], seed = 1234) >>> x = train.columns >>> y = "response" >>> x.remove(y) >>> train[y] = train[y].asfactor() >>> blend[y] = blend[y].asfactor() >>> nfolds = 3 >>> checkpoints_dir = tempfile.mkdtemp() >>> my_gbm = H2OGradientBoostingEstimator(distribution="bernoulli", ... ntrees=10, ... nfolds=nfolds, ... fold_assignment="Modulo", ... keep_cross_validation_predictions=True, ... seed=1) >>> my_gbm.train(x=x, y=y, training_frame=train) >>> my_rf = H2ORandomForestEstimator(ntrees=50, ... nfolds=nfolds, ... fold_assignment="Modulo", ... keep_cross_validation_predictions=True, ... seed=1) >>> my_rf.train(x=x, y=y, training_frame=train) >>> stack_blend = H2OStackedEnsembleEstimator(base_models=[my_gbm, my_rf], ... seed=1, ... export_checkpoints_dir=checkpoints_dir) >>> stack_blend.train(x=x, y=y, training_frame=train, blending_frame=blend) >>> len(listdir(checkpoints_dir)) 
 - 
property gainslift_bins¶
- Gains/Lift table number of bins. 0 means disabled. Default value -1 means automatic binning. - Type: - int, defaults to- -1.
 - 
property keep_levelone_frame¶
- Keep level one frame used for metalearner training. - Type: - bool, defaults to- False.- Examples
 - >>> from h2o.estimators.random_forest import H2ORandomForestEstimator >>> from h2o.estimators.gbm import H2OGradientBoostingEstimator >>> from h2o.estimators.stackedensemble import H2OStackedEnsembleEstimator >>> higgs = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/testng/higgs_train_5k.csv") >>> train, blend = higgs.split_frame(ratios = [.8], seed = 1234) >>> x = train.columns >>> y = "response" >>> x.remove(y) >>> train[y] = train[y].asfactor() >>> blend[y] = blend[y].asfactor() >>> nfolds = 3 >>> my_gbm = H2OGradientBoostingEstimator(distribution="bernoulli", ... ntrees=1, ... nfolds=nfolds, ... fold_assignment="Modulo", ... keep_cross_validation_predictions=True, ... seed=1) >>> my_gbm.train(x=x, y=y, training_frame=train) >>> my_rf = H2ORandomForestEstimator(ntrees=50, ... nfolds=nfolds, ... fold_assignment="Modulo", ... keep_cross_validation_predictions=True, ... seed=1) >>> my_rf.train(x=x, y=y, training_frame=train) >>> stack_blend = H2OStackedEnsembleEstimator(base_models=[my_gbm, my_rf], ... seed=1, ... keep_levelone_frame=True) >>> stack_blend.train(x=x, y=y, training_frame=train, blending_frame=blend) >>> stack_blend.model_performance(blend).auc() 
 - 
levelone_frame_id()[source]¶
- Fetch the levelone_frame_id for an H2OStackedEnsembleEstimator. - Examples
 - >>> from h2o.estimators.random_forest import H2ORandomForestEstimator >>> from h2o.estimators.gbm import H2OGradientBoostingEstimator >>> from h2o.estimators.stackedensemble import H2OStackedEnsembleEstimator >>> higgs = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/testng/higgs_train_5k.csv") >>> train, blend = higgs.split_frame(ratios = [.8], seed = 1234) >>> x = train.columns >>> y = "response" >>> x.remove(y) >>> train[y] = train[y].asfactor() >>> blend[y] = blend[y].asfactor() >>> nfolds = 3 >>> my_gbm = H2OGradientBoostingEstimator(distribution="bernoulli", ... ntrees=10, ... nfolds=nfolds, ... fold_assignment="Modulo", ... keep_cross_validation_predictions=True, ... seed=1) >>> my_gbm.train(x=x, y=y, training_frame=train) >>> my_rf = H2ORandomForestEstimator(ntrees=50, ... nfolds=nfolds, ... fold_assignment="Modulo", ... keep_cross_validation_predictions=True, ... seed=1) >>> my_rf.train(x=x, y=y, training_frame=train) >>> stack_blend = H2OStackedEnsembleEstimator(base_models=[my_gbm, my_rf], ... seed=1, ... keep_levelone_frame=True) >>> stack_blend.train(x=x, y=y, training_frame=train, blending_frame=blend) >>> stack_blend.levelone_frame_id() 
 - 
property max_runtime_secs¶
- Maximum allowed runtime in seconds for model training. Use 0 to disable. - Type: - float, defaults to- 0.0.
 - 
metalearner()[source]¶
- Print the metalearner of an H2OStackedEnsembleEstimator. - Examples
 - >>> from h2o.estimators.random_forest import H2ORandomForestEstimator >>> from h2o.estimators.gbm import H2OGradientBoostingEstimator >>> from h2o.estimators.stackedensemble import H2OStackedEnsembleEstimator >>> higgs = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/testng/higgs_train_5k.csv") >>> train, blend = higgs.split_frame(ratios = [.8], seed = 1234) >>> x = train.columns >>> y = "response" >>> x.remove(y) >>> train[y] = train[y].asfactor() >>> blend[y] = blend[y].asfactor() >>> nfolds = 3 >>> my_gbm = H2OGradientBoostingEstimator(distribution="bernoulli", ... ntrees=10, ... nfolds=nfolds, ... fold_assignment="Modulo", ... keep_cross_validation_predictions=True, ... seed=1) >>> my_gbm.train(x=x, y=y, training_frame=train) >>> my_rf = H2ORandomForestEstimator(ntrees=50, ... nfolds=nfolds, ... fold_assignment="Modulo", ... keep_cross_validation_predictions=True, ... seed=1) >>> my_rf.train(x=x, y=y, training_frame=train) >>> stack_blend = H2OStackedEnsembleEstimator(base_models=[my_gbm, my_rf], ... seed=1, ... keep_levelone_frame=True) >>> stack_blend.train(x=x, y=y, training_frame=train, blending_frame=blend) >>> stack_blend.metalearner() 
 - 
property metalearner_algorithm¶
- Type of algorithm to use as the metalearner. Options include ‘AUTO’ (GLM with non negative weights; if validation_frame is present, a lambda search is performed), ‘deeplearning’ (Deep Learning with default parameters), ‘drf’ (Random Forest with default parameters), ‘gbm’ (GBM with default parameters), ‘glm’ (GLM with default parameters), ‘naivebayes’ (NaiveBayes with default parameters), or ‘xgboost’ (if available, XGBoost with default parameters). - Type: - Literal["auto", "deeplearning", "drf", "gbm", "glm", "naivebayes", "xgboost"], defaults to- "auto".- Examples
 - >>> from h2o.estimators.random_forest import H2ORandomForestEstimator >>> from h2o.estimators.gbm import H2OGradientBoostingEstimator >>> from h2o.estimators.stackedensemble import H2OStackedEnsembleEstimator >>> higgs = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/testng/higgs_train_5k.csv") >>> train, blend = higgs.split_frame(ratios = [.8], seed = 1234) >>> x = train.columns >>> y = "response" >>> x.remove(y) >>> train[y] = train[y].asfactor() >>> blend[y] = blend[y].asfactor() >>> nfolds = 3 >>> my_gbm = H2OGradientBoostingEstimator(distribution="bernoulli", ... ntrees=1, ... nfolds=nfolds, ... fold_assignment="Modulo", ... keep_cross_validation_predictions=True, ... seed=1) >>> my_gbm.train(x=x, y=y, training_frame=train) >>> my_rf = H2ORandomForestEstimator(ntrees=50, ... nfolds=nfolds, ... fold_assignment="Modulo", ... keep_cross_validation_predictions=True, ... seed=1) >>> my_rf.train(x=x, y=y, training_frame=train) >>> stack_blend = H2OStackedEnsembleEstimator(base_models=[my_gbm, my_rf], ... seed=1, ... metalearner_algorithm="gbm") >>> stack_blend.train(x=x, y=y, training_frame=train, blending_frame=blend) >>> stack_blend.model_performance(blend).auc() 
 - 
property metalearner_fold_assignment¶
- Cross-validation fold assignment scheme for metalearner cross-validation. Defaults to AUTO (which is currently set to Random). The ‘Stratified’ option will stratify the folds based on the response variable, for classification problems. - Type: - Literal["auto", "random", "modulo", "stratified"].- Examples
 - >>> from h2o.estimators.random_forest import H2ORandomForestEstimator >>> from h2o.estimators.gbm import H2OGradientBoostingEstimator >>> from h2o.estimators.stackedensemble import H2OStackedEnsembleEstimator >>> higgs = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/testng/higgs_train_5k.csv") >>> train, blend = higgs.split_frame(ratios = [.8], seed = 1234) >>> x = train.columns >>> y = "response" >>> x.remove(y) >>> train[y] = train[y].asfactor() >>> blend[y] = blend[y].asfactor() >>> nfolds = 3 >>> my_gbm = H2OGradientBoostingEstimator(distribution="bernoulli", ... ntrees=1, ... nfolds=nfolds, ... fold_assignment="Modulo", ... keep_cross_validation_predictions=True, ... seed=1) >>> my_gbm.train(x=x, y=y, training_frame=train) >>> my_rf = H2ORandomForestEstimator(ntrees=50, ... nfolds=nfolds, ... fold_assignment="Modulo", ... keep_cross_validation_predictions=True, ... seed=1) >>> my_rf.train(x=x, y=y, training_frame=train) >>> stack_blend = H2OStackedEnsembleEstimator(base_models=[my_gbm, my_rf], ... seed=1, ... metalearner_fold_assignment="Random") >>> stack_blend.train(x=x, y=y, training_frame=train, blending_frame=blend) >>> stack_blend.model_performance(blend).auc() 
 - 
property metalearner_fold_column¶
- Column with cross-validation fold index assignment per observation for cross-validation of the metalearner. - Type: - str.- Examples
 - >>> from h2o.estimators.random_forest import H2ORandomForestEstimator >>> from h2o.estimators.gbm import H2OGradientBoostingEstimator >>> from h2o.estimators.stackedensemble import H2OStackedEnsembleEstimator >>> train = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/testng/higgs_train_5k.csv") >>> test = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/testng/higgs_test_5k.csv") >>> fold_column = "fold_id" >>> train[fold_column] = train.kfold_column(n_folds=3, seed=1) >>> x = train.columns >>> y = "response" >>> x.remove(y) >>> x.remove(fold_column) >>> train[y] = train[y].asfactor() >>> test[y] = test[y].asfactor() >>> nfolds = 3 >>> my_gbm = H2OGradientBoostingEstimator(distribution="bernoulli", ... ntrees=10, ... nfolds=nfolds, ... fold_assignment="Modulo", ... keep_cross_validation_predictions=True, ... seed=1) >>> my_gbm.train(x=x, y=y, training_frame=train) >>> my_rf = H2ORandomForestEstimator(ntrees=50, ... nfolds=nfolds, ... fold_assignment="Modulo", ... keep_cross_validation_predictions=True, ... seed=1) >>> my_rf.train(x=x, y=y, training_frame=train) >>> stack = H2OStackedEnsembleEstimator(base_models=[my_gbm, my_rf], ... metalearner_fold_column=fold_column, ... metalearner_params=dict(keep_cross_validation_models=True)) >>> stack.train(x=x, y=y, training_frame=train) >>> stack.model_performance().auc() 
 - 
property metalearner_nfolds¶
- Number of folds for K-fold cross-validation of the metalearner algorithm (0 to disable or >= 2). - Type: - int, defaults to- 0.- Examples
 - >>> from h2o.estimators.random_forest import H2ORandomForestEstimator >>> from h2o.estimators.gbm import H2OGradientBoostingEstimator >>> from h2o.estimators.stackedensemble import H2OStackedEnsembleEstimator >>> higgs = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/testng/higgs_train_5k.csv") >>> train, blend = higgs.split_frame(ratios = [.8], seed = 1234) >>> x = train.columns >>> y = "response" >>> x.remove(y) >>> train[y] = train[y].asfactor() >>> blend[y] = blend[y].asfactor() >>> nfolds = 3 >>> my_gbm = H2OGradientBoostingEstimator(distribution="bernoulli", ... ntrees=1, ... nfolds=nfolds, ... fold_assignment="Modulo", ... keep_cross_validation_predictions=True, ... seed=1) >>> my_gbm.train(x=x, y=y, training_frame=train) >>> my_rf = H2ORandomForestEstimator(ntrees=50, ... nfolds=nfolds, ... fold_assignment="Modulo", ... keep_cross_validation_predictions=True, ... seed=1) >>> my_rf.train(x=x, y=y, training_frame=train) >>> stack_blend = H2OStackedEnsembleEstimator(base_models=[my_gbm, my_rf], ... seed=1, ... metalearner_nfolds=3) >>> stack_blend.train(x=x, y=y, training_frame=train, blending_frame=blend) >>> stack_blend.model_performance(blend).auc() 
 - 
property metalearner_params¶
- Parameters for metalearner algorithm - Type: - dict.- Examples
 - >>> from h2o.estimators.random_forest import H2ORandomForestEstimator >>> from h2o.estimators.gbm import H2OGradientBoostingEstimator >>> from h2o.estimators.stackedensemble import H2OStackedEnsembleEstimator >>> higgs = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/testng/higgs_train_5k.csv") >>> train, blend = higgs.split_frame(ratios = [.8], seed = 1234) >>> x = train.columns >>> y = "response" >>> x.remove(y) >>> train[y] = train[y].asfactor() >>> blend[y] = blend[y].asfactor() >>> nfolds = 3 >>> gbm_params = {"ntrees" : 100, "max_depth" : 6} >>> my_gbm = H2OGradientBoostingEstimator(distribution="bernoulli", ... ntrees=1, ... nfolds=nfolds, ... fold_assignment="Modulo", ... keep_cross_validation_predictions=True, ... seed=1) >>> my_gbm.train(x=x, y=y, training_frame=train) >>> my_rf = H2ORandomForestEstimator(ntrees=50, ... nfolds=nfolds, ... fold_assignment="Modulo", ... keep_cross_validation_predictions=True, ... seed=1) >>> my_rf.train(x=x, y=y, training_frame=train) >>> stack_blend = H2OStackedEnsembleEstimator(base_models=[my_gbm, my_rf], ... metalearner_algorithm="gbm", ... metalearner_params=gbm_params) >>> stack_blend.train(x=x, y=y, training_frame=train, blending_frame=blend) >>> stack_blend.model_performance(blend).auc() 
 - 
property metalearner_transform¶
- Transformation used for the level one frame. - Type: - Literal["none", "logit"], defaults to- "none".
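For reference, the "logit" option names the standard log-odds transform, which maps a probability p in (0, 1) to log(p / (1 - p)). The sketch below shows that transform on base-model probabilities. The clipping constant `eps` is an assumption added to keep the example finite at 0 and 1; whether and how H2O clips is not stated here.

```python
import math


def logit(p, eps=1e-9):
    """Log-odds of probability p, clipped away from 0 and 1 (eps is an assumption)."""
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))


# Predicted probabilities from a hypothetical base model.
level_one_column = [0.1, 0.5, 0.9]
print([logit(p) for p in level_one_column])
```

The transform spreads out probabilities near 0 and 1 and maps 0.5 to 0, which can suit a linear metalearner better than raw probabilities.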
 - 
property offset_column¶
- Offset column. This will be added to the combination of columns before applying the link function. - Type: - str.
 - 
property response_column¶
- Response variable column. - Type: - str.
 - 
property score_training_samples¶
- Specify the number of training set samples for scoring. The value must be >= 0. To use all training samples, enter 0. - Type: - int, defaults to- 10000.
 - 
property seed¶
- Seed for random numbers; passed through to the metalearner algorithm. Defaults to -1 (time-based random number) - Type: - int, defaults to- -1.- Examples
 - >>> from h2o.estimators.random_forest import H2ORandomForestEstimator >>> from h2o.estimators.gbm import H2OGradientBoostingEstimator >>> from h2o.estimators.stackedensemble import H2OStackedEnsembleEstimator >>> higgs = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/testng/higgs_train_5k.csv") >>> train, blend = higgs.split_frame(ratios = [.8], seed = 1234) >>> x = train.columns >>> y = "response" >>> x.remove(y) >>> train[y] = train[y].asfactor() >>> blend[y] = blend[y].asfactor() >>> nfolds = 3 >>> my_gbm = H2OGradientBoostingEstimator(distribution="bernoulli", ... ntrees=1, ... nfolds=nfolds, ... fold_assignment="Modulo", ... keep_cross_validation_predictions=True, ... seed=1) >>> my_gbm.train(x=x, y=y, training_frame=train) >>> my_rf = H2ORandomForestEstimator(ntrees=50, ... nfolds=nfolds, ... fold_assignment="Modulo", ... keep_cross_validation_predictions=True, ... seed=1) >>> my_rf.train(x=x, y=y, training_frame=train) >>> stack_blend = H2OStackedEnsembleEstimator(base_models=[my_gbm, my_rf], ... seed=1, ... metalearner_fold_assignment="Random") >>> stack_blend.train(x=x, y=y, training_frame=train, blending_frame=blend) >>> stack_blend.model_performance(blend).auc() 
 - 
train(x=None, y=None, training_frame=None, blending_frame=None, verbose=False, **kwargs)[source]¶
- Train the H2O model. - Parameters
- x – A list of column names or indices indicating the predictor columns. 
- y – An index or a column name indicating the response column. 
- training_frame (H2OFrame) – The H2OFrame having the columns indicated by x and y (as well as any additional columns specified by fold, offset, and weights). 
- offset_column – The name or index of the column in training_frame that holds the offsets. 
- fold_column – The name or index of the column in training_frame that holds the per-row fold assignments. 
- weights_column – The name or index of the column in training_frame that holds the per-row weights. 
- validation_frame – H2OFrame with validation data to be scored on while training. 
- max_runtime_secs (float) – Maximum allowed runtime in seconds for model training. Use 0 to disable. 
- verbose (bool) – Print scoring history to stdout. Defaults to False. 
 
 
 - 
property training_frame¶
- Id of the training data frame. - Type: - Union[None, str, H2OFrame].- Examples
 - >>> from h2o.estimators.random_forest import H2ORandomForestEstimator >>> from h2o.estimators.gbm import H2OGradientBoostingEstimator >>> from h2o.estimators.stackedensemble import H2OStackedEnsembleEstimator >>> higgs = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/testng/higgs_train_5k.csv") >>> train, valid = higgs.split_frame(ratios = [.8], seed = 1234) >>> x = train.columns >>> y = "response" >>> x.remove(y) >>> train[y] = train[y].asfactor() >>> valid[y] = valid[y].asfactor() >>> nfolds = 3 >>> my_gbm = H2OGradientBoostingEstimator(distribution="bernoulli", ... ntrees=1, ... nfolds=nfolds, ... fold_assignment="Modulo", ... keep_cross_validation_predictions=True, ... seed=1) >>> my_gbm.train(x=x, y=y, training_frame=train) >>> my_rf = H2ORandomForestEstimator(ntrees=50, ... nfolds=nfolds, ... fold_assignment="Modulo", ... keep_cross_validation_predictions=True, ... seed=1) >>> my_rf.train(x=x, y=y, training_frame=train) >>> stack_blend = H2OStackedEnsembleEstimator(base_models=[my_gbm, my_rf], ... seed=1, ... metalearner_fold_assignment="Random") >>> stack_blend.train(x=x, y=y, training_frame=train, validation_frame=valid) >>> stack_blend.model_performance(valid).auc() 
 - 
property validation_frame¶
- Id of the validation data frame. - Type: - Union[None, str, H2OFrame].- Examples
 - >>> from h2o.estimators.random_forest import H2ORandomForestEstimator >>> from h2o.estimators.gbm import H2OGradientBoostingEstimator >>> from h2o.estimators.stackedensemble import H2OStackedEnsembleEstimator >>> higgs = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/testng/higgs_train_5k.csv") >>> train, valid = higgs.split_frame(ratios = [.8], seed = 1234) >>> x = train.columns >>> y = "response" >>> x.remove(y) >>> train[y] = train[y].asfactor() >>> valid[y] = valid[y].asfactor() >>> nfolds = 3 >>> my_gbm = H2OGradientBoostingEstimator(distribution="bernoulli", ... ntrees=1, ... nfolds=nfolds, ... fold_assignment="Modulo", ... keep_cross_validation_predictions=True, ... seed=1) >>> my_gbm.train(x=x, y=y, training_frame=train) >>> my_rf = H2ORandomForestEstimator(ntrees=50, ... nfolds=nfolds, ... fold_assignment="Modulo", ... keep_cross_validation_predictions=True, ... seed=1) >>> my_rf.train(x=x, y=y, training_frame=train) >>> stack_blend = H2OStackedEnsembleEstimator(base_models=[my_gbm, my_rf], ... seed=1, ... metalearner_fold_assignment="Random") >>> stack_blend.train(x=x, y=y, training_frame=train, validation_frame=valid) >>> stack_blend.model_performance(valid).auc() 
 - 
property weights_column¶
- Column with observation weights. Giving some observation a weight of zero is equivalent to excluding it from the dataset; giving an observation a relative weight of 2 is equivalent to repeating that row twice. Negative weights are not allowed. Note: Weights are per-row observation weights and do not increase the size of the data frame. This is typically the number of times a row is repeated, but non-integer values are supported as well. During training, rows with higher weights matter more, due to the larger loss function pre-factor. If you set weight = 0 for a row, the returned prediction frame at that row is zero and this is incorrect. To get an accurate prediction, remove all rows with weight == 0. - Type: - str.
 
H2OTargetEncoderEstimator¶
- 
class h2o.estimators.targetencoder.H2OTargetEncoderEstimator(model_id=None, training_frame=None, fold_column=None, response_column=None, ignored_columns=None, columns_to_encode=None, keep_original_categorical_columns=True, blending=False, inflection_point=10.0, smoothing=20.0, data_leakage_handling='none', noise=0.01, seed=-1)[source]¶
- Bases: - h2o.estimators.estimator_base.H2OEstimator- TargetEncoder - 
property blending¶
- If true, enables blending of posterior probabilities (computed for a given categorical value) with prior probabilities (computed on the entire set). This helps mitigate the effect of categorical values that have few observations. The blending effect can be tuned using the inflection_point and smoothing parameters. - Type: - bool, defaults to- False.- Examples
 - >>> titanic = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/gbm_test/titanic.csv") >>> predictors = ["home.dest", "cabin", "embarked"] >>> response = "survived" >>> titanic["survived"] = titanic["survived"].asfactor() >>> fold_col = "kfold_column" >>> titanic[fold_col] = titanic.kfold_column(n_folds=5, seed=1234) >>> titanic_te = H2OTargetEncoderEstimator(inflection_point=35, ... smoothing=25, ... blending=True) >>> titanic_te.train(x=predictors, ... y=response, ... training_frame=titanic) >>> titanic_te 
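To make the blending idea concrete, here is a minimal sketch in plain Python. It assumes the commonly used sigmoid blend, in which the posterior weight is `1 / (1 + exp((inflection_point - n) / smoothing))` for a level observed `n` times; the function names are hypothetical and the exact formula inside H2O may differ.

```python
import math


def blend_weight(n, inflection_point=10.0, smoothing=20.0):
    """Sigmoid weight given to the posterior of a level observed n times (assumed form)."""
    return 1.0 / (1.0 + math.exp((inflection_point - n) / smoothing))


def blended_encoding(level_sum, level_count, prior,
                     inflection_point=10.0, smoothing=20.0):
    """Mix the per-level mean (posterior) with the global mean (prior)."""
    posterior = level_sum / level_count
    lam = blend_weight(level_count, inflection_point, smoothing)
    return lam * posterior + (1.0 - lam) * prior


prior = 0.4  # global response rate (illustrative value)

# A rare level: 2 rows, both positive. Its raw mean of 1.0 is pulled toward the prior.
rare = blended_encoding(2, 2, prior, inflection_point=35, smoothing=25)

# A frequent level: 200 rows, 100 positive. Its raw mean of 0.5 is kept almost unchanged.
frequent = blended_encoding(100, 200, prior, inflection_point=35, smoothing=25)

print(round(rare, 4), round(frequent, 4))
```

A level seen exactly `inflection_point` times gets a weight of 0.5, so its encoding sits halfway between the posterior and the prior.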
 - 
property columns_to_encode¶
- List of categorical columns or groups of categorical columns to encode. When groups of columns are specified, each group is encoded as a single column (interactions are created internally). - Type: - List[List[str]].
 - 
property data_leakage_handling¶
- Data leakage handling strategy used to generate the encoding. Supported options are: 1) “none” (default) - no holdout, using the entire training frame. 2) “leave_one_out” - current row’s response value is subtracted from the per-level frequencies pre-calculated on the entire training frame. 3) “k_fold” - encodings for a fold are generated based on out-of-fold data. - Type: - Literal["leave_one_out", "k_fold", "none"], defaults to- "none".- Examples
 - >>> titanic = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/gbm_test/titanic.csv") >>> predictors = ["home.dest", "cabin", "embarked"] >>> response = "survived" >>> titanic["survived"] = titanic["survived"].asfactor() >>> fold_col = "kfold_column" >>> titanic[fold_col] = titanic.kfold_column(n_folds=5, seed=1234) >>> titanic_te = H2OTargetEncoderEstimator(inflection_point=35, ... smoothing=25, ... data_leakage_handling="k_fold", ... fold_column=fold_col, ... blending=True) >>> titanic_te.train(x=predictors, ... y=response, ... training_frame=titanic) >>> titanic_te 
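The three strategies can be compared on a toy column. This is a minimal sketch in plain Python rather than H2O's implementation; the helper names are hypothetical, and falling back to the prior when a row has no other data for its level is an assumption made for the illustration.

```python
from collections import defaultdict


def _level_stats(keys, targets):
    """Per-key sum and count of the target."""
    sums, counts = defaultdict(float), defaultdict(int)
    for key, y in zip(keys, targets):
        sums[key] += y
        counts[key] += 1
    return sums, counts


def encode_none(levels, targets):
    """'none': every row gets the mean of its level over the whole frame."""
    sums, counts = _level_stats(levels, targets)
    return [sums[lv] / counts[lv] for lv in levels]


def encode_leave_one_out(levels, targets, prior):
    """'leave_one_out': the current row's own response is subtracted first."""
    sums, counts = _level_stats(levels, targets)
    out = []
    for lv, y in zip(levels, targets):
        n = counts[lv] - 1
        out.append((sums[lv] - y) / n if n > 0 else prior)
    return out


def encode_k_fold(levels, targets, folds, prior):
    """'k_fold': each row is encoded from rows that belong to other folds."""
    sums, counts = _level_stats(levels, targets)
    fold_sums, fold_counts = _level_stats(list(zip(folds, levels)), targets)
    out = []
    for lv, f in zip(levels, folds):
        n = counts[lv] - fold_counts[(f, lv)]
        s = sums[lv] - fold_sums[(f, lv)]
        out.append(s / n if n > 0 else prior)
    return out


levels = ["a", "a", "a", "b", "b", "c"]
targets = [1, 0, 1, 1, 0, 1]
folds = [0, 0, 1, 0, 1, 1]
prior = sum(targets) / len(targets)

print(encode_none(levels, targets))
print(encode_leave_one_out(levels, targets, prior))
print(encode_k_fold(levels, targets, folds, prior))
```

With "none", a row's own response contributes to its encoding, which is the leakage the other two strategies avoid.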
 - 
property f¶
- [Deprecated] Use smoothing instead.
 - 
property fold_column¶
- Column with cross-validation fold index assignment per observation. - Type: - str.- Examples
 - >>> titanic = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/gbm_test/titanic.csv") >>> predictors = ["home.dest", "cabin", "embarked"] >>> response = "survived" >>> titanic["survived"] = titanic["survived"].asfactor() >>> fold_col = "kfold_column" >>> titanic[fold_col] = titanic.kfold_column(n_folds=5, seed=1234) >>> titanic_te = H2OTargetEncoderEstimator(inflection_point=35, ... smoothing=25, ... data_leakage_handling="k_fold", ... fold_column=fold_col, ... blending=True) >>> titanic_te.train(x=predictors, ... y=response, ... training_frame=titanic) >>> titanic_te 
 - 
property ignored_columns¶
- Names of columns to ignore for training. - Type: - List[str].
 - 
property inflection_point¶
- Inflection point of the sigmoid used to blend probabilities (see blending parameter). If a given categorical value appears fewer than inflection_point times in a data sample, the influence of the posterior probability will be smaller than that of the prior. - Type: - float, defaults to- 10.0.- Examples
 - >>> titanic = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/gbm_test/titanic.csv") >>> predictors = ["home.dest", "cabin", "embarked"] >>> response = "survived" >>> titanic["survived"] = titanic["survived"].asfactor() >>> fold_col = "kfold_column" >>> titanic[fold_col] = titanic.kfold_column(n_folds=5, seed=1234) >>> titanic_te = H2OTargetEncoderEstimator(inflection_point=35, ... smoothing=25, ... blending=True) >>> titanic_te.train(x=predictors, ... y=response, ... training_frame=titanic) >>> titanic_te 
 - 
property k¶
- [Deprecated] Use inflection_point instead.
 - 
property keep_original_categorical_columns¶
- If true, the original non-encoded categorical features will remain in the result frame. - Type: - bool, defaults to- True.
 - 
property noise¶
- The amount of noise to add to the encoded column. Use 0 to disable noise, and -1 (=AUTO) to let the algorithm determine a reasonable amount of noise. - Type: - float, defaults to- 0.01.
 - 
property noise_level¶
- [Deprecated] Use noise instead.
 - 
property response_column¶
- Response variable column. - Type: - str.
 - 
property seed¶
- Seed used to generate the noise. By default, the seed is chosen randomly. - Type: - int, defaults to- -1.
 - 
property smoothing¶
- Smoothing factor corresponds to the inverse of the slope at the inflection point on the sigmoid used to blend probabilities (see blending parameter). If smoothing tends towards 0, then the sigmoid used for blending turns into a Heaviside step function. - Type: - float, defaults to- 20.0.- Examples
 - >>> titanic = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/gbm_test/titanic.csv") >>> predictors = ["home.dest", "cabin", "embarked"] >>> response = "survived" >>> titanic["survived"] = titanic["survived"].asfactor() >>> fold_col = "kfold_column" >>> titanic[fold_col] = titanic.kfold_column(n_folds=5, seed=1234) >>> titanic_te = H2OTargetEncoderEstimator(inflection_point=35, ... smoothing=25, ... blending=True) >>> titanic_te.train(x=predictors, ... y=response, ... training_frame=titanic) >>> titanic_te 
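The effect of smoothing on the blending sigmoid can be seen by evaluating the posterior weight around the inflection point. This is a minimal sketch in plain Python assuming the sigmoid form `1 / (1 + exp((inflection_point - n) / smoothing))`; `posterior_weight` is a hypothetical helper, not part of the H2O API.

```python
import math


def posterior_weight(n, inflection_point, smoothing):
    """Assumed sigmoid weight of the posterior for a level observed n times."""
    z = (inflection_point - n) / smoothing
    if z > 700:  # avoid overflow in math.exp for very small smoothing values
        return 0.0
    return 1.0 / (1.0 + math.exp(z))


counts = (5, 9, 10, 11, 15)
for smoothing in (20.0, 5.0, 0.01):
    weights = [round(posterior_weight(n, 10.0, smoothing), 3) for n in counts]
    print(smoothing, weights)
```

With a smoothing of 0.01 the weights jump from about 0 just below the inflection point to about 1 just above it, which is the step-function behaviour mentioned in the description.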
 - 
property training_frame¶
- Id of the training data frame. - Type: - Union[None, str, H2OFrame].- Examples
 - >>> titanic = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/gbm_test/titanic.csv") >>> predictors = ["home.dest", "cabin", "embarked"] >>> response = "survived" >>> titanic["survived"] = titanic["survived"].asfactor() >>> fold_col = "kfold_column" >>> titanic[fold_col] = titanic.kfold_column(n_folds=5, seed=1234) >>> titanic_te = H2OTargetEncoderEstimator(inflection_point=35, ... smoothing=25, ... blending=True) >>> titanic_te.train(x=predictors, ... y=response, ... training_frame=titanic) >>> titanic_te 
 - 
transform(frame, blending=None, inflection_point=None, smoothing=None, noise=None, as_training=False, **kwargs)[source]¶
- Apply transformation to te_columns based on the encoding maps generated during train() method call. - Parameters
- frame (H2OFrame) – the frame on which to apply the target encoding transformations. 
- blending (boolean) – If provided, this overrides the blending parameter on the model. 
- inflection_point (float) – If provided, this overrides the inflection_point parameter on the model. 
- smoothing (float) – If provided, this overrides the smoothing parameter on the model. 
- noise (float) – If provided, this overrides the amount of random noise added to the target encoding defined on the model. Adding noise helps prevent overfitting. 
- as_training (boolean) – Must be set to True when encoding the training frame. Defaults to False. 
 
- Examples
 - >>> titanic = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/gbm_test/titanic.csv") >>> predictors = ["home.dest", "cabin", "embarked"] >>> response = "survived" >>> titanic[response] = titanic[response].asfactor() >>> fold_col = "kfold_column" >>> titanic[fold_col] = titanic.kfold_column(n_folds=5, seed=1234) >>> titanic_te = H2OTargetEncoderEstimator(data_leakage_handling="leave_one_out", ... inflection_point=35, ... smoothing=25, ... blending=True, ... seed=1234) >>> titanic_te.train(x=predictors, ... y=response, ... training_frame=titanic) >>> transformed = titanic_te.transform(frame=titanic, as_training=True) 
 
H2OUpliftRandomForestEstimator¶
- 
class h2o.estimators.uplift_random_forest.H2OUpliftRandomForestEstimator(model_id=None, training_frame=None, validation_frame=None, score_each_iteration=False, score_tree_interval=0, response_column=None, ignored_columns=None, ignore_const_cols=True, ntrees=50, max_depth=20, min_rows=1.0, nbins=20, nbins_top_level=1024, nbins_cats=1024, max_runtime_secs=0.0, seed=-1, mtries=-2, sample_rate=0.632, sample_rate_per_class=None, col_sample_rate_change_per_level=1.0, col_sample_rate_per_tree=1.0, histogram_type='auto', categorical_encoding='auto', distribution='auto', check_constant_response=True, custom_metric_func=None, treatment_column='treatment', uplift_metric='auto', auuc_type='auto', auuc_nbins=-1, stopping_rounds=0, stopping_metric='auto', stopping_tolerance=0.001)[source]¶
- Bases: - h2o.estimators.estimator_base.H2OEstimator- Uplift Distributed Random Forest - Builds an Uplift Random Forest model on an H2OFrame. - 
property auuc_nbins¶
- Number of bins to calculate Area Under Uplift Curve. - Type: - int, defaults to- -1.- Examples
 - >>> import h2o >>> from h2o.estimators import H2OUpliftRandomForestEstimator >>> h2o.init() >>> data = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/uplift/criteo_uplift_13k.csv") >>> predictors = ["f1", "f2", "f3", "f4", "f5", "f6","f7", "f8"] >>> response = "conversion" >>> data[response] = data[response].asfactor() >>> treatment_column = "treatment" >>> data[treatment_column] = data[treatment_column].asfactor() >>> train, valid = data.split_frame(ratios=[.8], seed=1234) >>> uplift_model = H2OUpliftRandomForestEstimator(ntrees=10, ... max_depth=5, ... treatment_column=treatment_column, ... uplift_metric="KL", ... min_rows=10, ... seed=1234, ... auuc_type="qini", ... auuc_nbins=100) >>> uplift_model.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> uplift_model.model_performance() 
 - 
property auuc_type¶
- Metric used to calculate Area Under Uplift Curve. - Type: - Literal["auto", "qini", "lift", "gain"], defaults to- "auto".- Examples
 - >>> import h2o >>> from h2o.estimators import H2OUpliftRandomForestEstimator >>> h2o.init() >>> data = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/uplift/criteo_uplift_13k.csv") >>> predictors = ["f1", "f2", "f3", "f4", "f5", "f6","f7", "f8"] >>> response = "conversion" >>> data[response] = data[response].asfactor() >>> treatment_column = "treatment" >>> data[treatment_column] = data[treatment_column].asfactor() >>> train, valid = data.split_frame(ratios=[.8], seed=1234) >>> uplift_model = H2OUpliftRandomForestEstimator(ntrees=10, ... max_depth=5, ... treatment_column=treatment_column, ... uplift_metric="KL", ... min_rows=10, ... seed=1234, ... auuc_type="gain") >>> uplift_model.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> uplift_model.model_performance() 
 - 
property categorical_encoding¶
- Encoding scheme for categorical features - Type: - Literal["auto", "enum", "one_hot_internal", "one_hot_explicit", "binary", "eigen", "label_encoder", "sort_by_response", "enum_limited"], defaults to- "auto".
 - 
property check_constant_response¶
- Check if response column is constant. If enabled, an exception is thrown if the response column is a constant value. If disabled, the model will train regardless of whether the response column is constant. - Type: - bool, defaults to- True.
 - 
property col_sample_rate_change_per_level¶
- Relative change of the column sampling rate for every level (must be > 0.0 and <= 2.0) - Type: - float, defaults to- 1.0.
 - 
property col_sample_rate_per_tree¶
- Column sample rate per tree (from 0.0 to 1.0) - Type: - float, defaults to- 1.0.
 - 
property custom_metric_func¶
- Reference to custom evaluation function, format: language:keyName=funcName - Type: - str.
 - 
property distribution¶
- Distribution function - Type: - Literal["auto", "bernoulli"], defaults to- "auto".
 - 
property histogram_type¶
- What type of histogram to use for finding optimal split points - Type: - Literal["auto", "uniform_adaptive", "random", "quantiles_global", "round_robin", "uniform_robust"], defaults to- "auto".
 - 
property ignore_const_cols¶
- Ignore constant columns. - Type: - bool, defaults to- True.
 - 
property ignored_columns¶
- Names of columns to ignore for training. - Type: - List[str].
 - 
property max_depth¶
- Maximum tree depth (0 for unlimited). - Type: - int, defaults to- 20.
 - 
property max_runtime_secs¶
- Maximum allowed runtime in seconds for model training. Use 0 to disable. - Type: - float, defaults to- 0.0.
 - 
property min_rows¶
- Fewest allowed (weighted) observations in a leaf. - Type: - float, defaults to- 1.0.
 - 
property mtries¶
- Number of variables randomly sampled as candidates at each split. If set to -1, defaults to sqrt(p) for classification and p/3 for regression (where p is the number of predictors). - Type: - int, defaults to- -2.
 - 
property nbins¶
- For numerical columns (real/int), build a histogram of (at least) this many bins, then split at the best point - Type: - int, defaults to- 20.
 - 
property nbins_cats¶
- For categorical columns (factors), build a histogram of this many bins, then split at the best point. Higher values can lead to more overfitting. - Type: - int, defaults to- 1024.
 - 
property nbins_top_level¶
- For numerical columns (real/int), build a histogram of (at most) this many bins at the root level, then decrease by factor of two per level - Type: - int, defaults to- 1024.
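The interaction between nbins and nbins_top_level can be sketched as a halving schedule. The snippet below is a plain-Python illustration under the assumption that nbins acts as a lower bound once the halved top-level value drops below it; `bins_at_depth` is a hypothetical helper and the exact schedule inside H2O may differ.

```python
def bins_at_depth(depth, nbins=20, nbins_top_level=1024):
    """Histogram bins for numerical columns at a tree depth (assumed schedule)."""
    # Halve the top-level bin count once per level, but never go below nbins.
    return max(nbins, nbins_top_level >> depth)


schedule = [bins_at_depth(d) for d in range(8)]
print(schedule)
```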
 - 
property ntrees¶
- Number of trees. - Type: - int, defaults to- 50.
 - 
property response_column¶
- Response variable column. - Type: - str.
 - 
property sample_rate¶
- Row sample rate per tree (from 0.0 to 1.0) - Type: - float, defaults to- 0.632.
 - 
property sample_rate_per_class¶
- A list of row sample rates per class (relative fraction for each class, from 0.0 to 1.0), for each tree - Type: - List[float].
 - 
property score_each_iteration¶
- Whether to score during each iteration of model training. - Type: - bool, defaults to- False.
 - 
property score_tree_interval¶
- Score the model after every so many trees. Disabled if set to 0. - Type: - int, defaults to- 0.
 - 
property seed¶
- Seed for pseudo random number generator (if applicable) - Type: - int, defaults to- -1.
 - 
property stopping_metric¶
- Metric to use for early stopping (AUTO: logloss for classification, deviance for regression and anomaly_score for Isolation Forest). Note that custom and custom_increasing can only be used in GBM and DRF with the Python client. - Type: - Literal["auto", "auuc", "ate", "att", "atc", "qini"], defaults to- "auto".
 - 
property stopping_rounds¶
- Early stopping based on convergence of stopping_metric. Stop if simple moving average of length k of the stopping_metric does not improve for k:=stopping_rounds scoring events (0 to disable) - Type: - int, defaults to- 0.
 - 
property stopping_tolerance¶
- Relative tolerance for metric-based stopping criterion (stop if relative improvement is not at least this much) - Type: - float, defaults to- 0.001.
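The way stopping_rounds and stopping_tolerance work together can be sketched as a moving-average check. This is a simplified plain-Python illustration, not H2O's scoring code: it assumes a positive metric, compares the best of the latest k moving averages with the moving average just before them, and uses the hypothetical names `moving_averages` and `should_stop`.

```python
def moving_averages(values, k):
    """Simple moving averages of length k over a list of metric values."""
    return [sum(values[i:i + k]) / k for i in range(len(values) - k + 1)]


def should_stop(history, stopping_rounds, stopping_tolerance=0.001,
                lower_is_better=True):
    """Return True when the metric stopped improving by the relative tolerance."""
    k = stopping_rounds
    if k <= 0 or len(history) < 2 * k:
        return False  # disabled, or not enough scoring events yet
    window = moving_averages(history, k)[-(k + 1):]
    reference, recent = window[0], window[1:]
    if lower_is_better:
        return min(recent) > reference * (1.0 - stopping_tolerance)
    return max(recent) < reference * (1.0 + stopping_tolerance)


# Metric values recorded at successive scoring events (illustrative numbers).
losses = [0.50, 0.40, 0.35, 0.349, 0.3489, 0.3488]

print(should_stop(losses[:4], stopping_rounds=2, stopping_tolerance=0.01))
print(should_stop(losses, stopping_rounds=2, stopping_tolerance=0.01))
```

For metrics where larger values are better, such as an area under the uplift curve, the sketch would be called with `lower_is_better=False`.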
 - 
property training_frame¶
- Id of the training data frame. - Type: - Union[None, str, H2OFrame].
 - 
property treatment_column¶
- Define the column which will be used for computing uplift gain to select best split for a tree. The column has to divide the dataset into treatment (value 1) and control (value 0) groups. - Type: - str, defaults to- "treatment".- Examples
 - >>> import h2o >>> from h2o.estimators import H2OUpliftRandomForestEstimator >>> h2o.init() >>> data = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/uplift/criteo_uplift_13k.csv") >>> predictors = ["f1", "f2", "f3", "f4", "f5", "f6","f7", "f8"] >>> response = "conversion" >>> data[response] = data[response].asfactor() >>> treatment_column = "treatment" >>> data[treatment_column] = data[treatment_column].asfactor() >>> train, valid = data.split_frame(ratios=[.8], seed=1234) >>> uplift_model = H2OUpliftRandomForestEstimator(ntrees=10, ... max_depth=5, ... uplift_metric="KL", ... min_rows=10, ... seed=1234, ... auuc_type="qini", ... treatment_column=treatment_column) >>> uplift_model.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> uplift_model.model_performance() 
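The role of the treatment column can be illustrated with the simplest uplift estimate: the difference in response rate between the treatment group (value 1) and the control group (value 0). This is a minimal plain-Python sketch with made-up data; `observed_uplift` is a hypothetical helper, not an H2O function.

```python
def observed_uplift(treatment, response):
    """Response rate in the treatment group minus response rate in the control group."""
    treated = [y for t, y in zip(treatment, response) if t == 1]
    control = [y for t, y in zip(treatment, response) if t == 0]
    if not treated or not control:
        raise ValueError("both treatment (1) and control (0) rows are required")
    return sum(treated) / len(treated) - sum(control) / len(control)


treatment = [1, 1, 1, 1, 0, 0, 0, 0]
response = [1, 1, 0, 1, 0, 1, 0, 0]

print(observed_uplift(treatment, response))
```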
 - 
property uplift_metric¶
- Divergence metric used to find best split when building an uplift tree. - Type: - Literal["auto", "kl", "euclidean", "chi_squared"], defaults to- "auto".- Examples
 - >>> import h2o >>> from h2o.estimators import H2OUpliftRandomForestEstimator >>> h2o.init() >>> data = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/uplift/criteo_uplift_13k.csv") >>> predictors = ["f1", "f2", "f3", "f4", "f5", "f6","f7", "f8"] >>> response = "conversion" >>> data[response] = data[response].asfactor() >>> treatment_column = "treatment" >>> data[treatment_column] = data[treatment_column].asfactor() >>> train, valid = data.split_frame(ratios=[.8], seed=1234) >>> uplift_model = H2OUpliftRandomForestEstimator(ntrees=10, ... max_depth=5, ... min_rows=10, ... seed=1234, ... auuc_type="qini", ... treatment_column=treatment_column, ... uplift_metric="euclidean") >>> uplift_model.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> uplift_model.model_performance() 
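The three divergence options can be sketched for a binary response, where each group is summarized by its response rate. The formulas below are the textbook definitions applied to the two-outcome distributions (p, 1 - p); they are an illustration only, and the split gain H2O computes from them (including any normalization) may differ. The function names are hypothetical.

```python
import math


def kl_divergence(p_t, p_c):
    """Kullback-Leibler divergence between treatment and control response rates."""
    return (p_t * math.log(p_t / p_c)
            + (1.0 - p_t) * math.log((1.0 - p_t) / (1.0 - p_c)))


def squared_euclidean(p_t, p_c):
    """Squared Euclidean distance between (p_t, 1 - p_t) and (p_c, 1 - p_c)."""
    return 2.0 * (p_t - p_c) ** 2


def chi_squared(p_t, p_c):
    """Chi-squared divergence of the treatment rate from the control rate."""
    return (p_t - p_c) ** 2 / p_c + (p_t - p_c) ** 2 / (1.0 - p_c)


# Response rates must lie strictly between 0 and 1 for KL and chi-squared.
p_treatment, p_control = 0.75, 0.25
for metric in (kl_divergence, squared_euclidean, chi_squared):
    print(metric.__name__, round(metric(p_treatment, p_control), 4))
```

All three are zero when the two groups respond at the same rate and grow as the rates move apart, which is why a split that separates the groups well scores higher.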
 - 
property validation_frame¶
- Id of the validation data frame. - Type: - Union[None, str, H2OFrame].
 
H2OXGBoostEstimator¶
- 
class h2o.estimators.xgboost.H2OXGBoostEstimator(model_id=None, training_frame=None, validation_frame=None, nfolds=0, keep_cross_validation_models=True, keep_cross_validation_predictions=False, keep_cross_validation_fold_assignment=False, score_each_iteration=False, fold_assignment='auto', fold_column=None, response_column=None, ignored_columns=None, ignore_const_cols=True, offset_column=None, weights_column=None, stopping_rounds=0, stopping_metric='auto', stopping_tolerance=0.001, max_runtime_secs=0.0, seed=-1, distribution='auto', tweedie_power=1.5, categorical_encoding='auto', quiet_mode=True, checkpoint=None, export_checkpoints_dir=None, custom_metric_func=None, ntrees=50, max_depth=6, min_rows=1.0, min_child_weight=1.0, learn_rate=0.3, eta=0.3, sample_rate=1.0, subsample=1.0, col_sample_rate=1.0, colsample_bylevel=1.0, col_sample_rate_per_tree=1.0, colsample_bytree=1.0, colsample_bynode=1.0, max_abs_leafnode_pred=0.0, max_delta_step=0.0, monotone_constraints=None, interaction_constraints=None, score_tree_interval=0, min_split_improvement=0.0, gamma=0.0, nthread=-1, save_matrix_directory=None, build_tree_one_node=False, parallelize_cross_validation=True, calibrate_model=False, calibration_frame=None, calibration_method='auto', max_bins=256, max_leaves=0, sample_type='uniform', normalize_type='tree', rate_drop=0.0, one_drop=False, skip_drop=0.0, tree_method='auto', grow_policy='depthwise', booster='gbtree', reg_lambda=1.0, reg_alpha=0.0, dmatrix_type='auto', backend='auto', gpu_id=None, gainslift_bins=-1, auc_type='auto', scale_pos_weight=1.0, eval_metric=None, score_eval_metric_only=False)[source]¶
- Bases: - h2o.estimators.estimator_base.H2OEstimator- XGBoost - Builds an eXtreme Gradient Boosting model using the native XGBoost backend. - 
property auc_type¶
- Set default multinomial AUC type. - Type: - Literal["auto", "none", "macro_ovr", "weighted_ovr", "macro_ovo", "weighted_ovo"], defaults to- "auto".
 - 
static available()[source]¶
- Ask the H2O server whether an XGBoost model can be built (depends on availability of native backends). Returns True if an XGBoost model can be built, or False otherwise. - Examples
 - >>> boston = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/gbm_test/BostonHousing.csv") >>> predictors = boston.columns[:-1] >>> response = "medv" >>> boston['chas'] = boston['chas'].asfactor() >>> train, valid = boston.split_frame(ratios=[.8]) >>> boston_xgb = H2OXGBoostEstimator(seed=1234) >>> boston_xgb.available() 
 - 
property backend¶
- Backend. By default (auto), a GPU is used if available. - Type: - Literal["auto", "gpu", "cpu"], defaults to- "auto".- Examples
 - >>> pros = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/prostate/prostate.csv") >>> pros["CAPSULE"] = pros["CAPSULE"].asfactor() >>> pros_xgb = H2OXGBoostEstimator(tree_method="exact", ... seed=123, ... backend="cpu") >>> pros_xgb.train(y="CAPSULE", ... ignored_columns=["ID"], ... training_frame=pros) >>> pros_xgb.auc() 
 - 
property booster¶
- Booster type - Type: - Literal["gbtree", "gblinear", "dart"], defaults to- "gbtree".- Examples
 - >>> titanic = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/gbm_test/titanic.csv") >>> titanic['survived'] = titanic['survived'].asfactor() >>> predictors = titanic.columns >>> response = 'survived' >>> train, valid = titanic.split_frame(ratios=[.8], ... seed=1234) >>> titanic_xgb = H2OXGBoostEstimator(booster='dart', ... normalize_type="tree", ... seed=1234) >>> titanic_xgb.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> print(titanic_xgb.auc(valid=True)) 
 - 
property build_tree_one_node¶
- Run on one node only; no network overhead but fewer cpus used. Suitable for small datasets. - Type: - bool, defaults to- False.
 - 
property calibrate_model¶
- Use Platt Scaling (default) or Isotonic Regression to calculate calibrated class probabilities. Calibration can provide more accurate estimates of class probabilities. - Type: - bool, defaults to- False.
 - 
property calibration_frame¶
- Data for model calibration - Type: - Union[None, str, H2OFrame].
 - 
property calibration_method¶
- Calibration method to use - Type: - Literal["auto", "platt_scaling", "isotonic_regression"], defaults to- "auto".
 - 
property categorical_encoding¶
- Encoding scheme for categorical features - Type: - Literal["auto", "enum", "one_hot_internal", "one_hot_explicit", "binary", "eigen", "label_encoder", "sort_by_response", "enum_limited"], defaults to- "auto".- Examples
 - >>> airlines= h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/airlines/allyears2k_headers.zip") >>> airlines["Year"] = airlines["Year"].asfactor() >>> airlines["Month"] = airlines["Month"].asfactor() >>> airlines["DayOfWeek"] = airlines["DayOfWeek"].asfactor() >>> airlines["Cancelled"] = airlines["Cancelled"].asfactor() >>> airlines['FlightNum'] = airlines['FlightNum'].asfactor() >>> predictors = ["Origin", "Dest", "Year", "UniqueCarrier", ... "DayOfWeek", "Month", "Distance", "FlightNum"] >>> response = "IsDepDelayed" >>> train, valid= airlines.split_frame(ratios=[.8], ... seed=1234) >>> encoding = "one_hot_explicit" >>> airlines_xgb = H2OXGBoostEstimator(categorical_encoding=encoding, ... seed=1234) >>> airlines_xgb.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> airlines_xgb.auc(valid=True) 
 - 
property checkpoint¶
- Model checkpoint to resume training with. - Type: - Union[None, str, H2OEstimator].- Examples
 - >>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv") >>> cars["economy_20mpg"] = cars["economy_20mpg"].asfactor() >>> predictors = ["displacement","power","weight","year","economy_20mpg"] >>> response = "acceleration" >>> from h2o.estimators import H2OXGBoostEstimator >>> cars_xgb = H2OXGBoostEstimator(seed=1234) >>> train, valid = cars.split_frame(ratios=[.8]) >>> cars_xgb.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> cars_xgb.mse() >>> cars_xgb_continued = H2OXGBoostEstimator(checkpoint=cars_xgb.model_id, ... ntrees=51, ... seed=1234) >>> cars_xgb_continued.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> cars_xgb_continued.mse() 
 - 
property col_sample_rate¶
- (same as colsample_bylevel) Column sample rate (from 0.0 to 1.0) - Type: - float, defaults to- 1.0.- Examples
 - >>> airlines= h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/airlines/allyears2k_headers.zip") >>> airlines["Year"] = airlines["Year"].asfactor() >>> airlines["Month"] = airlines["Month"].asfactor() >>> airlines["DayOfWeek"] = airlines["DayOfWeek"].asfactor() >>> airlines["Cancelled"] = airlines["Cancelled"].asfactor() >>> airlines['FlightNum'] = airlines['FlightNum'].asfactor() >>> predictors = ["Origin", "Dest", "Year", "UniqueCarrier", ... "DayOfWeek", "Month", "Distance", "FlightNum"] >>> response = "IsDepDelayed" >>> train, valid= airlines.split_frame(ratios=[.8], ... seed=1234) >>> airlines_xgb = H2OXGBoostEstimator(col_sample_rate=.7, ... seed=1234) >>> airlines_xgb.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> print(airlines_xgb.auc(valid=True)) 
 - 
property col_sample_rate_per_tree¶
- (same as colsample_bytree) Column sample rate per tree (from 0.0 to 1.0) - Type: - float, defaults to- 1.0.- Examples
 - >>> airlines= h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/airlines/allyears2k_headers.zip") >>> airlines["Year"] = airlines["Year"].asfactor() >>> airlines["Month"] = airlines["Month"].asfactor() >>> airlines["DayOfWeek"] = airlines["DayOfWeek"].asfactor() >>> airlines["Cancelled"] = airlines["Cancelled"].asfactor() >>> airlines['FlightNum'] = airlines['FlightNum'].asfactor() >>> predictors = ["Origin", "Dest", "Year", "UniqueCarrier", ... "DayOfWeek", "Month", "Distance", "FlightNum"] >>> response = "IsDepDelayed" >>> train, valid= airlines.split_frame(ratios=[.8], seed=1234) >>> airlines_xgb = H2OXGBoostEstimator(col_sample_rate_per_tree=.7, ... seed=1234) >>> airlines_xgb.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> print(airlines_xgb.auc(valid=True)) 
 - 
property colsample_bylevel¶
- (same as col_sample_rate) Column sample rate (from 0.0 to 1.0) - Type: - float, defaults to- 1.0.- Examples
 - >>> airlines= h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/airlines/allyears2k_headers.zip") >>> airlines["Year"] = airlines["Year"].asfactor() >>> airlines["Month"] = airlines["Month"].asfactor() >>> airlines["DayOfWeek"] = airlines["DayOfWeek"].asfactor() >>> airlines["Cancelled"] = airlines["Cancelled"].asfactor() >>> airlines['FlightNum'] = airlines['FlightNum'].asfactor() >>> predictors = ["Origin", "Dest", "Year", "UniqueCarrier", ... "DayOfWeek", "Month", "Distance", "FlightNum"] >>> response = "IsDepDelayed" >>> train, valid= airlines.split_frame(ratios=[.8], ... seed=1234) >>> airlines_xgb = H2OXGBoostEstimator(col_sample_rate=.7, ... seed=1234) >>> airlines_xgb.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> print(airlines_xgb.auc(valid=True)) 
 - 
property colsample_bynode¶
- Column sample rate per tree node (from 0.0 to 1.0) - Type: - float, defaults to- 1.0.- Examples
 - >>> airlines= h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/airlines/allyears2k_headers.zip") >>> airlines["Year"] = airlines["Year"].asfactor() >>> airlines["Month"] = airlines["Month"].asfactor() >>> airlines["DayOfWeek"] = airlines["DayOfWeek"].asfactor() >>> airlines["Cancelled"] = airlines["Cancelled"].asfactor() >>> airlines['FlightNum'] = airlines['FlightNum'].asfactor() >>> predictors = ["Origin", "Dest", "Year", "UniqueCarrier", ... "DayOfWeek", "Month", "Distance", "FlightNum"] >>> response = "IsDepDelayed" >>> train, valid= airlines.split_frame(ratios=[.8], seed=1234) >>> airlines_xgb = H2OXGBoostEstimator(colsample_bynode=.5, ... seed=1234) >>> airlines_xgb.train(x=predictors, y=response, ... training_frame=train, validation_frame=valid) >>> print(airlines_xgb.auc(valid=True)) 
 - 
property colsample_bytree¶
- (same as col_sample_rate_per_tree) Column sample rate per tree (from 0.0 to 1.0) - Type: - float, defaults to- 1.0.- Examples
 - >>> airlines= h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/airlines/allyears2k_headers.zip") >>> airlines["Year"] = airlines["Year"].asfactor() >>> airlines["Month"] = airlines["Month"].asfactor() >>> airlines["DayOfWeek"] = airlines["DayOfWeek"].asfactor() >>> airlines["Cancelled"] = airlines["Cancelled"].asfactor() >>> airlines['FlightNum'] = airlines['FlightNum'].asfactor() >>> predictors = ["Origin", "Dest", "Year", "UniqueCarrier", ... "DayOfWeek", "Month", "Distance", "FlightNum"] >>> response = "IsDepDelayed" >>> train, valid= airlines.split_frame(ratios=[.8], seed=1234) >>> airlines_xgb = H2OXGBoostEstimator(col_sample_rate_per_tree=.7, ... seed=1234) >>> airlines_xgb.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> print(airlines_xgb.auc(valid=True)) 
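In native XGBoost the per-tree, per-level and per-node column sampling rates are documented as working cumulatively. The following plain-Python sketch shows that arithmetic; the rounding behaviour and the `features_per_split` helper are assumptions made for the illustration, not H2O or XGBoost code.

```python
def features_per_split(n_features, colsample_bytree=1.0,
                       colsample_bylevel=1.0, colsample_bynode=1.0):
    """Approximate number of candidate features left at each split."""
    remaining = n_features
    for rate in (colsample_bytree, colsample_bylevel, colsample_bynode):
        if not 0.0 < rate <= 1.0:
            raise ValueError("sample rates must be in (0, 1]")
        # Each stage samples from the columns kept by the previous stage.
        remaining = max(1, int(remaining * rate))
    return remaining


print(features_per_split(64, 0.5, 0.5, 0.5))
```

Because col_sample_rate is the same setting as colsample_bylevel and col_sample_rate_per_tree the same as colsample_bytree, the H2O-style names combine in the same way.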
 - 
convert_H2OXGBoostParams_2_XGBoostParams()[source]¶
- To use convert_H2OXGBoostParams_2_XGBoostParams and convert_H2OFrame_2_DMatrix, you must import the following packages: xgboost, pandas, numpy and scipy.sparse. - Given an H2OXGBoost model, this method generates the corresponding parameters that native XGBoost should use to give exactly the same result, assuming that the same dataset (derived from the H2OFrame) is used to train the native XGBoost model. - Follow the steps below to compare H2OXGBoost and native XGBoost: - Train the H2OXGBoost model with H2OFrame trainFile and generate a prediction: 
 - h2oModelD = H2OXGBoostEstimator(**h2oParamsD) # parameters specified as a dict() 
- h2oModelD.train(x=myX, y=y, training_frame=trainFile) # train with H2OFrame trainFile 
- h2oPredict = h2oPredictD = h2oModelD.predict(trainFile) 
 - Derive the DMatrix from H2OFrame: 
 - nativeDMatrix = trainFile.convert_H2OFrame_2_DMatrix(myX, y, h2oModelD) 
 - Derive the parameters for native XGBoost: 
 - nativeParams = h2oModelD.convert_H2OXGBoostParams_2_XGBoostParams() 
 - Train your native XGBoost model and generate a prediction: 
 - nativeModel = xgb.train(params=nativeParams[0], dtrain=nativeDMatrix, num_boost_round=nativeParams[1]) 
- nativePredict = nativeModel.predict(data=nativeDMatrix, ntree_limit=nativeParams[1]) 
 - Compare the predictions h2oPredict from H2OXGBoost, nativePredict from native XGBoost. 
 - Returns
- nativeParams, num_boost_round 
 
 - 
property custom_metric_func¶
- Reference to custom evaluation function, format: language:keyName=funcName - Type: - str.
 - 
property distribution¶
- Distribution function - Type: - Literal["auto", "bernoulli", "multinomial", "gaussian", "poisson", "gamma", "tweedie", "laplace", "quantile", "huber"], defaults to- "auto".- Examples
 - >>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv") >>> predictors = ["displacement","power","weight","acceleration","year"] >>> response = "cylinders" >>> train, valid = cars.split_frame(ratios=[.8], ... seed=1234) >>> cars_xgb = H2OXGBoostEstimator(distribution="poisson", ... seed=1234) >>> cars_xgb.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> cars_xgb.mse(valid=True) 
 - 
property dmatrix_type¶
- Type of DMatrix. For sparse, NAs and 0 are treated equally. - Type: - Literal["auto", "dense", "sparse"], defaults to- "auto".- Examples
 - >>> boston = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/gbm_test/BostonHousing.csv") >>> predictors = boston.columns[:-1] >>> response = "medv" >>> boston['chas'] = boston['chas'].asfactor() >>> train, valid = boston.split_frame(ratios=[.8]) >>> boston_xgb = H2OXGBoostEstimator(dmatrix_type="auto", ... seed=1234) >>> boston_xgb.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> boston_xgb.mse() 
 - 
property eta¶
- (same as learn_rate) Learning rate (from 0.0 to 1.0) - Type: - float, defaults to- 0.3.- Examples
 - >>> titanic = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/gbm_test/titanic.csv") >>> titanic['survived'] = titanic['survived'].asfactor() >>> predictors = titanic.columns >>> del predictors[1:3] >>> response = 'survived' >>> train, valid = titanic.split_frame(ratios=[.8], ... seed=1234) >>> titanic_xgb = H2OXGBoostEstimator(ntrees=10000, ... eta=0.01, ... stopping_rounds=5, ... stopping_metric="AUC", ... stopping_tolerance=1e-4, ... seed=1234) >>> titanic_xgb.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> print(titanic_xgb.auc(valid=True)) 
 - 
property eval_metric¶
- Specification of evaluation metric that will be passed to the native XGBoost backend. - Type: - str.
 - 
property export_checkpoints_dir¶
- Automatically export generated models to this directory. - Type: - str.- Examples
 - >>> import tempfile >>> from h2o.grid.grid_search import H2OGridSearch >>> from os import listdir >>> airlines = h2o.import_file("http://s3.amazonaws.com/h2o-public-test-data/smalldata/airlines/allyears2k_headers.zip", destination_frame="air.hex") >>> predictors = ["DayofMonth", "DayOfWeek"] >>> response = "IsDepDelayed" >>> hyper_parameters = {'ntrees': [5,10]} >>> search_crit = {'strategy': "RandomDiscrete", ... 'max_models': 5, ... 'seed': 1234, ... 'stopping_rounds': 3, ... 'stopping_metric': "AUTO", ... 'stopping_tolerance': 1e-2} >>> checkpoints_dir = tempfile.mkdtemp() >>> air_grid = H2OGridSearch(H2OXGBoostEstimator, ... hyper_params=hyper_parameters, ... search_criteria=search_crit) >>> air_grid.train(x=predictors, ... y=response, ... training_frame=airlines, ... distribution="bernoulli", ... learn_rate=0.1, ... max_depth=3, ... export_checkpoints_dir=checkpoints_dir) >>> len(listdir(checkpoints_dir)) 
 - 
property fold_assignment¶
- Cross-validation fold assignment scheme, if fold_column is not specified. The ‘Stratified’ option will stratify the folds based on the response variable, for classification problems. - Type: - Literal["auto", "random", "modulo", "stratified"], defaults to- "auto".- Examples
 - >>> titanic = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/gbm_test/titanic.csv") >>> titanic['survived'] = titanic['survived'].asfactor() >>> predictors = titanic.columns >>> response = 'survived' >>> assignment_type = "Random" >>> titanic_xgb = H2OXGBoostEstimator(fold_assignment=assignment_type, ... nfolds=5, ... seed=1234) >>> titanic_xgb.train(x=predictors, ... y=response, ... training_frame=titanic) >>> titanic_xgb.auc(xval=True) 
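- To make the assignment schemes concrete, the sketch below assigns rows to folds in plain Python. It assumes that "modulo" means row index modulo the number of folds and that "random" draws a fold uniformly per row; the function names are hypothetical, and H2O's own random number generator will not reproduce these exact fold ids.

```python
import random


def modulo_folds(n_rows, nfolds):
    """Deterministic scheme: row i goes to fold i % nfolds."""
    return [row % nfolds for row in range(n_rows)]


def random_folds(n_rows, nfolds, seed):
    """Random scheme: each row gets a uniformly drawn fold id."""
    rng = random.Random(seed)
    return [rng.randrange(nfolds) for _ in range(n_rows)]


print(modulo_folds(7, 3))
print(random_folds(7, 3, seed=1234))
```

The modulo scheme gives evenly sized folds that depend on row order, while the random scheme is independent of row order but only approximately balanced.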
 - 
property fold_column¶
- Column with cross-validation fold index assignment per observation. - Type: - str.- Examples
 - >>> titanic = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/gbm_test/titanic.csv") >>> titanic['survived'] = titanic['survived'].asfactor() >>> predictors = titanic.columns >>> response = 'survived' >>> fold_numbers = titanic.kfold_column(n_folds=5, ... seed=1234) >>> fold_numbers.set_names(["fold_numbers"]) >>> titanic = titanic.cbind(fold_numbers) >>> print(titanic['fold_numbers']) >>> titanic_xgb = H2OXGBoostEstimator(seed=1234) >>> titanic_xgb.train(x=predictors, ... y=response, ... training_frame=titanic, ... fold_column="fold_numbers") >>> titanic_xgb.auc(xval=True) 
 - 
property gainslift_bins¶
- Gains/Lift table number of bins. 0 means disabled. Default value -1 means automatic binning. - Type: - int, defaults to- -1.- Examples
 - >>> airlines= h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/testng/airlines_train.csv") >>> model = H2OXGBoostEstimator(ntrees=1, gainslift_bins=20) >>> model.train(x=["Origin", "Distance"], ... y="IsDepDelayed", ... training_frame=airlines) >>> model.gains_lift() 
 - 
property gamma¶
- (same as min_split_improvement) Minimum relative improvement in squared error reduction for a split to happen - Type: - float, defaults to- 0.0.- Examples
 - >>> titanic = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/gbm_test/titanic.csv") >>> titanic['survived'] = titanic['survived'].asfactor() >>> predictors = titanic.columns >>> response = 'survived' >>> train, valid = titanic.split_frame(ratios=[.8], ... seed=1234) >>> titanic_xgb = H2OXGBoostEstimator(gamma=1e-3, ... seed=1234) >>> titanic_xgb.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> print(titanic_xgb.auc(valid=True)) 
 - 
property gpu_id¶
- Which GPU(s) to use. - Type: - List[int].- Examples
 - >>> boston = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/gbm_test/BostonHousing.csv") >>> predictors = boston.columns[:-1] >>> response = "medv" >>> boston['chas'] = boston['chas'].asfactor() >>> train, valid = boston.split_frame(ratios=[.8]) >>> boston_xgb = H2OXGBoostEstimator(gpu_id=0, ... seed=1234) >>> boston_xgb.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> boston_xgb.mse() 
 - 
property grow_policy¶
- Grow policy - depthwise is standard GBM, lossguide is LightGBM - Type: - Literal["depthwise", "lossguide"], defaults to- "depthwise".- Examples
 - >>> titanic = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/gbm_test/titanic.csv") >>> titanic['survived'] = titanic['survived'].asfactor() >>> predictors = titanic.columns >>> response = 'survived' >>> titanic["const_1"] = 6 >>> titanic["const_2"] = 7 >>> train, valid = titanic.split_frame(ratios=[.8], ... seed=1234) >>> titanic_xgb = H2OXGBoostEstimator(seed=1234, ... grow_policy="depthwise") >>> titanic_xgb.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> titanic_xgb.auc(valid=True) 
 - 
property ignore_const_cols¶
- Ignore constant columns. - Type: - bool, defaults to- True.- Examples
 - >>> titanic = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/gbm_test/titanic.csv") >>> titanic['survived'] = titanic['survived'].asfactor() >>> predictors = titanic.columns >>> response = 'survived' >>> titanic["const_1"] = 6 >>> titanic["const_2"] = 7 >>> train, valid = titanic.split_frame(ratios=[.8], ... seed=1234) >>> titanic_xgb = H2OXGBoostEstimator(seed=1234, ... ignore_const_cols=True) >>> titanic_xgb.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> titanic_xgb.auc(valid=True) 
 - 
property ignored_columns¶
- Names of columns to ignore for training. - Type: - List[str].
 - 
property interaction_constraints¶
- A set of allowed column interactions. - Type: - List[List[str]].
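- A minimal sketch of how such a constraint list might be assembled and sanity-checked before training. The column names are borrowed from the cars example elsewhere in this reference; the helper `check_interaction_constraints` is hypothetical and not part of H2O.

```python
def check_interaction_constraints(constraints, predictors):
    """Verify that every constrained column is one of the predictors."""
    known = set(predictors)
    for group in constraints:
        unknown = [col for col in group if col not in known]
        if unknown:
            raise ValueError("unknown columns in constraint: %s" % unknown)
    return constraints


predictors = ["displacement", "power", "weight", "acceleration", "year"]

# Each inner list names columns that are allowed to interact with each other.
constraints = check_interaction_constraints(
    [["displacement", "power"], ["weight", "acceleration", "year"]],
    predictors,
)

# The validated list would then be passed as
# H2OXGBoostEstimator(interaction_constraints=constraints).
print(constraints)
```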
 - 
property keep_cross_validation_fold_assignment¶
- Whether to keep the cross-validation fold assignment. - Type: - bool, defaults to- False.- Examples
 - >>> titanic = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/gbm_test/titanic.csv") >>> titanic['survived'] = titanic['survived'].asfactor() >>> predictors = titanic.columns >>> response = 'survived' >>> train, valid = titanic.split_frame(ratios=[.8], ... seed=1234) >>> titanic_xgb = H2OXGBoostEstimator(keep_cross_validation_fold_assignment=True, ... nfolds=5, ... seed=1234) >>> titanic_xgb.train(x=predictors, ... y=response, ... training_frame=train) >>> titanic_xgb.cross_validation_fold_assignment() 
 - 
property keep_cross_validation_models¶
- Whether to keep the cross-validation models. - Type: - bool, defaults to- True.- Examples
 - >>> titanic = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/gbm_test/titanic.csv") >>> titanic['survived'] = titanic['survived'].asfactor() >>> predictors = titanic.columns >>> response = 'survived' >>> train, valid = titanic.split_frame(ratios=[.8], ... seed=1234) >>> titanic_xgb = H2OXGBoostEstimator(keep_cross_validation_models=True, ... nfolds=5 , ... seed=1234) >>> titanic_xgb.train(x=predictors, ... y=response, ... training_frame=train) >>> titanic_xgb.cross_validation_models() 
 - 
property keep_cross_validation_predictions¶
- Whether to keep the predictions of the cross-validation models. - Type: - bool, defaults to- False.- Examples
 - >>> titanic = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/gbm_test/titanic.csv") >>> titanic['survived'] = titanic['survived'].asfactor() >>> predictors = titanic.columns >>> response = 'survived' >>> train, valid = titanic.split_frame(ratios=[.8], ... seed=1234) >>> titanic_xgb = H2OXGBoostEstimator(keep_cross_validation_predictions=True, ... nfolds=5, ... seed=1234) >>> titanic_xgb.train(x=predictors, ... y=response, ... training_frame=train) >>> titanic_xgb.cross_validation_predictions() 
 - 
property learn_rate¶
- (same as eta) Learning rate (from 0.0 to 1.0) - Type: - float, defaults to- 0.3.- Examples
 - >>> titanic = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/gbm_test/titanic.csv") >>> titanic['survived'] = titanic['survived'].asfactor() >>> predictors = titanic.columns >>> del predictors[1:3] >>> response = 'survived' >>> train, valid = titanic.split_frame(ratios=[.8], seed=1234) >>> titanic_xgb = H2OXGBoostEstimator(ntrees=10000, ... learn_rate=0.01, ... stopping_rounds=5, ... stopping_metric="AUC", ... stopping_tolerance=1e-4, ... seed=1234) >>> titanic_xgb.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> print(titanic_xgb.auc(valid=True)) 
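- The example above pairs a small learn_rate with a large ntrees and early stopping. The idealized sketch below suggests why: if each boosting round removed exactly a learn_rate fraction of the remaining error, a smaller rate would need many more rounds. This is a simplification for intuition only (real trees do not fit residuals perfectly), and `remaining_residual` is a hypothetical helper.

```python
def remaining_residual(initial_residual, learn_rate, n_rounds):
    """Residual left after n_rounds when each round corrects a
    learn_rate fraction of what is still unexplained."""
    residual = initial_residual
    for _ in range(n_rounds):
        # An idealized tree predicts the residual exactly, but only a
        # learn_rate fraction of that prediction is added to the model.
        residual -= learn_rate * residual
    return residual


for rate in (0.3, 0.01):
    print(rate, remaining_residual(1.0, rate, n_rounds=10))
```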
 - 
property max_abs_leafnode_pred¶
- (same as max_delta_step) Maximum absolute value of a leaf node prediction - Type: - float, defaults to- 0.0.- Examples
 - >>> covtype = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/covtype/covtype.20k.data") >>> covtype[54] = covtype[54].asfactor() >>> predictors = covtype.columns[0:54] >>> response = 'C55' >>> train, valid = covtype.split_frame(ratios=[.8], ... seed=1234) >>> cov_xgb = H2OXGBoostEstimator(max_abs_leafnode_pred=float(2), ... seed=1234) >>> cov_xgb.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> print(cov_xgb.logloss(valid=True)) 
 - 
property max_bins¶
- For tree_method=hist only: maximum number of bins - Type: - int, defaults to- 256.- Examples
 - >>> covtype = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/covtype/covtype.20k.data") >>> covtype[54] = covtype[54].asfactor() >>> predictors = covtype.columns[0:54] >>> response = 'C55' >>> train, valid = covtype.split_frame(ratios=[.8], ... seed=1234) >>> cov_xgb = H2OXGBoostEstimator(max_bins=200, ... seed=1234) >>> cov_xgb.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> print(cov_xgb.logloss(valid=True)) 
 - 
property max_delta_step¶
- (same as max_abs_leafnode_pred) Maximum absolute value of a leaf node prediction - Type: - float, defaults to- 0.0.- Examples
 - >>> covtype = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/covtype/covtype.20k.data") >>> covtype[54] = covtype[54].asfactor() >>> predictors = covtype.columns[0:54] >>> response = 'C55' >>> train, valid = covtype.split_frame(ratios=[.8], ... seed=1234) >>> cov_xgb = H2OXGBoostEstimator(max_delta_step=float(2), ... seed=1234) >>> cov_xgb.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> print(cov_xgb.logloss(valid=True)) 
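- As a rough illustration of what "maximum absolute value of a leaf node prediction" means, the sketch below caps a leaf value at the given limit and treats 0 as "no limit", which is how native XGBoost documents max_delta_step. It assumes H2O forwards the value unchanged; `cap_leaf_value` is a hypothetical helper, not library code.

```python
import math


def cap_leaf_value(value, max_delta_step):
    """Clip a leaf value to [-max_delta_step, max_delta_step]; 0 disables the cap."""
    if max_delta_step != 0.0 and abs(value) > max_delta_step:
        return math.copysign(max_delta_step, value)
    return value


print(cap_leaf_value(3.5, 2.0))   # capped
print(cap_leaf_value(-3.5, 2.0))  # capped, sign preserved
print(cap_leaf_value(3.5, 0.0))   # unchanged, cap disabled
```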
 - 
property max_depth¶
- Maximum tree depth (0 for unlimited). - Type: - int, defaults to- 6.- Examples
 - >>> df = h2o.import_file(path = "http://s3.amazonaws.com/h2o-public-test-data/smalldata/gbm_test/titanic.csv") >>> response = "survived" >>> df[response] = df[response].asfactor() >>> predictors = df.columns >>> del predictors[1:3] >>> train, valid, test = df.split_frame(ratios=[0.6,0.2], ... seed=1234, ... destination_frames= ... ['train.hex', ... 'valid.hex', ... 'test.hex']) >>> xgb = H2OXGBoostEstimator() >>> xgb.train(x=predictors, ... y=response, ... training_frame=train) >>> perf = xgb.model_performance(valid) >>> print(perf.auc()) 
 - 
property max_leaves¶
- For tree_method=hist only: maximum number of leaves - Type: - int, defaults to- 0.- Examples
 - >>> titanic = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/gbm_test/titanic.csv") >>> titanic['survived'] = titanic['survived'].asfactor() >>> predictors = titanic.columns >>> del predictors[1:3] >>> response = 'survived' >>> train, valid = titanic.split_frame(ratios=[.8], ... seed=1234) >>> titanic_xgb = H2OXGBoostEstimator(max_leaves=0, seed=1234) >>> titanic_xgb.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> print(titanic_xgb.auc(valid=True)) 
 - 
property max_runtime_secs¶
- Maximum allowed runtime in seconds for model training. Use 0 to disable. - Type: - float, defaults to- 0.0.- Examples
 - >>> covtype = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/covtype/covtype.20k.data") >>> covtype[54] = covtype[54].asfactor() >>> predictors = covtype.columns[0:54] >>> response = 'C55' >>> train, valid = covtype.split_frame(ratios=[.8], ... seed=1234) >>> cov_xgb = H2OXGBoostEstimator(max_runtime_secs=10, ... ntrees=10000, ... max_depth=10, ... seed=1234) >>> cov_xgb.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> print(cov_xgb.logloss(valid=True)) 
 - 
property min_child_weight¶
- (same as min_rows) Fewest allowed (weighted) observations in a leaf. - Type: - float, defaults to- 1.0.- Examples
 - >>> titanic = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/gbm_test/titanic.csv") >>> titanic['survived'] = titanic['survived'].asfactor() >>> predictors = titanic.columns >>> del predictors[1:3] >>> response = 'survived' >>> train, valid = titanic.split_frame(ratios=[.8], ... seed=1234) >>> titanic_xgb = H2OXGBoostEstimator(min_child_weight=16, ... seed=1234) >>> titanic_xgb.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> print(titanic_xgb.auc(valid=True)) 
 - 
property min_rows¶
- (same as min_child_weight) Fewest allowed (weighted) observations in a leaf. - Type: - float, defaults to- 1.0.- Examples
 - >>> titanic = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/gbm_test/titanic.csv") >>> titanic['survived'] = titanic['survived'].asfactor() >>> predictors = titanic.columns >>> del predictors[1:3] >>> response = 'survived' >>> train, valid = titanic.split_frame(ratios=[.8], ... seed=1234) >>> titanic_xgb = H2OXGBoostEstimator(min_rows=16, ... seed=1234) >>> titanic_xgb.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> print(titanic_xgb.auc(valid=True)) 
 - 
property min_split_improvement¶
- (same as gamma) Minimum relative improvement in squared error reduction for a split to happen - Type: - float, defaults to- 0.0.- Examples
 - >>> titanic = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/gbm_test/titanic.csv") >>> titanic['survived'] = titanic['survived'].asfactor() >>> predictors = titanic.columns >>> del predictors[1:3] >>> response = 'survived' >>> train, valid = titanic.split_frame(ratios=[.8], ... seed=1234) >>> titanic_xgb = H2OXGBoostEstimator(min_split_improvement=0.55, ... seed=1234) >>> titanic_xgb.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> print(titanic_xgb.auc(valid=True)) 
 - 
property monotone_constraints¶
- A mapping representing monotonic constraints. Use +1 to enforce an increasing constraint and -1 to specify a decreasing constraint. - Type: - dict.- Examples
 - >>> prostate_hex = h2o.import_file("http://s3.amazonaws.com/h2o-public-test-data/smalldata/prostate/prostate.csv.zip") >>> prostate_hex["CAPSULE"] = prostate_hex["CAPSULE"].asfactor() >>> response = "CAPSULE" >>> seed=42 >>> monotone_constraints={"AGE":1} >>> xgb_model = H2OXGBoostEstimator(seed=seed, ... monotone_constraints=monotone_constraints) >>> xgb_model.train(y=response, ... ignored_columns=["ID"], ... training_frame=prostate_hex) >>> xgb_model.scoring_history() 
 - 
property nfolds¶
- Number of folds for K-fold cross-validation (0 to disable or >= 2). - Type: - int, defaults to- 0.- Examples
 - >>> titanic = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/gbm_test/titanic.csv") >>> titanic['survived'] = titanic['survived'].asfactor() >>> predictors = titanic.columns >>> del predictors[1:3] >>> response = 'survived' >>> folds = 5 >>> titanic_xgb = H2OXGBoostEstimator(nfolds=folds, ... seed=1234) >>> titanic_xgb.train(x=predictors, ... y=response, ... training_frame=titanic) >>> titanic_xgb.auc(xval=True) 
 - 
property normalize_type¶
- For booster=dart only: normalize_type - Type: - Literal["tree", "forest"], defaults to- "tree".- Examples
 - >>> titanic = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/gbm_test/titanic.csv") >>> titanic['survived'] = titanic['survived'].asfactor() >>> predictors = titanic.columns >>> response = 'survived' >>> train, valid = titanic.split_frame(ratios=[.8], ... seed=1234) >>> titanic_xgb = H2OXGBoostEstimator(booster='dart', ... normalize_type="tree", ... seed=1234) >>> titanic_xgb.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> print(titanic_xgb.auc(valid=True)) 
 - 
property nthread¶
- Number of parallel threads that can be used to run XGBoost. Cannot exceed H2O cluster limits (-nthreads parameter). Defaults to the maximum available. - Type: - int, defaults to- -1.- Examples
 - >>> titanic = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/gbm_test/titanic.csv") >>> titanic['survived'] = titanic['survived'].asfactor() >>> predictors = titanic.columns >>> response = 'survived' >>> train, valid = titanic.split_frame(ratios=[.8], seed=1234) >>> thread = 4 >>> titanic_xgb = H2OXGBoostEstimator(nthread=thread, ... seed=1234) >>> titanic_xgb.train(x=predictors, ... y=response, ... training_frame=titanic) >>> print(titanic_xgb.auc(train=True)) 
 - 
property ntrees¶
- (same as n_estimators) Number of trees. - Type: - int, defaults to- 50.- Examples
 - >>> titanic = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/gbm_test/titanic.csv") >>> titanic['survived'] = titanic['survived'].asfactor() >>> predictors = titanic.columns >>> del predictors[1:3] >>> response = 'survived' >>> train, valid = titanic.split_frame(ratios=[.8], ... seed=1234) >>> tree_num = [20, 50, 80, 110, 140, 170, 200] >>> label = ["20", "50", "80", "110", ... "140", "170", "200"] >>> for key, num in enumerate(tree_num): ... titanic_xgb = H2OXGBoostEstimator(ntrees=num, ... seed=1234) ... titanic_xgb.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) ... print(label[key], 'training score', ... titanic_xgb.auc(train=True)) ... print(label[key], 'validation score', ... titanic_xgb.auc(valid=True)) 
 - 
property offset_column¶
- Offset column. This will be added to the combination of columns before applying the link function. - Type: - str.
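- A short worked sketch of the offset idea, assuming a log link such as the one used with a Poisson distribution: the offset is added to the model's raw score, and the inverse link then maps the sum to the predicted mean. With an offset of log(exposure), the prediction scales linearly with exposure. The function name and numbers are illustrative assumptions, not H2O internals.

```python
import math


def poisson_mean(raw_score, offset=0.0):
    """Inverse log link applied to the raw score plus the offset."""
    return math.exp(raw_score + offset)


raw_score = math.log(3.0)      # model predicts a rate of 3 events per unit exposure
exposure = 2.0                 # e.g. two units of observation time
offset = math.log(exposure)    # value that would be stored in the offset column

print(poisson_mean(raw_score))          # rate per unit exposure
print(poisson_mean(raw_score, offset))  # expected count for this exposure
```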
 - 
property one_drop¶
- For booster=dart only: one_drop - Type: - bool, defaults to- False.- Examples
 - >>> titanic = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/gbm_test/titanic.csv") >>> titanic['survived'] = titanic['survived'].asfactor() >>> predictors = titanic.columns >>> del predictors[1:3] >>> response = 'survived' >>> train, valid = titanic.split_frame(ratios=[.8], ... seed=1234) >>> titanic_xgb = H2OXGBoostEstimator(booster='dart', ... one_drop=True, ... seed=1234) >>> titanic_xgb.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> print(titanic_xgb.auc(valid=True)) 
 - 
property parallelize_cross_validation¶
- Allow parallel training of cross-validation models - Type: - bool, defaults to- True.
 - 
property quiet_mode¶
- Enable quiet mode - Type: - bool, defaults to- True.- Examples
 - >>> titanic = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/gbm_test/titanic.csv") >>> titanic['survived'] = titanic['survived'].asfactor() >>> predictors = titanic.columns >>> del predictors[1:3] >>> response = 'survived' >>> train, valid = titanic.split_frame(ratios=[.8], seed=1234) >>> titanic_xgb = H2OXGBoostEstimator(seed=1234, quiet_mode=True) >>> titanic_xgb.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> titanic_xgb.mse(valid=True) 
 - 
property rate_drop¶
- For booster=dart only: rate_drop (0..1) - Type: - float, defaults to- 0.0.- Examples
 - >>> titanic = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/gbm_test/titanic.csv") >>> titanic['survived'] = titanic['survived'].asfactor() >>> predictors = titanic.columns >>> del predictors[1:3] >>> response = 'survived' >>> train, valid = titanic.split_frame(ratios=[.8], ... seed=1234) >>> titanic_xgb = H2OXGBoostEstimator(booster='dart', ... rate_drop=0.1, ... seed=1234) >>> titanic_xgb.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> print(titanic_xgb.auc(valid=True)) 
 - 
property reg_alpha¶
- L1 regularization - Type: - float, defaults to- 0.0.- Examples
 - >>> boston = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/gbm_test/BostonHousing.csv") >>> predictors = boston.columns[:-1] >>> response = "medv" >>> boston['chas'] = boston['chas'].asfactor() >>> train, valid = boston.split_frame(ratios=[.8]) >>> boston_xgb = H2OXGBoostEstimator(reg_alpha=.25) >>> boston_xgb.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> print(boston_xgb.mse(valid=True)) 
 - 
property reg_lambda¶
- L2 regularization - Type: - float, defaults to- 1.0.- Examples
 - >>> airlines= h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/airlines/allyears2k_headers.zip") >>> airlines["Year"] = airlines["Year"].asfactor() >>> airlines["Month"] = airlines["Month"].asfactor() >>> airlines["DayOfWeek"] = airlines["DayOfWeek"].asfactor() >>> airlines["Cancelled"] = airlines["Cancelled"].asfactor() >>> airlines['FlightNum'] = airlines['FlightNum'].asfactor() >>> predictors = ["Origin", "Dest", "Year", "UniqueCarrier", ... "DayOfWeek", "Month", "Distance", "FlightNum"] >>> response = "IsDepDelayed" >>> train, valid= airlines.split_frame(ratios=[.8]) >>> airlines_xgb = H2OXGBoostEstimator(reg_lambda=.0001, ... seed=1234) >>> airlines_xgb.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> print(airlines_xgb.auc(valid=True)) 
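- To show how the two penalties act, the sketch below uses the standard leaf-weight formula from native XGBoost's objective: the L1 term (reg_alpha) soft-thresholds the summed gradient and the L2 term (reg_lambda) is added to the summed Hessian. It assumes H2O passes both values through to the native backend unchanged; the helper names are hypothetical.

```python
def soft_threshold(grad_sum, reg_alpha):
    """Shrink grad_sum toward zero by reg_alpha (the L1 effect)."""
    if grad_sum > reg_alpha:
        return grad_sum - reg_alpha
    if grad_sum < -reg_alpha:
        return grad_sum + reg_alpha
    return 0.0


def leaf_weight(grad_sum, hess_sum, reg_lambda=1.0, reg_alpha=0.0):
    """Optimal leaf value for summed gradient and Hessian statistics."""
    return -soft_threshold(grad_sum, reg_alpha) / (hess_sum + reg_lambda)


print(leaf_weight(-10.0, 4.0, reg_lambda=0.0))                 # no regularization
print(leaf_weight(-10.0, 4.0, reg_lambda=1.0))                 # L2 shrinks the weight
print(leaf_weight(-10.0, 4.0, reg_lambda=1.0, reg_alpha=2.0))  # L1 shrinks it further
```

A large enough reg_alpha drives the leaf weight to exactly zero, whereas reg_lambda only scales it down.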
 - 
property response_column¶
- Response variable column. - Type: - str.
 - 
property sample_rate¶
- (same as subsample) Row sample rate per tree (from 0.0 to 1.0) - Type: - float, defaults to- 1.0.- Examples
 - >>> airlines= h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/airlines/allyears2k_headers.zip") >>> airlines["Year"] = airlines["Year"].asfactor() >>> airlines["Month"] = airlines["Month"].asfactor() >>> airlines["DayOfWeek"] = airlines["DayOfWeek"].asfactor() >>> airlines["Cancelled"] = airlines["Cancelled"].asfactor() >>> airlines['FlightNum'] = airlines['FlightNum'].asfactor() >>> predictors = ["Origin", "Dest", "Year", "UniqueCarrier", ... "DayOfWeek", "Month", "Distance", "FlightNum"] >>> response = "IsDepDelayed" >>> train, valid= airlines.split_frame(ratios=[.8], ... seed=1234) >>> airlines_xgb = H2OXGBoostEstimator(sample_rate=.7, ... seed=1234) >>> airlines_xgb.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> print(airlines_xgb.auc(valid=True)) 
 - 
property sample_type¶
- For booster=dart only: sample_type - Type: - Literal["uniform", "weighted"], defaults to- "uniform".- Examples
 - >>> airlines= h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/airlines/allyears2k_headers.zip") >>> airlines["Year"]= airlines["Year"].asfactor() >>> airlines["Month"]= airlines["Month"].asfactor() >>> airlines["DayOfWeek"] = airlines["DayOfWeek"].asfactor() >>> airlines["Cancelled"] = airlines["Cancelled"].asfactor() >>> airlines['FlightNum'] = airlines['FlightNum'].asfactor() >>> predictors = ["Origin", "Dest", "Year", "UniqueCarrier", ... "DayOfWeek", "Month", "Distance", "FlightNum"] >>> response = "IsDepDelayed" >>> train, valid= airlines.split_frame(ratios=[.8], ... seed=1234) >>> airlines_xgb = H2OXGBoostEstimator(booster='dart', ... sample_type="weighted", ... seed=1234) >>> airlines_xgb.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> print(airlines_xgb.auc(valid=True)) 
 - 
property save_matrix_directory¶
- Directory in which to save the matrices passed to the XGBoost library. Useful for debugging. - Type: - str.
 - 
property scale_pos_weight¶
- Controls the effect of observations with positive labels in relation to the observations with negative labels on gradient calculation. Useful for imbalanced problems. - Type: - float, defaults to- 1.0.
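- One common starting point, suggested in the native XGBoost documentation, is the ratio of negative to positive observations. The sketch below computes that ratio from a list of labels; the helper name and the toy labels are hypothetical, and the resulting value is only a heuristic to tune from.

```python
def suggested_scale_pos_weight(labels, positive=1):
    """Return count(negative) / count(positive) for a binary label list."""
    n_pos = sum(1 for y in labels if y == positive)
    n_neg = len(labels) - n_pos
    if n_pos == 0:
        raise ValueError("no positive observations in labels")
    return n_neg / n_pos


labels = [0] * 90 + [1] * 10   # toy data: 10% positives
weight = suggested_scale_pos_weight(labels)
print(weight)

# The value could then be passed as H2OXGBoostEstimator(scale_pos_weight=weight).
```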
 - 
property score_each_iteration¶
- Whether to score during each iteration of model training. - Type: - bool, defaults to- False.- Examples
 - >>> airlines= h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/airlines/allyears2k_headers.zip") >>> airlines["Year"] = airlines["Year"].asfactor() >>> airlines["Month"] = airlines["Month"].asfactor() >>> airlines["DayOfWeek"] = airlines["DayOfWeek"].asfactor() >>> airlines["Cancelled"] = airlines["Cancelled"].asfactor() >>> airlines['FlightNum'] = airlines['FlightNum'].asfactor() >>> predictors = ["Origin", "Dest", "Year", "UniqueCarrier", ... "DayOfWeek", "Month", "Distance", "FlightNum"] >>> response = "IsDepDelayed" >>> train, valid= airlines.split_frame(ratios=[.8], ... seed=1234) >>> airlines_xgb = H2OXGBoostEstimator(score_each_iteration=True, ... ntrees=55, ... seed=1234) >>> airlines_xgb.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> airlines_xgb.scoring_history() 
 - 
property score_eval_metric_only¶
- If enabled, score only the evaluation metric. This can make model training faster if scoring is frequent (e.g., each iteration). - Type: - bool, defaults to- False.
 - 
property score_tree_interval¶
- Score the model after every so many trees. Disabled if set to 0. - Type: - int, defaults to- 0.- Examples
 - >>> airlines= h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/airlines/allyears2k_headers.zip") >>> airlines["Year"] = airlines["Year"].asfactor() >>> airlines["Month"] = airlines["Month"].asfactor() >>> airlines["DayOfWeek"] = airlines["DayOfWeek"].asfactor() >>> airlines["Cancelled"] = airlines["Cancelled"].asfactor() >>> airlines['FlightNum'] = airlines['FlightNum'].asfactor() >>> predictors = ["Origin", "Dest", "Year", "UniqueCarrier", ... "DayOfWeek", "Month", "Distance", "FlightNum"] >>> response = "IsDepDelayed" >>> train, valid= airlines.split_frame(ratios=[.8], ... seed=1234) >>> airlines_xgb = H2OXGBoostEstimator(score_tree_interval=5, ... seed=1234) >>> airlines_xgb.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> airlines_xgb.scoring_history() 
 - 
property seed¶
- Seed for pseudo random number generator (if applicable) - Type: - int, defaults to- -1.- Examples
 - >>> airlines= h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/airlines/allyears2k_headers.zip") >>> airlines["Year"] = airlines["Year"].asfactor() >>> airlines["Month"] = airlines["Month"].asfactor() >>> airlines["DayOfWeek"] = airlines["DayOfWeek"].asfactor() >>> airlines["Cancelled"] = airlines["Cancelled"].asfactor() >>> airlines['FlightNum'] = airlines['FlightNum'].asfactor() >>> predictors = ["Origin", "Dest", "Year", "UniqueCarrier", ... "DayOfWeek", "Month", "Distance", "FlightNum"] >>> response = "IsDepDelayed" >>> train, valid= airlines.split_frame(ratios=[.8], seed=1234) >>> xgb_w_seed_1 = H2OXGBoostEstimator(col_sample_rate=.7, ... seed=1234) >>> xgb_w_seed_1.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> xgb_w_seed_2 = H2OXGBoostEstimator(col_sample_rate = .7, ... seed = 1234) >>> xgb_w_seed_2.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> print('auc for the 1st model built with a seed:', ... xgb_w_seed_1.auc(valid=True)) >>> print('auc for the 2nd model built with a seed:', ... xgb_w_seed_2.auc(valid=True)) 
 - 
property skip_drop¶
- For booster=dart only: skip_drop (0..1) - Type: - float, defaults to- 0.0.- Examples
 - >>> airlines= h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/airlines/allyears2k_headers.zip") >>> airlines["Year"] = airlines["Year"].asfactor() >>> airlines["Month"] = airlines["Month"].asfactor() >>> airlines["DayOfWeek"] = airlines["DayOfWeek"].asfactor() >>> airlines["Cancelled"] = airlines["Cancelled"].asfactor() >>> airlines['FlightNum'] = airlines['FlightNum'].asfactor() >>> predictors = ["Origin", "Dest", "Year", "UniqueCarrier", ... "DayOfWeek", "Month", "Distance", "FlightNum"] >>> response = "IsDepDelayed" >>> train, valid= airlines.split_frame(ratios=[.8], ... seed=1234) >>> airlines_xgb = H2OXGBoostEstimator(booster='dart', ... skip_drop=0.5, ... seed=1234) >>> airlines_xgb.train(x=predictors, ... y=response, ... training_frame=train) >>> airlines_xgb.auc(train=True) 
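- The DART parameters rate_drop, skip_drop and one_drop interact, and the sketch below shows one plausible reading of that interaction based on the native XGBoost parameter descriptions: skip_drop is checked first, then each existing tree is dropped with probability rate_drop, and one_drop forces at least one tree to be dropped. It covers uniform sampling only, and `select_dropped_trees` is a hypothetical helper, not the library's implementation.

```python
import random


def select_dropped_trees(n_trees, rate_drop, skip_drop, one_drop, rng):
    """Return the indices of trees dropped for one boosting iteration."""
    if n_trees == 0:
        return []
    # With probability skip_drop the dropout step is skipped entirely.
    if rng.random() < skip_drop:
        return []
    # Otherwise each existing tree is dropped independently.
    dropped = [t for t in range(n_trees) if rng.random() < rate_drop]
    # one_drop guarantees that at least one tree is dropped.
    if one_drop and not dropped:
        dropped = [rng.randrange(n_trees)]
    return dropped


rng = random.Random(1234)
print(select_dropped_trees(10, rate_drop=0.1, skip_drop=0.5, one_drop=False, rng=rng))
```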
 - 
property stopping_metric¶
- Metric to use for early stopping (AUTO: logloss for classification, deviance for regression and anomaly_score for Isolation Forest). Note that custom and custom_increasing can only be used in GBM and DRF with the Python client. - Type: - Literal["auto", "deviance", "logloss", "mse", "rmse", "mae", "rmsle", "auc", "aucpr", "lift_top_group", "misclassification", "mean_per_class_error", "custom", "custom_increasing"], defaults to- "auto".- Examples
 - >>> airlines= h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/airlines/allyears2k_headers.zip") >>> airlines["Year"] = airlines["Year"].asfactor() >>> airlines["Month"] = airlines["Month"].asfactor() >>> airlines["DayOfWeek"] = airlines["DayOfWeek"].asfactor() >>> airlines["Cancelled"] = airlines["Cancelled"].asfactor() >>> airlines['FlightNum'] = airlines['FlightNum'].asfactor() >>> predictors = ["Origin", "Dest", "Year", "UniqueCarrier", ... "DayOfWeek", "Month", "Distance", "FlightNum"] >>> response = "IsDepDelayed" >>> train, valid= airlines.split_frame(ratios=[.8], seed=1234) >>> airlines_xgb = H2OXGBoostEstimator(stopping_metric="auc", ... stopping_rounds=3, ... stopping_tolerance=1e-2, ... seed=1234) >>> airlines_xgb.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> airlines_xgb.auc(valid=True) 
 - 
property stopping_rounds¶
- Early stopping based on convergence of stopping_metric. Training stops if the simple moving average of length k of the stopping_metric does not improve for k := stopping_rounds scoring events (0 to disable). - Type: - int, defaults to- 0.- Examples
 - >>> airlines= h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/airlines/allyears2k_headers.zip") >>> airlines["Year"] = airlines["Year"].asfactor() >>> airlines["Month"] = airlines["Month"].asfactor() >>> airlines["DayOfWeek"] = airlines["DayOfWeek"].asfactor() >>> airlines["Cancelled"] = airlines["Cancelled"].asfactor() >>> airlines['FlightNum'] = airlines['FlightNum'].asfactor() >>> predictors = ["Origin", "Dest", "Year", "UniqueCarrier", ... "DayOfWeek", "Month", "Distance", "FlightNum"] >>> response = "IsDepDelayed" >>> train, valid= airlines.split_frame(ratios=[.8], ... seed=1234) >>> airlines_xgb = H2OXGBoostEstimator(stopping_metric="auc", ... stopping_rounds=3, ... stopping_tolerance=1e-2, ... seed=1234) >>> airlines_xgb.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> airlines_xgb.auc(valid=True) 
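- The moving-average rule can be sketched in plain Python. The version below is one plausible reading of the description, not H2O's internal code: it takes the last 2k scores, forms k+1 moving averages of length k, and stops when the best of the k most recent averages does not beat the oldest one by the relative stopping_tolerance. It assumes positive metric values; `should_stop` is a hypothetical helper.

```python
def should_stop(scores, stopping_rounds, stopping_tolerance=1e-3,
                larger_is_better=False):
    """Decide whether training should stop given the scoring history."""
    k = stopping_rounds
    # Need k scores for one average plus k more to judge improvement.
    if k <= 0 or len(scores) < 2 * k:
        return False
    recent = scores[-2 * k:]
    averages = [sum(recent[i:i + k]) / k for i in range(k + 1)]
    reference = averages[0]          # moving average before the last k events
    if larger_is_better:             # e.g. AUC
        improved = max(averages[1:]) > reference * (1 + stopping_tolerance)
    else:                            # e.g. logloss
        improved = min(averages[1:]) < reference * (1 - stopping_tolerance)
    return not improved


still_improving = [0.70, 0.60, 0.55, 0.54, 0.54, 0.54]
plateaued = still_improving + [0.54, 0.54, 0.54]
print(should_stop(still_improving, stopping_rounds=3, stopping_tolerance=1e-2))
print(should_stop(plateaued, stopping_rounds=3, stopping_tolerance=1e-2))
```

Because the rule works on averages rather than single scores, one noisy scoring event is less likely to end training early.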
 - 
property stopping_tolerance¶
- Relative tolerance for metric-based stopping criterion (stop if relative improvement is not at least this much) - Type: - float, defaults to- 0.001.- Examples
 - >>> airlines= h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/airlines/allyears2k_headers.zip") >>> airlines["Year"] = airlines["Year"].asfactor() >>> airlines["Month"] = airlines["Month"].asfactor() >>> airlines["DayOfWeek"] = airlines["DayOfWeek"].asfactor() >>> airlines["Cancelled"] = airlines["Cancelled"].asfactor() >>> airlines['FlightNum'] = airlines['FlightNum'].asfactor() >>> predictors = ["Origin", "Dest", "Year", "UniqueCarrier", ... "DayOfWeek", "Month", "Distance", "FlightNum"] >>> response = "IsDepDelayed" >>> train, valid= airlines.split_frame(ratios=[.8], ... seed=1234) >>> airlines_xgb = H2OXGBoostEstimator(stopping_metric="auc", ... stopping_rounds=3, ... stopping_tolerance=1e-2, ... seed=1234) >>> airlines_xgb.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> airlines_xgb.auc(valid=True) 
 - 
property subsample¶
- (same as sample_rate) Row sample rate per tree (from 0.0 to 1.0) - Type: - float, defaults to- 1.0.- Examples
 - >>> airlines= h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/airlines/allyears2k_headers.zip") >>> airlines["Year"] = airlines["Year"].asfactor() >>> airlines["Month"] = airlines["Month"].asfactor() >>> airlines["DayOfWeek"] = airlines["DayOfWeek"].asfactor() >>> airlines["Cancelled"] = airlines["Cancelled"].asfactor() >>> airlines['FlightNum'] = airlines['FlightNum'].asfactor() >>> predictors = ["Origin", "Dest", "Year", "UniqueCarrier", ... "DayOfWeek", "Month", "Distance", "FlightNum"] >>> response = "IsDepDelayed" >>> train, valid= airlines.split_frame(ratios=[.8], ... seed=1234) >>> airlines_xgb = H2OXGBoostEstimator(sample_rate=.7, ... seed=1234) >>> airlines_xgb.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> print(airlines_xgb.auc(valid=True)) 
 - 
property training_frame¶
- Id of the training data frame. - Type: - Union[None, str, H2OFrame].- Examples
 - >>> titanic = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/gbm_test/titanic.csv") >>> titanic['survived'] = titanic['survived'].asfactor() >>> predictors = titanic.columns >>> del predictors[1:3] >>> response = 'survived' >>> train, valid = titanic.split_frame(ratios=[.8], ... seed=1234) >>> titanic_xgb = H2OXGBoostEstimator(seed=1234) >>> titanic_xgb.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> titanic_xgb.auc(valid=True) 
 - 
property tree_method¶
- Tree method - Type: - Literal["auto", "exact", "approx", "hist"], defaults to- "auto".- Examples
 - >>> airlines= h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/airlines/allyears2k_headers.zip") >>> airlines["Year"] = airlines["Year"].asfactor() >>> airlines["Month"] = airlines["Month"].asfactor() >>> airlines["DayOfWeek"] = airlines["DayOfWeek"].asfactor() >>> airlines["Cancelled"] = airlines["Cancelled"].asfactor() >>> airlines['FlightNum'] = airlines['FlightNum'].asfactor() >>> predictors = ["Origin", "Dest", "Year", "UniqueCarrier", ... "DayOfWeek", "Month", "Distance", "FlightNum"] >>> response = "IsDepDelayed" >>> train, valid= airlines.split_frame(ratios=[.8], ... seed=1234) >>> >>> airlines_xgb = H2OXGBoostEstimator(seed=1234, ... tree_method="approx") >>> airlines_xgb.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> print(airlines_xgb.auc(valid=True)) 
 - 
property tweedie_power¶
- Tweedie power for Tweedie regression, must be between 1 and 2. - Type: - float, defaults to- 1.5.- Examples
 - >>> insurance = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/glm_test/insurance.csv") >>> predictors = insurance.columns[0:4] >>> response = 'Claims' >>> insurance['Group'] = insurance['Group'].asfactor() >>> insurance['Age'] = insurance['Age'].asfactor() >>> train, valid = insurance.split_frame(ratios=[.8], ... seed=1234) >>> insurance_xgb = H2OXGBoostEstimator(distribution="tweedie", ... tweedie_power=1.2, ... seed=1234) >>> insurance_xgb.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> print(insurance_xgb.mse(valid=True)) 
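For intuition about what tweedie_power controls, the sketch below evaluates the standard Tweedie variance function and unit deviance for a power strictly between 1 and 2. These are the textbook formulas, not code taken from H2O or XGBoost, and the function names are hypothetical.

```python
def tweedie_variance(mu, power, dispersion=1.0):
    """Tweedie variance function: Var(Y) = dispersion * mu ** power."""
    if not 1.0 <= power <= 2.0:
        raise ValueError("power must be between 1 and 2")
    return dispersion * mu ** power


def tweedie_unit_deviance(y, mu, power):
    """Textbook unit deviance for 1 < power < 2 (y >= 0, mu > 0)."""
    if not 1.0 < power < 2.0:
        raise ValueError("this formula requires 1 < power < 2")
    a = y ** (2 - power) / ((1 - power) * (2 - power))
    b = y * mu ** (1 - power) / (1 - power)
    c = mu ** (2 - power) / (2 - power)
    return 2 * (a - b + c)


# power = 1 corresponds to Poisson-like variance, power = 2 to gamma-like.
print(tweedie_variance(4.0, 1.5))            # variance grows as mu ** 1.5
print(tweedie_unit_deviance(3.0, 3.0, 1.2))  # a perfect prediction
print(tweedie_unit_deviance(0.0, 2.0, 1.2))  # zero claims are allowed
```

Values of the power closer to 1 treat the response more like a count, while values closer to 2 treat it more like a positive continuous amount.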
 - 
property validation_frame¶
- Id of the validation data frame. - Type: - Union[None, str, H2OFrame].- Examples
 - >>> insurance = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/glm_test/insurance.csv") >>> insurance['Group'] = insurance['Group'].asfactor() >>> insurance['Age'] = insurance['Age'].asfactor() >>> predictors = insurance.columns[0:4] >>> response = 'Claims' >>> train, valid = insurance.split_frame(ratios=[.8], ... seed=1234) >>> insurance_xgb = H2OXGBoostEstimator(seed=1234) >>> insurance_xgb.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> print(insurance_xgb.mse(valid=True)) 
 - 
property weights_column¶
- Column with observation weights. Giving some observation a weight of zero is equivalent to excluding it from the dataset; giving an observation a relative weight of 2 is equivalent to repeating that row twice. Negative weights are not allowed. Note: Weights are per-row observation weights and do not increase the size of the data frame. This is typically the number of times a row is repeated, but non-integer values are supported as well. During training, rows with higher weights matter more, due to the larger loss function pre-factor. If you set weight = 0 for a row, the returned prediction frame at that row is zero and this is incorrect. To get an accurate prediction, remove all rows with weight == 0. - Type: - str.- Examples
 - >>> titanic = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/gbm_test/titanic.csv") >>> titanic['survived'] = titanic['survived'].asfactor() >>> predictors = titanic.columns >>> del predictors[1:3] >>> response = 'survived' >>> train, valid = titanic.split_frame(ratios=[.8], ... seed=1234) >>> titanic_xgb = H2OXGBoostEstimator(seed=1234) >>> titanic_xgb.train(x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> titanic_xgb.auc(valid=True) 
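The equivalence between weights and repeated rows stated above can be checked on a weighted loss. This is a minimal sketch in plain Python; `weighted_mse` is a hypothetical helper, not part of the H2O API.

```python
def weighted_mse(actual, predicted, weights):
    """Mean squared error where each row contributes in proportion to its weight."""
    if any(w < 0 for w in weights):
        raise ValueError("Negative weights are not allowed")
    total = sum(weights)
    squared = sum(w * (a - p) ** 2 for a, p, w in zip(actual, predicted, weights))
    return squared / total


actual = [1.0, 2.0, 3.0]
predicted = [1.5, 2.0, 2.0]

# Weight 2 on the second row, weight 0 on the third row ...
with_weights = weighted_mse(actual, predicted, [1, 2, 0])

# ... matches repeating the second row and dropping the third.
expanded = weighted_mse([1.0, 2.0, 2.0], [1.5, 2.0, 2.0], [1, 1, 1])

print(with_weights, expanded)
```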
 
Unsupervised¶
H2OAggregatorEstimator¶
- 
class h2o.estimators.aggregator.H2OAggregatorEstimator(model_id=None, training_frame=None, response_column=None, ignored_columns=None, ignore_const_cols=True, target_num_exemplars=5000, rel_tol_num_exemplars=0.5, transform='normalize', categorical_encoding='auto', save_mapping_frame=False, num_iteration_without_new_exemplar=500, export_checkpoints_dir=None)[source]¶
- Bases: - h2o.estimators.estimator_base.H2OEstimator- Aggregator - 
property categorical_encoding¶
- Encoding scheme for categorical features - Type: - Literal["auto", "enum", "one_hot_internal", "one_hot_explicit", "binary", "eigen", "label_encoder", "sort_by_response", "enum_limited"], defaults to- "auto".- Examples
 - >>> df = h2o.create_frame(rows=10000, ... cols=10, ... categorical_fraction=0.6, ... integer_fraction=0, ... binary_fraction=0, ... real_range=100, ... integer_range=100, ... missing_fraction=0, ... factors=100, ... seed=1234) >>> params = {"target_num_exemplars": 1000, ... "rel_tol_num_exemplars": 0.5, ... "categorical_encoding": "eigen"} >>> agg = H2OAggregatorEstimator(**params) >>> agg.train(training_frame=df) >>> new_df = agg.aggregated_frame >>> new_df 
 - 
property export_checkpoints_dir¶
- Automatically export generated models to this directory. - Type: - str.- Examples
 - >>> import tempfile >>> from os import listdir >>> df = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/airlines/allyears2k_headers.zip") >>> checkpoints_dir = tempfile.mkdtemp() >>> model = H2OAggregatorEstimator(target_num_exemplars=500, ... rel_tol_num_exemplars=0.3, ... export_checkpoints_dir=checkpoints_dir) >>> model.train(training_frame=df) >>> new_df = model.aggregated_frame >>> new_df >>> len(listdir(checkpoints_dir)) 
 - 
property ignore_const_cols¶
- Ignore constant columns. - Type: - bool, defaults to- True.- Examples
 - >>> df = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/airlines/allyears2k_headers.zip") >>> params = {"ignore_const_cols": False, ... "target_num_exemplars": 500, ... "rel_tol_num_exemplars": 0.3, ... "transform": "standardize", ... "categorical_encoding": "eigen"} >>> model = H2OAggregatorEstimator(**params) >>> model.train(training_frame=df) >>> new_df = model.aggregated_frame >>> new_df 
 - 
property ignored_columns¶
- Names of columns to ignore for training. - Type: - List[str].
 - 
property num_iteration_without_new_exemplar¶
- The number of iterations to run before the aggregator exits if the number of collected exemplars has not changed - Type: - int, defaults to- 500.- Examples
 - >>> df = h2o.create_frame(rows=10000, ... cols=10, ... categorical_fraction=0.6, ... integer_fraction=0, ... binary_fraction=0, ... real_range=100, ... integer_range=100, ... missing_fraction=0, ... factors=100, ... seed=1234) >>> params = {"target_num_exemplars": 1000, ... "rel_tol_num_exemplars": 0.5, ... "categorical_encoding": "eigen", ... "num_iteration_without_new_exemplar": 400} >>> agg = H2OAggregatorEstimator(**params) >>> agg.train(training_frame=df) >>> new_df = agg.aggregated_frame >>> new_df 
 - 
property rel_tol_num_exemplars¶
- Relative tolerance for the number of exemplars (e.g., 0.5 is +/- 50 percent) - Type: - float, defaults to- 0.5.- Examples
 - >>> df = h2o.create_frame(rows=10000, ... cols=10, ... categorical_fraction=0.6, ... integer_fraction=0, ... binary_fraction=0, ... real_range=100, ... integer_range=100, ... missing_fraction=0, ... factors=100, ... seed=1234) >>> params = {"target_num_exemplars": 1000, ... "rel_tol_num_exemplars": 0.5, ... "categorical_encoding": "eigen", ... "num_iteration_without_new_exemplar": 400} >>> agg = H2OAggregatorEstimator(**params) >>> agg.train(training_frame=df) >>> new_df = agg.aggregated_frame >>> new_df 
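The tolerance translates into a range of acceptable exemplar counts around target_num_exemplars. The sketch below computes that range; the helper name is hypothetical, and how H2O rounds the bounds or treats their endpoints is an assumption not confirmed here.

```python
def exemplar_bounds(target_num_exemplars, rel_tol_num_exemplars):
    """Return the (lower, upper) exemplar counts implied by a relative tolerance."""
    lower = target_num_exemplars * (1 - rel_tol_num_exemplars)
    upper = target_num_exemplars * (1 + rel_tol_num_exemplars)
    return lower, upper


# With the values used in the example above: 1000 exemplars, +/- 50 percent.
print(exemplar_bounds(1000, 0.5))
```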
 - 
property response_column¶
- Response variable column. - Type: - str.
 - 
property save_mapping_frame¶
- Whether to export the mapping of the aggregated frame - Type: - bool, defaults to- False.- Examples
 - >>> df = h2o.create_frame(rows=10000, ... cols=10, ... categorical_fraction=0.6, ... integer_fraction=0, ... binary_fraction=0, ... real_range=100, ... integer_range=100, ... missing_fraction=0, ... factors=100, ... seed=1234) >>> params = {"target_num_exemplars": 1000, ... "rel_tol_num_exemplars": 0.5, ... "categorical_encoding": "eigen", ... "save_mapping_frame": True} >>> agg = H2OAggregatorEstimator(**params) >>> agg.train(training_frame=df) >>> mapping_frame = agg.mapping_frame >>> mapping_frame 
 - 
property target_num_exemplars¶
- Targeted number of exemplars - Type: - int, defaults to- 5000.- Examples
 - >>> df = h2o.create_frame(rows=10000, ... cols=10, ... categorical_fraction=0.6, ... integer_fraction=0, ... binary_fraction=0, ... real_range=100, ... integer_range=100, ... missing_fraction=0, ... factors=100, ... seed=1234) >>> params = {"target_num_exemplars": 1000, ... "rel_tol_num_exemplars": 0.5, ... "categorical_encoding": "eigen", ... "num_iteration_without_new_exemplar": 400} >>> agg = H2OAggregatorEstimator(**params) >>> agg.train(training_frame=df) >>> new_df = agg.aggregated_frame >>> new_df 
 - 
property training_frame¶
- Id of the training data frame. - Type: - Union[None, str, H2OFrame].- Examples
 - >>> df = h2o.create_frame(rows=10000, ... cols=10, ... categorical_fraction=0.6, ... integer_fraction=0, ... binary_fraction=0, ... real_range=100, ... integer_range=100, ... missing_fraction=0, ... factors=100, ... seed=1234) >>> params = {"target_num_exemplars": 1000, ... "rel_tol_num_exemplars": 0.5, ... "categorical_encoding": "eigen", ... "num_iteration_without_new_exemplar": 400} >>> agg = H2OAggregatorEstimator(**params) >>> agg.train(training_frame=df) >>> new_df = agg.aggregated_frame >>> new_df 
 - 
property transform¶
- Transformation of training data - Type: - Literal["none", "standardize", "normalize", "demean", "descale"], defaults to- "normalize".- Examples
 - >>> df = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/airlines/allyears2k_headers.zip") >>> params = {"ignore_const_cols": False, ... "target_num_exemplars": 500, ... "rel_tol_num_exemplars": 0.3, ... "transform": "standardize", ... "categorical_encoding": "eigen"} >>> model = H2OAggregatorEstimator(**params) >>> model.train(training_frame=df) >>> new_df = model.aggregated_frame 
 
H2OAutoEncoderEstimator¶
- 
class h2o.estimators.deeplearning.H2OAutoEncoderEstimator(**kwargs)[source]¶
- Bases: - h2o.estimators.deeplearning.H2ODeepLearningEstimator- Examples
 - >>> import h2o as ml >>> from h2o.estimators.deeplearning import H2OAutoEncoderEstimator >>> ml.init() >>> rows = [[1,2,3,4,0]*50, [2,1,2,4,1]*50, [2,1,4,2,1]*50, [0,1,2,34,1]*50, [2,3,4,1,0]*50] >>> fr = ml.H2OFrame(rows) >>> fr[4] = fr[4].asfactor() >>> model = H2OAutoEncoderEstimator() >>> model.train(x=list(range(4)), training_frame=fr) 
H2OExtendedIsolationForestEstimator¶
- 
class h2o.estimators.extended_isolation_forest.H2OExtendedIsolationForestEstimator(model_id=None, training_frame=None, ignored_columns=None, ignore_const_cols=True, categorical_encoding='auto', score_each_iteration=False, score_tree_interval=0, ntrees=100, sample_size=256, extension_level=0, seed=-1, disable_training_metrics=True)[source]¶
- Bases: - h2o.estimators.estimator_base.H2OEstimator- Extended Isolation Forest - Builds an Extended Isolation Forest model. Extended Isolation Forest generalizes its predecessor, Isolation Forest. The original Isolation Forest algorithm suffers from bias caused by its tree branching. The extension mitigates this bias by adjusting the branching, and the original algorithm becomes a special case. The “extension_level” attribute controls the degree of generalization: the minimum value, 0, reproduces Isolation Forest behavior, and the maximum value, (numCols - 1), stands for full extension. The rest of the algorithm is analogous to Isolation Forest. Each iteration builds a tree that partitions the space of the sampled observations until an observation is isolated. The length of the path from the root to a leaf node of the resulting tree is used to calculate the anomaly score. Anomalies are easier to isolate, so their average path length is expected to be shorter than that of regular observations. The anomaly score is a number between 0 and 1: a score closer to 0 indicates a normal point, and a score closer to 1 indicates a more anomalous point. - 
property categorical_encoding¶
- Encoding scheme for categorical features - Type: - Literal["auto", "enum", "one_hot_internal", "one_hot_explicit", "binary", "eigen", "label_encoder", "sort_by_response", "enum_limited"], defaults to- "auto".- Examples
 - >>> airlines= h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/airlines/allyears2k_headers.zip") >>> predictors = ["Origin", "Dest", "Year", "UniqueCarrier", ... "DayOfWeek", "Month", "Distance", "FlightNum"] >>> encoding = "one_hot_explicit" >>> airlines_eif = H2OExtendedIsolationForestEstimator(categorical_encoding = encoding, ... seed = 1234) >>> airlines_eif.train(x = predictors, ... training_frame = airlines) >>> airlines_eif.model_performance() 
 - 
property disable_training_metrics¶
- Disable calculating training metrics (expensive on large datasets) - Type: - bool, defaults to- True.
 - 
property extension_level¶
- Maximum is N - 1 (N = numCols). Minimum is 0. Extended Isolation Forest with extension_level = 0 behaves like Isolation Forest. - Type: - int, defaults to- 0.- Examples
 - >>> train = h2o.import_file("http://s3.amazonaws.com/h2o-public-test-data/smalldata/anomaly/single_blob.csv") >>> eif_model = H2OExtendedIsolationForestEstimator(extension_level = 1, ... ntrees=7) >>> eif_model.train(training_frame = train) >>> print(eif_model) 
 - 
property ignore_const_cols¶
- Ignore constant columns. - Type: - bool, defaults to- True.- Examples
 - >>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv") >>> predictors = ["displacement","power","weight","acceleration","year","const_1","const_2"] >>> cars["const_1"] = 6 >>> cars["const_2"] = 7 >>> train, valid = cars.split_frame(ratios = [.8], seed = 1234) >>> cars_eif = H2OExtendedIsolationForestEstimator(seed = 1234, ... ignore_const_cols = True) >>> cars_eif.train(x = predictors, ... training_frame = cars) >>> cars_eif.model_performance() 
 - 
property ignored_columns¶
- Names of columns to ignore for training. - Type: - List[str].
 - 
property ntrees¶
- Number of Extended Isolation Forest trees. - Type: - int, defaults to- 100.- Examples
 - >>> titanic = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/gbm_test/titanic.csv") >>> predictors = titanic.columns >>> tree_num = [20, 50, 80, 110, 140, 170, 200] >>> label = ["20", "50", "80", "110", "140", "170", "200"] >>> for key, num in enumerate(tree_num): ... titanic_eif = H2OExtendedIsolationForestEstimator(ntrees = num, ... seed = 1234, ... extension_level = titanic.dim[1] - 1) ... titanic_eif.train(x = predictors, ... training_frame = titanic) 
 - 
property sample_size¶
- Number of randomly sampled observations used to train each Extended Isolation Forest tree. - Type: - int, defaults to- 256.- Examples
 - >>> train = h2o.import_file("http://s3.amazonaws.com/h2o-public-test-data/smalldata/anomaly/ecg_discord_train.csv") >>> eif_model = H2OExtendedIsolationForestEstimator(sample_size = 5, ... ntrees=7) >>> eif_model.train(training_frame = train) >>> print(eif_model) 
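The class description says the anomaly score is computed from path length and lies between 0 and 1. The sketch below shows the score as defined in the original Isolation Forest paper, where the average path length is normalized by a constant that depends on the sample size. Whether H2O applies exactly this formula is an assumption; the function names are hypothetical.

```python
import math

EULER_GAMMA = 0.5772156649  # Euler-Mascheroni constant


def average_path_length(n):
    """Expected path length of an unsuccessful search in a binary tree of n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + EULER_GAMMA  # approximation of H(n - 1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n


def anomaly_score(mean_path_length, sample_size):
    """Score in (0, 1]; shorter average paths give scores closer to 1."""
    if sample_size < 2:
        raise ValueError("sample_size must be at least 2")
    return 2.0 ** (-mean_path_length / average_path_length(sample_size))


n = 256  # the default sample_size
print(anomaly_score(2.0, n))   # isolated quickly: more anomalous
print(anomaly_score(12.0, n))  # needs many splits: more normal
```

A point whose average path length equals the normalizing constant scores exactly 0.5, which is why scores well above 0.5 are usually read as anomalous.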
 - 
property score_each_iteration¶
- Whether to score during each iteration of model training. - Type: - bool, defaults to- False.
 - 
property score_tree_interval¶
- Score the model after every so many trees. Disabled if set to 0. - Type: - int, defaults to- 0.
 - 
property seed¶
- Seed for pseudo random number generator (if applicable) - Type: - int, defaults to- -1.- Examples
 - >>> airlines= h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/airlines/allyears2k_headers.zip") >>> predictors = ["Origin", "Dest", "Year", "UniqueCarrier", ... "DayOfWeek", "Month", "Distance", "FlightNum"] >>> eif_w_seed = H2OExtendedIsolationForestEstimator(seed = 1234) >>> eif_w_seed.train(x = predictors, ... training_frame = airlines) >>> eif_wo_seed = H2OExtendedIsolationForestEstimator() >>> eif_wo_seed.train(x = predictors, ... training_frame = airlines) >>> print(eif_w_seed) >>> print(eif_wo_seed) 
 - 
property training_frame¶
- Id of the training data frame. - Type: - Union[None, str, H2OFrame].- Examples
 - >>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv") >>> predictors = ["displacement","power","weight","acceleration","year"] >>> cars_eif = H2OExtendedIsolationForestEstimator(seed = 1234, ... sample_size = 256, ... extension_level = cars.dim[1] - 1) >>> cars_eif.train(x = predictors, ... training_frame = cars) >>> print(cars_eif) 
 
H2OGenericEstimator¶
- 
class h2o.estimators.generic.H2OGenericEstimator(model_id=None, model_key=None, path=None)[source]¶
- Bases: - h2o.estimators.estimator_base.H2OEstimator- Import MOJO Model - 
static from_file(file=<class 'str'>, model_id=None)[source]¶
- Creates new Generic model by loading existing embedded model into library, e.g. from H2O MOJO. The imported model must be supported by H2O. - Parameters
- file – A string containing path to the file to create the model from 
- model_id – Model ID 
 
- Returns
- H2OGenericEstimator instance representing the generic model 
- Examples
 - >>> from h2o.estimators import H2OIsolationForestEstimator, H2OGenericEstimator >>> import tempfile >>> airlines= h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/testng/airlines_train.csv") >>> ifr = H2OIsolationForestEstimator(ntrees=1) >>> ifr.train(x=["Origin","Dest"], y="Distance", training_frame=airlines) >>> original_model_filename = tempfile.mkdtemp() >>> original_model_filename = ifr.download_mojo(original_model_filename) >>> model = H2OGenericEstimator.from_file(original_model_filename) >>> model.model_performance() 
 - 
property model_key¶
- Key to the self-contained model archive already uploaded to H2O. - Type: - Union[None, str, H2OFrame].- Examples
 - >>> from h2o.estimators import H2OGenericEstimator, H2OXGBoostEstimator >>> import tempfile >>> airlines= h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/testng/airlines_train.csv") >>> y = "IsDepDelayed" >>> x = ["fYear","fMonth","Origin","Dest","Distance"] >>> xgb = H2OXGBoostEstimator(ntrees=1, nfolds=3) >>> xgb.train(x=x, y=y, training_frame=airlines) >>> original_model_filename = tempfile.mkdtemp() >>> original_model_filename = xgb.download_mojo(original_model_filename) >>> key = h2o.lazy_import(original_model_filename) >>> fr = h2o.get_frame(key[0]) >>> model = H2OGenericEstimator(model_key=fr) >>> model.train() >>> model.auc() 
 - 
property path¶
- Path to file with self-contained model archive. - Type: - str.- Examples
 - >>> from h2o.estimators import H2OIsolationForestEstimator, H2OGenericEstimator >>> import tempfile >>> airlines= h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/testng/airlines_train.csv") >>> ifr = H2OIsolationForestEstimator(ntrees=1) >>> ifr.train(x=["Origin","Dest"], y="Distance", training_frame=airlines) >>> generic_mojo_filename = tempfile.mkdtemp("zip","genericMojo") >>> generic_mojo_filename = ifr.download_mojo(path=generic_mojo_filename) >>> model = H2OGenericEstimator.from_file(generic_mojo_filename) >>> model.model_performance() 
 
H2OGeneralizedLowRankEstimator¶
- 
class h2o.estimators.glrm.H2OGeneralizedLowRankEstimator(model_id=None, training_frame=None, validation_frame=None, ignored_columns=None, ignore_const_cols=True, score_each_iteration=False, representation_name=None, loading_name=None, transform='none', k=1, loss='quadratic', loss_by_col=None, loss_by_col_idx=None, multi_loss='categorical', period=1, regularization_x='none', regularization_y='none', gamma_x=0.0, gamma_y=0.0, max_iterations=1000, max_updates=2000, init_step_size=1.0, min_step_size=0.0001, seed=-1, init='plus_plus', svd_method='randomized', user_y=None, user_x=None, expand_user_y=True, impute_original=False, recover_svd=False, max_runtime_secs=0.0, export_checkpoints_dir=None)[source]¶
- Bases: - h2o.estimators.estimator_base.H2OEstimator- Generalized Low Rank Modeling - Builds a generalized low rank model of a H2O dataset. - 
property expand_user_y¶
- Expand categorical columns in user-specified initial Y - Type: - bool, defaults to- True.- Examples
 - >>> iris = h2o.import_file("http://h2o-public-test-data.s3.amazonaws.com/smalldata/iris/iris_wheader.csv") >>> rank = 3 >>> gx = 0.5 >>> gy = 0.5 >>> trans = "standardize" >>> iris_glrm = H2OGeneralizedLowRankEstimator(k=rank, ... loss="Quadratic", ... gamma_x=gx, ... gamma_y=gy, ... transform=trans, ... expand_user_y=False) >>> iris_glrm.train(x=iris.names, training_frame=iris) >>> iris_glrm.show() 
 - 
property export_checkpoints_dir¶
- Automatically export generated models to this directory. - Type: - str.- Examples
 - >>> import tempfile >>> from os import listdir >>> iris = h2o.import_file("http://h2o-public-test-data.s3.amazonaws.com/smalldata/iris/iris_wheader.csv") >>> checkpoints_dir = tempfile.mkdtemp() >>> iris_glrm = H2OGeneralizedLowRankEstimator(k=3, ... export_checkpoints_dir=checkpoints_dir, ... seed=1234) >>> iris_glrm.train(x=iris.names, training_frame=iris) >>> len(listdir(checkpoints_dir)) 
 - 
property gamma_x¶
- Regularization weight on X matrix - Type: - float, defaults to- 0.0.- Examples
 - >>> iris = h2o.import_file("http://h2o-public-test-data.s3.amazonaws.com/smalldata/iris/iris_wheader.csv") >>> rank = 3 >>> gx = 0.5 >>> gy = 0.5 >>> trans = "standardize" >>> iris_glrm = H2OGeneralizedLowRankEstimator(k=rank, ... loss="Quadratic", ... gamma_x=gx, ... gamma_y=gy, ... transform=trans) >>> iris_glrm.train(x=iris.names, training_frame=iris) >>> iris_glrm.show() 
 - 
property gamma_y¶
- Regularization weight on Y matrix - Type: - float, defaults to- 0.0.- Examples
 - >>> iris = h2o.import_file("http://h2o-public-test-data.s3.amazonaws.com/smalldata/iris/iris_wheader.csv") >>> rank = 3 >>> gx = 0.5 >>> gy = 0.5 >>> trans = "standardize" >>> iris_glrm = H2OGeneralizedLowRankEstimator(k=rank, ... loss="Quadratic", ... gamma_x=gx, ... gamma_y=gy, ... transform=trans) >>> iris_glrm.train(x=iris.names, training_frame=iris) >>> iris_glrm.show() 
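To see where gamma_x and gamma_y enter, the sketch below evaluates a generalized low rank objective of the usual form: reconstruction loss plus weighted regularizers on X and Y. It assumes quadratic loss and quadratic regularization written as plain sums of squares; any scaling constants H2O applies internally are not shown, and `glrm_objective` is a hypothetical helper.

```python
def glrm_objective(A, X, Y, gamma_x=0.0, gamma_y=0.0):
    """Quadratic loss of A against X * Y plus quadratic regularization.

    A: n x m data matrix, X: n x k representation, Y: k x m archetypes
    (all given as lists of rows).
    """
    n, m, k = len(A), len(A[0]), len(Y)
    loss = 0.0
    for i in range(n):
        for j in range(m):
            approx = sum(X[i][l] * Y[l][j] for l in range(k))
            loss += (A[i][j] - approx) ** 2
    reg_x = sum(v * v for row in X for v in row)
    reg_y = sum(v * v for row in Y for v in row)
    return loss + gamma_x * reg_x + gamma_y * reg_y


# A rank-1 matrix reconstructed exactly: only the penalties remain.
A = [[1.0, 2.0], [2.0, 4.0]]
X = [[1.0], [2.0]]
Y = [[1.0, 2.0]]
print(glrm_objective(A, X, Y))                            # no regularization
print(glrm_objective(A, X, Y, gamma_x=0.5, gamma_y=0.5))  # penalties added
```

Larger gamma values trade reconstruction accuracy for smaller entries in the corresponding factor.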
 - 
property ignore_const_cols¶
- Ignore constant columns. - Type: - bool, defaults to- True.- Examples
 - >>> iris = h2o.import_file("http://h2o-public-test-data.s3.amazonaws.com/smalldata/iris/iris_wheader.csv") >>> iris_glrm = H2OGeneralizedLowRankEstimator(k=3, ... ignore_const_cols=False, ... seed=1234) >>> iris_glrm.train(x=iris.names, training_frame=iris) >>> iris_glrm.show() 
 - 
property ignored_columns¶
- Names of columns to ignore for training. - Type: - List[str].
 - 
property impute_original¶
- Reconstruct original training data by reversing transform - Type: - bool, defaults to- False.- Examples
 - >>> iris = h2o.import_file("http://h2o-public-test-data.s3.amazonaws.com/smalldata/iris/iris_wheader.csv") >>> rank = 3 >>> gx = 0.5 >>> gy = 0.5 >>> trans = "standardize" >>> iris_glrm = H2OGeneralizedLowRankEstimator(k=rank, ... loss="Quadratic", ... gamma_x=gx, ... gamma_y=gy, ... transform=trans, ... impute_original=True) >>> iris_glrm.train(x=iris.names, training_frame=iris) >>> iris_glrm.show() 
 - 
property init¶
- Initialization mode - Type: - Literal["random", "svd", "plus_plus", "user"], defaults to- "plus_plus".- Examples
 - >>> iris = h2o.import_file("http://h2o-public-test-data.s3.amazonaws.com/smalldata/iris/iris_wheader.csv") >>> iris_glrm = H2OGeneralizedLowRankEstimator(k=3, ... init="svd", ... seed=1234) >>> iris_glrm.train(x=iris.names, training_frame=iris) >>> iris_glrm.show() 
 - 
property init_step_size¶
- Initial step size - Type: - float, defaults to- 1.0.- Examples
 - >>> iris = h2o.import_file("http://h2o-public-test-data.s3.amazonaws.com/smalldata/iris/iris_wheader.csv") >>> iris_glrm = H2OGeneralizedLowRankEstimator(k=3, ... init_step_size=2.5, ... seed=1234) >>> iris_glrm.train(x=iris.names, training_frame=iris) >>> iris_glrm.show() 
 - 
property k¶
- Rank of matrix approximation - Type: - int, defaults to- 1.- Examples
 - >>> iris = h2o.import_file("http://h2o-public-test-data.s3.amazonaws.com/smalldata/iris/iris_wheader.csv") >>> iris_glrm = H2OGeneralizedLowRankEstimator(k=3) >>> iris_glrm.train(x=iris.names, training_frame=iris) >>> iris_glrm.show() 
 - 
property loading_name¶
- [Deprecated] Use representation_name instead. Frame key to save resulting X. - Type: - str.- Examples
 - >>> # loading_name will be deprecated. Use representation_name instead. >>> acs = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/bigdata/laptop/census/ACS_13_5YR_DP02_cleaned.zip") >>> acs_fill = acs.drop("ZCTA5") >>> acs_glrm = H2OGeneralizedLowRankEstimator(k=10, ... transform="standardize", ... loss="quadratic", ... regularization_x="quadratic", ... regularization_y="L1", ... gamma_x=0.25, ... gamma_y=0.5, ... max_iterations=1, ... loading_name="acs_full") >>> acs_glrm.train(x=acs_fill.names, training_frame=acs) >>> acs_glrm.loading_name >>> acs_glrm.show() 
 - 
property loss¶
- Numeric loss function - Type: - Literal["quadratic", "absolute", "huber", "poisson", "hinge", "logistic", "periodic"], defaults to- "quadratic".- Examples
 - >>> acs = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/bigdata/laptop/census/ACS_13_5YR_DP02_cleaned.zip") >>> acs_fill = acs.drop("ZCTA5") >>> acs_glrm = H2OGeneralizedLowRankEstimator(k=10, ... transform="standardize", ... loss="absolute", ... regularization_x="quadratic", ... regularization_y="L1", ... gamma_x=0.25, ... gamma_y=0.5, ... max_iterations=700) >>> acs_glrm.train(x=acs_fill.names, training_frame=acs) >>> acs_glrm.show() 
 - 
property loss_by_col¶
- Loss function by column (override) - Type: - List[Literal["quadratic", "absolute", "huber", "poisson", "hinge", "logistic", "periodic", "categorical", "ordinal"]].- Examples
 - >>> arrestsH2O = h2o.import_file("http://h2o-public-test-data.s3.amazonaws.com/smalldata/pca_test/USArrests.csv") >>> arrests_glrm = H2OGeneralizedLowRankEstimator(k=3, ... loss="quadratic", ... loss_by_col=["absolute","huber"], ... loss_by_col_idx=[0,3], ... regularization_x="quadratic", ... regularization_y="l1") >>> arrests_glrm.train(x=arrestsH2O.names, training_frame=arrestsH2O) >>> arrests_glrm.show() 
 - 
property loss_by_col_idx¶
- Loss function by column index (override) - Type: - List[int].- Examples
 - >>> arrestsH2O = h2o.import_file("http://h2o-public-test-data.s3.amazonaws.com/smalldata/pca_test/USArrests.csv") >>> arrests_glrm = H2OGeneralizedLowRankEstimator(k=3, ... loss="quadratic", ... loss_by_col=["absolute","huber"], ... loss_by_col_idx=[0,3], ... regularization_x="quadratic", ... regularization_y="l1") >>> arrests_glrm.train(x=arrestsH2O.names, training_frame=arrestsH2O) >>> arrests_glrm.show() 
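In the example above, loss_by_col and loss_by_col_idx are parallel lists: each index names a column whose loss overrides the default given by loss. The sketch below shows that pairing in plain Python. It assumes zero-based indices, as in the example, and is not H2O's internal resolution code; `resolve_losses` is a hypothetical helper.

```python
def resolve_losses(num_cols, default_loss, loss_by_col, loss_by_col_idx):
    """Return the loss used for each column after applying per-column overrides."""
    if len(loss_by_col) != len(loss_by_col_idx):
        raise ValueError("loss_by_col and loss_by_col_idx must have the same length")
    losses = [default_loss] * num_cols
    for idx, loss in zip(loss_by_col_idx, loss_by_col):
        if not 0 <= idx < num_cols:
            raise IndexError("column index out of range: %d" % idx)
        losses[idx] = loss
    return losses


# Four columns, with overrides on the first and last.
print(resolve_losses(4, "quadratic", ["absolute", "huber"], [0, 3]))
```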
 - 
property max_iterations¶
- Maximum number of iterations - Type: - int, defaults to- 1000.- Examples
 - >>> acs = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/bigdata/laptop/census/ACS_13_5YR_DP02_cleaned.zip") >>> acs_fill = acs.drop("ZCTA5") >>> acs_glrm = H2OGeneralizedLowRankEstimator(k=10, ... transform="standardize", ... loss="quadratic", ... regularization_x="quadratic", ... regularization_y="L1", ... gamma_x=0.25, ... gamma_y=0.5, ... max_iterations=700) >>> acs_glrm.train(x=acs_fill.names, training_frame=acs) >>> acs_glrm.show() 
 - 
property max_runtime_secs¶
- Maximum allowed runtime in seconds for model training. Use 0 to disable. - Type: - float, defaults to- 0.0.- Examples
 - >>> arrestsH2O = h2o.import_file("http://h2o-public-test-data.s3.amazonaws.com/smalldata/pca_test/USArrests.csv") >>> arrests_glrm = H2OGeneralizedLowRankEstimator(k=3, ... max_runtime_secs=15, ... max_iterations=500, ... max_updates=900, ... min_step_size=0.005) >>> arrests_glrm.train(x=arrestsH2O.names, training_frame=arrestsH2O) >>> arrests_glrm.show() 
 - 
property max_updates¶
- Maximum number of updates, defaults to 2*max_iterations - Type: - int, defaults to- 2000.- Examples
 - >>> arrestsH2O = h2o.import_file("http://h2o-public-test-data.s3.amazonaws.com/smalldata/pca_test/USArrests.csv") >>> arrests_glrm = H2OGeneralizedLowRankEstimator(k=3, ... max_runtime_secs=15, ... max_iterations=500, ... max_updates=900, ... min_step_size=0.005) >>> arrests_glrm.train(x=arrestsH2O.names, training_frame=arrestsH2O) >>> arrests_glrm.show() 
 - 
property min_step_size¶
- Minimum step size - Type: - float, defaults to- 0.0001.- Examples
 - >>> arrestsH2O = h2o.import_file("http://h2o-public-test-data.s3.amazonaws.com/smalldata/pca_test/USArrests.csv") >>> arrests_glrm = H2OGeneralizedLowRankEstimator(k=3, ... max_runtime_secs=15, ... max_iterations=500, ... max_updates=900, ... min_step_size=0.005) >>> arrests_glrm.train(x=arrestsH2O.names, training_frame=arrestsH2O) >>> arrests_glrm.show() 
 - 
property multi_loss¶
- Categorical loss function - Type: - Literal["categorical", "ordinal"], defaults to- "categorical".- Examples
 - >>> arrestsH2O = h2o.import_file("http://h2o-public-test-data.s3.amazonaws.com/smalldata/pca_test/USArrests.csv") >>> arrests_glrm = H2OGeneralizedLowRankEstimator(k=3, ... loss="quadratic", ... loss_by_col=["absolute","huber"], ... loss_by_col_idx=[0,3], ... regularization_x="quadratic", ... regularization_y="l1", ... multi_loss="ordinal") >>> arrests_glrm.train(x=arrestsH2O.names, training_frame=arrestsH2O) >>> arrests_glrm.show() 
 - 
property period¶
- Length of period (only used with periodic loss function) - Type: - int, defaults to- 1.- Examples
 - >>> arrestsH2O = h2o.import_file("http://h2o-public-test-data.s3.amazonaws.com/smalldata/pca_test/USArrests.csv") >>> arrests_glrm = H2OGeneralizedLowRankEstimator(k=3, ... max_runtime_secs=15, ... max_iterations=500, ... max_updates=900, ... min_step_size=0.005, ... period=5) >>> arrests_glrm.train(x=arrestsH2O.names, training_frame=arrestsH2O) >>> arrests_glrm.show() 
 - 
property recover_svd¶
- Recover singular values and eigenvectors of XY - Type: - bool, defaults to- False.- Examples
 - >>> prostate = h2o.import_file("http://s3.amazonaws.com/h2o-public-test-data/smalldata/prostate/prostate_cat.csv") >>> prostate[0] = prostate[0].asnumeric() >>> prostate[4] = prostate[4].asnumeric() >>> loss_all = ["Hinge", "Quadratic", "Categorical", "Categorical", ... "Hinge", "Quadratic", "Quadratic", "Quadratic"] >>> pros_glrm = H2OGeneralizedLowRankEstimator(k=5, ... loss_by_col=loss_all, ... recover_svd=True, ... transform="standardize", ... seed=12345) >>> pros_glrm.train(x=prostate.names, training_frame=prostate) >>> pros_glrm.show() 
 - 
property regularization_x¶
- Regularization function for X matrix - Type: - Literal["none", "quadratic", "l2", "l1", "non_negative", "one_sparse", "unit_one_sparse", "simplex"], defaults to- "none".- Examples
 - >>> arrestsH2O = h2o.import_file("http://h2o-public-test-data.s3.amazonaws.com/smalldata/pca_test/USArrests.csv") >>> arrests_glrm = H2OGeneralizedLowRankEstimator(k=3, ... loss="quadratic", ... loss_by_col=["absolute","huber"], ... loss_by_col_idx=[0,3], ... regularization_x="quadratic", ... regularization_y="l1") >>> arrests_glrm.train(x=arrestsH2O.names, training_frame=arrestsH2O) >>> arrests_glrm.show() 
 - 
property regularization_y¶
- Regularization function for Y matrix - Type: - Literal["none", "quadratic", "l2", "l1", "non_negative", "one_sparse", "unit_one_sparse", "simplex"], defaults to- "none".- Examples
 - >>> arrestsH2O = h2o.import_file("http://h2o-public-test-data.s3.amazonaws.com/smalldata/pca_test/USArrests.csv") >>> arrests_glrm = H2OGeneralizedLowRankEstimator(k=3, ... loss="quadratic", ... loss_by_col=["absolute","huber"], ... loss_by_col_idx=[0,3], ... regularization_x="quadratic", ... regularization_y="l1") >>> arrests_glrm.train(x=arrestsH2O.names, training_frame=arrestsH2O) >>> arrests_glrm.show() 
 - 
property representation_name¶
- Frame key to save resulting X - Type: - str.- Examples
 - >>> acs = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/bigdata/laptop/census/ACS_13_5YR_DP02_cleaned.zip") >>> acs_fill = acs.drop("ZCTA5") >>> acs_glrm = H2OGeneralizedLowRankEstimator(k=10, ... transform="standardize", ... loss="quadratic", ... regularization_x="quadratic", ... regularization_y="L1", ... gamma_x=0.25, ... gamma_y=0.5, ... max_iterations=1, ... representation_name="acs_full") >>> acs_glrm.train(x=acs_fill.names, training_frame=acs) >>> acs_glrm.representation_name >>> acs_glrm.show() 
 - 
property score_each_iteration¶
- Whether to score during each iteration of model training. - Type: - bool, defaults to- False.- Examples
 - >>> prostate = h2o.import_file("http://s3.amazonaws.com/h2o-public-test-data/smalldata/prostate/prostate_cat.csv") >>> prostate[0] = prostate[0].asnumeric() >>> prostate[4] = prostate[4].asnumeric() >>> loss_all = ["Hinge", "Quadratic", "Categorical", "Categorical", ... "Hinge", "Quadratic", "Quadratic", "Quadratic"] >>> pros_glrm = H2OGeneralizedLowRankEstimator(k=5, ... loss_by_col=loss_all, ... score_each_iteration=True, ... transform="standardize", ... seed=12345) >>> pros_glrm.train(x=prostate.names, training_frame=prostate) >>> pros_glrm.show() 
 - 
property seed¶
- RNG seed for initialization - Type: - int, defaults to- -1.- Examples
 - >>> prostate = h2o.import_file("http://s3.amazonaws.com/h2o-public-test-data/smalldata/prostate/prostate_cat.csv") >>> prostate[0] = prostate[0].asnumeric() >>> prostate[4] = prostate[4].asnumeric() >>> glrm_w_seed = H2OGeneralizedLowRankEstimator(k=5, seed=12345) >>> glrm_w_seed.train(x=prostate.names, training_frame=prostate) >>> glrm_wo_seed = H2OGeneralizedLowRankEstimator(k=5) >>> glrm_wo_seed.train(x=prostate.names, training_frame=prostate) >>> glrm_w_seed.show() >>> glrm_wo_seed.show() 
 - 
property svd_method¶
- Method for computing SVD during initialization (Caution: Randomized is currently experimental and unstable) - Type: - Literal["gram_s_v_d", "power", "randomized"], defaults to- "randomized".- Examples
 - >>> prostate = h2o.import_file("http://s3.amazonaws.com/h2o-public-test-data/smalldata/prostate/prostate_cat.csv") >>> prostate[0] = prostate[0].asnumeric() >>> prostate[4] = prostate[4].asnumeric() >>> pros_glrm = H2OGeneralizedLowRankEstimator(k=5, ... svd_method="power", ... seed=1234) >>> pros_glrm.train(x=prostate.names, training_frame=prostate) >>> pros_glrm.show() 
 - 
property training_frame¶
- Id of the training data frame. - Type: - Union[None, str, H2OFrame].- Examples
 - >>> prostate = h2o.import_file("http://s3.amazonaws.com/h2o-public-test-data/smalldata/prostate/prostate_cat.csv") >>> prostate[0] = prostate[0].asnumeric() >>> prostate[4] = prostate[4].asnumeric() >>> pros_glrm = H2OGeneralizedLowRankEstimator(k=5, ... seed=1234) >>> pros_glrm.train(x=prostate.names, training_frame=prostate) >>> pros_glrm.show() 
 - 
property transform¶
- Transformation of training data - Type: - Literal["none", "standardize", "normalize", "demean", "descale"], defaults to- "none".- Examples
 - >>> prostate = h2o.import_file("http://s3.amazonaws.com/h2o-public-test-data/smalldata/prostate/prostate_cat.csv") >>> prostate[0] = prostate[0].asnumeric() >>> prostate[4] = prostate[4].asnumeric() >>> pros_glrm = H2OGeneralizedLowRankEstimator(k=5, ... score_each_iteration=True, ... transform="standardize", ... seed=12345) >>> pros_glrm.train(x=prostate.names, training_frame=prostate) >>> pros_glrm.show() 
 - 
transform_frame(fr)[source]¶
- GLRM factorizes the training data as A=X*Y. When a new dataset is given, GLRM computes Anew = Xnew*Y. Calling predict returns Xnew*Y; calling transform_frame returns Xnew instead. - Returns: - an H2OFrame that contains Xnew. 
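The difference between predict and transform_frame can be illustrated without an H2O cluster. The following is a minimal pure-Python sketch, not H2O's implementation: it assumes quadratic loss, no regularization, and an archetype matrix Y with orthonormal rows, in which case the least-squares Xnew is Anew multiplied by the transpose of Y. The names matmul, transpose, Y, and A_new are illustrative only.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    """Transpose a matrix given as a list of rows."""
    return [list(row) for row in zip(*m)]


# Y: k x p archetype matrix learned during training (assumed orthonormal rows).
Y = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0]]

# A_new: new data with the same p columns as the training frame.
A_new = [[2.0, 3.0, 5.0],
         [4.0, 1.0, 7.0]]

# Analogue of transform_frame: the low-rank representation Xnew.
X_new = matmul(A_new, transpose(Y))

# Analogue of predict: the reconstruction Xnew*Y.
A_hat = matmul(X_new, Y)

print(X_new)
print(A_hat)
```

Here X_new has k columns (the reduced representation) while A_hat has the original p columns, which mirrors the distinction described above.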
 - 
property user_x¶
- User-specified initial X - Type: - Union[None, str, H2OFrame].- Examples
 - >>> arrestsH2O = h2o.import_file("http://s3.amazonaws.com/h2o-public-test-data/smalldata/pca_test/USArrests.csv") >>> initial_x = ([[5.412, 65.24, -7.54, -0.032, 2.212, 92.24, -17.54, 23.268, 0.312, ... 123.24, 14.46, 9.768, 1.012, 19.24, -15.54, -1.732, 5.412, 65.24, ... -7.54, -0.032, 2.212, 92.24, -17.54, 23.268, 0.312, 123.24, 14.46, ... 9.76, 1.012, 19.24, -15.54, -1.732, 5.412, 65.24, -7.54, -0.032, ... 2.212, 92.24, -17.54, 23.268, 0.312, 123.24, 14.46, 9.768, 1.012, ... 19.24, -15.54, -1.732, 5.412, 65.24]]*4) >>> initial_x_h2o = h2o.H2OFrame(list(zip(*initial_x))) >>> arrests_glrm = H2OGeneralizedLowRankEstimator(k=4, ... transform="demean", ... loss="quadratic", ... gamma_x=0.5, ... gamma_y=0.3, ... init="user", ... user_x=initial_x_h2o, ... recover_svd=True) >>> arrests_glrm.train(x=arrestsH2O.names, training_frame=arrestsH2O) >>> arrests_glrm.show() 
 - 
property user_y¶
- User-specified initial Y - Type: - Union[None, str, H2OFrame].- Examples
 - >>> arrestsH2O = h2o.import_file("http://s3.amazonaws.com/h2o-public-test-data/smalldata/pca_test/USArrests.csv") >>> initial_y = [[5.412, 65.24, -7.54, -0.032], ... [2.212, 92.24, -17.54, 23.268], ... [0.312, 123.24, 14.46, 9.768], ... [1.012, 19.24, -15.54, -1.732]] >>> initial_y_h2o = h2o.H2OFrame(list(zip(*initial_y))) >>> arrests_glrm = H2OGeneralizedLowRankEstimator(k=4, ... transform="demean", ... loss="quadratic", ... gamma_x=0.5, ... gamma_y=0.3, ... init="user", ... user_y=initial_y_h2o, ... recover_svd=True) >>> arrests_glrm.train(x=arrestsH2O.names, training_frame=arrestsH2O) >>> arrests_glrm.show() 
 - 
property validation_frame¶
- Id of the validation data frame. - Type: - Union[None, str, H2OFrame].- Examples
 - >>> iris = h2o.import_file("http://s3.amazonaws.com/h2o-public-test-data/smalldata/iris/iris_wheader.csv") >>> iris_glrm = H2OGeneralizedLowRankEstimator(k=3, ... loss="quadratic", ... gamma_x=0.5, ... gamma_y=0.5, ... transform="standardize") >>> iris_glrm.train(x=iris.names, ... training_frame=iris, ... validation_frame=iris) >>> iris_glrm.show() 
 
H2OIsolationForestEstimator¶
- 
class h2o.estimators.isolation_forest.H2OIsolationForestEstimator(model_id=None, training_frame=None, score_each_iteration=False, score_tree_interval=0, ignored_columns=None, ignore_const_cols=True, ntrees=50, max_depth=8, min_rows=1.0, max_runtime_secs=0.0, seed=-1, build_tree_one_node=False, mtries=-1, sample_size=256, sample_rate=-1.0, col_sample_rate_change_per_level=1.0, col_sample_rate_per_tree=1.0, categorical_encoding='auto', stopping_rounds=0, stopping_metric='auto', stopping_tolerance=0.01, export_checkpoints_dir=None, contamination=-1.0, validation_frame=None, validation_response_column=None)[source]¶
- Bases: - h2o.estimators.estimator_base.H2OEstimator- Isolation Forest - Builds an Isolation Forest model. Isolation Forest algorithm samples the training frame and in each iteration builds a tree that partitions the space of the sample observations until it isolates each observation. Length of the path from root to a leaf node of the resulting tree is used to calculate the anomaly score. Anomalies are easier to isolate and their average tree path is expected to be shorter than paths of regular observations. - 
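The link between path length and anomaly score can be sketched with the normalization from the original Isolation Forest paper (Liu, Ting and Zhou). This is an illustration of the idea only; the score H2O reports may be normalized differently, and the function names below are hypothetical.

```python
import math

EULER_GAMMA = 0.5772156649  # Euler-Mascheroni constant


def average_path_length(n):
    """Expected path length of an unsuccessful search in a binary tree of n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + EULER_GAMMA  # approximation of H(n - 1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n


def anomaly_score(mean_path_length, sample_size):
    """Score in (0, 1]; shorter average paths give scores closer to 1."""
    return 2.0 ** (-mean_path_length / average_path_length(sample_size))


# An observation isolated after 2 splits on average looks more anomalous
# than one that needs 10 splits (sample_size=256 is the documented default).
print(anomaly_score(2.0, 256))
print(anomaly_score(10.0, 256))
```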
property build_tree_one_node¶
- Run on one node only; no network overhead but fewer CPUs used. Suitable for small datasets. - Type: - bool, defaults to- False.- Examples
 - >>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv") >>> predictors = ["displacement","power","weight","acceleration","year"] >>> cars_if = H2OIsolationForestEstimator(build_tree_one_node=True, ... seed=1234) >>> cars_if.train(x=predictors, ... training_frame=cars) >>> cars_if.model_performance() 
 - 
property categorical_encoding¶
- Encoding scheme for categorical features - Type: - Literal["auto", "enum", "one_hot_internal", "one_hot_explicit", "binary", "eigen", "label_encoder", "sort_by_response", "enum_limited"], defaults to- "auto".- Examples
 - >>> airlines= h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/airlines/allyears2k_headers.zip") >>> predictors = ["Origin", "Dest", "Year", "UniqueCarrier", ... "DayOfWeek", "Month", "Distance", "FlightNum"] >>> encoding = "one_hot_explicit" >>> airlines_if = H2OIsolationForestEstimator(categorical_encoding=encoding, ... seed=1234) >>> airlines_if.train(x=predictors, ... training_frame=airlines) >>> airlines_if.model_performance() 
 - 
property col_sample_rate_change_per_level¶
- Relative change of the column sampling rate for every level (must be > 0.0 and <= 2.0) - Type: - float, defaults to- 1.0.- Examples
 - >>> airlines= h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/airlines/allyears2k_headers.zip") >>> predictors = ["Origin", "Dest", "Year", "UniqueCarrier", ... "DayOfWeek", "Month", "Distance", "FlightNum"] >>> airlines_if = H2OIsolationForestEstimator(col_sample_rate_change_per_level=.9, ... seed=1234) >>> airlines_if.train(x=predictors, ... training_frame=airlines) >>> airlines_if.model_performance() 
 - 
property col_sample_rate_per_tree¶
- Column sample rate per tree (from 0.0 to 1.0) - Type: - float, defaults to- 1.0.- Examples
 - >>> airlines= h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/airlines/allyears2k_headers.zip") >>> predictors = ["Origin", "Dest", "Year", "UniqueCarrier", ... "DayOfWeek", "Month", "Distance", "FlightNum"] >>> airlines_if = H2OIsolationForestEstimator(col_sample_rate_per_tree=.7, ... seed=1234) >>> airlines_if.train(x=predictors, ... training_frame=airlines) >>> airlines_if.model_performance() 
 - 
property contamination¶
- Contamination ratio - the proportion of anomalies in the input dataset. If undefined (-1) the predict function will not mark observations as anomalies and only anomaly score will be returned. Defaults to -1 (undefined). - Type: - float, defaults to- -1.0.
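How a contamination ratio could turn scores into anomaly flags is sketched below. This is an assumption-laden illustration, not H2O's code: the helper flag_anomalies and the rule of flagging the top fraction of scores are hypothetical, and ties at the threshold may flag slightly more rows than the ratio implies.

```python
def flag_anomalies(scores, contamination=-1.0):
    """Return one boolean per score, or None when contamination is undefined (-1)."""
    if contamination == -1.0:
        return None  # only anomaly scores are available, no labels
    n_flag = round(contamination * len(scores))
    if n_flag == 0:
        return [False] * len(scores)
    # Threshold is the n_flag-th largest score.
    threshold = sorted(scores, reverse=True)[n_flag - 1]
    return [score >= threshold for score in scores]


scores = [0.10, 0.90, 0.20, 0.80, 0.30, 0.15, 0.25, 0.35, 0.05, 0.12]
flags = flag_anomalies(scores, contamination=0.2)
print(flags)
print(flag_anomalies(scores))  # contamination left at -1
```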
 - 
property export_checkpoints_dir¶
- Automatically export generated models to this directory. - Type: - str.- Examples
 - >>> import tempfile >>> from os import listdir >>> airlines = h2o.import_file("http://s3.amazonaws.com/h2o-public-test-data/smalldata/airlines/allyears2k_headers.zip", destination_frame="air.hex") >>> predictors = ["DayofMonth", "DayOfWeek"] >>> checkpoints_dir = tempfile.mkdtemp() >>> air_if = H2OIsolationForestEstimator(max_depth=3, ... seed=1234, ... export_checkpoints_dir=checkpoints_dir) >>> air_if.train(x=predictors, ... training_frame=airlines) >>> len(listdir(checkpoints_dir)) 
 - 
property ignore_const_cols¶
- Ignore constant columns. - Type: - bool, defaults to- True.- Examples
 - >>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv") >>> predictors = ["displacement","power","weight","acceleration","year","const_1","const_2"] >>> cars["const_1"] = 6 >>> cars["const_2"] = 7 >>> train, valid = cars.split_frame(ratios=[.8], seed=1234) >>> cars_if = H2OIsolationForestEstimator(seed=1234, ... ignore_const_cols=True) >>> cars_if.train(x=predictors, ... training_frame=cars) >>> cars_if.model_performance() 
 - 
property ignored_columns¶
- Names of columns to ignore for training. - Type: - List[str].
 - 
property max_depth¶
- Maximum tree depth (0 for unlimited). - Type: - int, defaults to- 8.- Examples
 - >>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv") >>> predictors = ["displacement","power","weight","acceleration","year"] >>> cars_if = H2OIsolationForestEstimator(max_depth=2, ... seed=1234) >>> cars_if.train(x=predictors, ... training_frame=cars) >>> cars_if.model_performance() 
 - 
property max_runtime_secs¶
- Maximum allowed runtime in seconds for model training. Use 0 to disable. - Type: - float, defaults to- 0.0.- Examples
 - >>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv") >>> predictors = ["displacement","power","weight","acceleration","year"] >>> cars_if = H2OIsolationForestEstimator(max_runtime_secs=10, ... ntrees=10000, ... max_depth=10, ... seed=1234) >>> cars_if.train(x=predictors, ... training_frame=cars) >>> cars_if.model_performance() 
 - 
property min_rows¶
- Fewest allowed (weighted) observations in a leaf. - Type: - float, defaults to- 1.0.- Examples
 - >>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv") >>> predictors = ["displacement","power","weight","acceleration","year"] >>> cars_if = H2OIsolationForestEstimator(min_rows=16, ... seed=1234) >>> cars_if.train(x=predictors, ... training_frame=cars) >>> cars_if.model_performance() 
 - 
property mtries¶
- Number of variables randomly sampled as candidates at each split. If set to -1, defaults to (number of predictors)/3. - Type: - int, defaults to- -1.- Examples
 - >>> covtype = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/covtype/covtype.20k.data") >>> predictors = covtype.columns[0:54] >>> cov_if = H2OIsolationForestEstimator(mtries=30, seed=1234) >>> cov_if.train(x=predictors, ... training_frame=covtype) >>> cov_if.model_performance() 
 - 
property ntrees¶
- Number of trees. - Type: - int, defaults to- 50.- Examples
 - >>> titanic = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/gbm_test/titanic.csv") >>> predictors = titanic.columns >>> tree_num = [20, 50, 80, 110, 140, 170, 200] >>> label = ["20", "50", "80", "110", "140", "170", "200"] >>> for key, num in enumerate(tree_num): ... titanic_if = H2OIsolationForestEstimator(ntrees=num, ... seed=1234) ... titanic_if.train(x=predictors, ... training_frame=titanic) ... print(label[key], 'training score', titanic_if.mse(train=True)) 
 - 
property sample_rate¶
- Rate of randomly sampled observations used to train each Isolation Forest tree. Needs to be in range from 0.0 to 1.0. If set to -1, sample_rate is disabled and sample_size will be used instead. - Type: - float, defaults to- -1.0.- Examples
 - >>> airlines= h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/airlines/allyears2k_headers.zip") >>> predictors = ["Origin", "Dest", "Year", "UniqueCarrier", ... "DayOfWeek", "Month", "Distance", "FlightNum"] >>> airlines_if = H2OIsolationForestEstimator(sample_rate=.7, ... seed=1234) >>> airlines_if.train(x=predictors, ... training_frame=airlines) >>> airlines_if.model_performance() 
 - 
property sample_size¶
- Number of randomly sampled observations used to train each Isolation Forest tree. Only one of parameters sample_size and sample_rate should be defined. If sample_rate is defined, sample_size will be ignored. - Type: - int, defaults to- 256.- Examples
 - >>> train = h2o.import_file("http://s3.amazonaws.com/h2o-public-test-data/smalldata/anomaly/ecg_discord_train.csv") >>> isofor_model = H2OIsolationForestEstimator(sample_size=5, ... ntrees=7) >>> isofor_model.train(training_frame=train) >>> isofor_model.model_performance() 
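The interaction between sample_size and sample_rate described above can be summarized in a small helper. This is a sketch of the documented precedence only; the function rows_per_tree is hypothetical, and capping sample_size at the number of rows is an assumption.

```python
def rows_per_tree(n_rows, sample_size=256, sample_rate=-1.0):
    """Approximate number of rows sampled for each tree."""
    if sample_rate == -1.0:
        # sample_rate disabled: sample_size is used (capped at n_rows, an assumption).
        return min(sample_size, n_rows)
    if not 0.0 <= sample_rate <= 1.0:
        raise ValueError("sample_rate must be -1 or in the range 0.0 to 1.0")
    # sample_rate defined: sample_size is ignored.
    return round(sample_rate * n_rows)


print(rows_per_tree(1000))                                   # uses sample_size
print(rows_per_tree(1000, sample_size=5, sample_rate=0.7))   # sample_size ignored
```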
 - 
property score_each_iteration¶
- Whether to score during each iteration of model training. - Type: - bool, defaults to- False.- Examples
 - >>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv") >>> predictors = ["displacement","power","weight","acceleration","year"] >>> cars_if = H2OIsolationForestEstimator(score_each_iteration=True, ... ntrees=55, ... seed=1234) >>> cars_if.train(x=predictors, ... training_frame=cars) >>> cars_if.model_performance() 
 - 
property score_tree_interval¶
- Score the model after every so many trees. Disabled if set to 0. - Type: - int, defaults to- 0.- Examples
 - >>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv") >>> predictors = ["displacement","power","weight","acceleration","year"] >>> cars_if = H2OIsolationForestEstimator(score_tree_interval=5, ... seed=1234) >>> cars_if.train(x=predictors, ... training_frame=cars) >>> cars_if.model_performance() 
 - 
property seed¶
- Seed for pseudo random number generator (if applicable) - Type: - int, defaults to- -1.- Examples
 - >>> airlines= h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/airlines/allyears2k_headers.zip") >>> predictors = ["Origin", "Dest", "Year", "UniqueCarrier", ... "DayOfWeek", "Month", "Distance", "FlightNum"] >>> isofor_w_seed = H2OIsolationForestEstimator(seed=1234) >>> isofor_w_seed.train(x=predictors, ... training_frame=airlines) >>> isofor_wo_seed = H2OIsolationForestEstimator() >>> isofor_wo_seed.train(x=predictors, ... training_frame=airlines) >>> isofor_w_seed.model_performance() >>> isofor_wo_seed.model_performance() 
 - 
property stopping_metric¶
- Metric to use for early stopping (AUTO: logloss for classification, deviance for regression and anomaly_score for Isolation Forest). Note that custom and custom_increasing can only be used in GBM and DRF with the Python client. - Type: - Literal["auto", "anomaly_score", "deviance", "logloss", "mse", "rmse", "mae", "rmsle", "auc", "aucpr", "misclassification", "mean_per_class_error"], defaults to- "auto".- Examples
 - >>> airlines= h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/airlines/allyears2k_headers.zip") >>> predictors = ["Origin", "Dest", "Year", "UniqueCarrier", ... "DayOfWeek", "Month", "Distance", "FlightNum"] >>> airlines_if = H2OIsolationForestEstimator(stopping_metric="auto", ... stopping_rounds=3, ... stopping_tolerance=1e-2, ... seed=1234) >>> airlines_if.train(x=predictors, ... training_frame=airlines) >>> airlines_if.model_performance() 
 - 
property stopping_rounds¶
- Early stopping based on convergence of stopping_metric. Stop if simple moving average of length k of the stopping_metric does not improve for k:=stopping_rounds scoring events (0 to disable) - Type: - int, defaults to- 0.- Examples
 - >>> airlines= h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/airlines/allyears2k_headers.zip") >>> predictors = ["Origin", "Dest", "Year", "UniqueCarrier", ... "DayOfWeek", "Month", "Distance", "FlightNum"] >>> airlines_if = H2OIsolationForestEstimator(stopping_metric="auto", ... stopping_rounds=3, ... stopping_tolerance=1e-2, ... seed=1234) >>> airlines_if.train(x=predictors, ... training_frame=airlines) >>> airlines_if.model_performance() 
 - 
property stopping_tolerance¶
- Relative tolerance for metric-based stopping criterion (stop if relative improvement is not at least this much) - Type: - float, defaults to- 0.01.- Examples
 - >>> airlines= h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/airlines/allyears2k_headers.zip") >>> predictors = ["Origin", "Dest", "Year", "UniqueCarrier", ... "DayOfWeek", "Month", "Distance", "FlightNum"] >>> airlines_if = H2OIsolationForestEstimator(stopping_metric="auto", ... stopping_rounds=3, ... stopping_tolerance=1e-2, ... seed=1234) >>> airlines_if.train(x=predictors, ... training_frame=airlines) >>> airlines_if.model_performance() 
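The way stopping_rounds and stopping_tolerance work together can be sketched as a moving-average check over the scoring history. This is a simplified illustration under stated assumptions (a positive metric where lower is better, and a hypothetical helper should_stop); H2O's internal logic may differ in detail.

```python
def should_stop(history, stopping_rounds, stopping_tolerance):
    """Return True when the moving average of the metric has stopped improving."""
    k = stopping_rounds
    if k == 0 or len(history) < 2 * k:
        return False  # disabled, or not enough scoring events yet
    # Moving averages of length k ending at each of the last k + 1 scoring events.
    averages = [
        sum(history[end - k:end]) / k
        for end in range(len(history) - k, len(history) + 1)
    ]
    reference = averages[0]          # moving average before the last k events
    best_recent = min(averages[1:])  # best moving average among the last k events
    # Stop if the relative improvement is smaller than the tolerance.
    return best_recent > reference * (1.0 - stopping_tolerance)


improving = [1.0, 0.8, 0.6, 0.5, 0.4, 0.3]
flat = [1.0, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5]
print(should_stop(improving, stopping_rounds=3, stopping_tolerance=1e-2))
print(should_stop(flat, stopping_rounds=3, stopping_tolerance=1e-2))
```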
 - 
property training_frame¶
- Id of the training data frame. - Type: - Union[None, str, H2OFrame].- Examples
 - >>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv") >>> predictors = ["displacement","power","weight","acceleration","year"] >>> cars_if = H2OIsolationForestEstimator(seed=1234) >>> cars_if.train(x=predictors, ... training_frame=cars) >>> cars_if.model_performance() 
 - 
property validation_frame¶
- Id of the validation data frame. - Type: - Union[None, str, H2OFrame].
 - 
property validation_response_column¶
- (experimental) Name of the response column in the validation frame. Response column should be binary and indicate not anomaly/anomaly. - Type: - str.
 
H2OKMeansEstimator¶
- 
class h2o.estimators.kmeans.H2OKMeansEstimator(model_id=None, training_frame=None, validation_frame=None, nfolds=0, keep_cross_validation_models=True, keep_cross_validation_predictions=False, keep_cross_validation_fold_assignment=False, fold_assignment='auto', fold_column=None, ignored_columns=None, ignore_const_cols=True, score_each_iteration=False, k=1, estimate_k=False, user_points=None, max_iterations=10, standardize=True, seed=-1, init='furthest', max_runtime_secs=0.0, categorical_encoding='auto', export_checkpoints_dir=None, cluster_size_constraints=None)[source]¶
- Bases: - h2o.estimators.estimator_base.H2OEstimator- K-means - Performs k-means clustering on an H2O dataset. - 
property categorical_encoding¶
- Encoding scheme for categorical features - Type: - Literal["auto", "enum", "one_hot_internal", "one_hot_explicit", "binary", "eigen", "label_encoder", "sort_by_response", "enum_limited"], defaults to- "auto".- Examples
 - >>> prostate = h2o.import_file("http://s3.amazonaws.com/h2o-public-test-data/smalldata/prostate/prostate.csv") >>> predictors = ["AGE", "RACE", "DPROS", "DCAPS", "PSA", "VOL", "GLEASON"] >>> train, valid = prostate.split_frame(ratios=[.8], seed=1234) >>> encoding = "one_hot_explicit" >>> pros_km = H2OKMeansEstimator(categorical_encoding=encoding, ... seed=1234) >>> pros_km.train(x=predictors, ... training_frame=train, ... validation_frame=valid) >>> pros_km.scoring_history() 
 - 
property cluster_size_constraints¶
- An array specifying the minimum number of points that should be in each cluster. The length of the constraints array has to be the same as the number of clusters. - Type: - List[int].- Examples
 - >>> iris_h2o = h2o.import_file("http://h2o-public-test-data.s3.amazonaws.com/smalldata/iris/iris.csv") >>> k=3 >>> start_points = h2o.H2OFrame( ... [[4.9, 3.0, 1.4, 0.2], ... [5.6, 2.5, 3.9, 1.1], ... [6.5, 3.0, 5.2, 2.0]]) >>> kmm = H2OKMeansEstimator(k=k, ... user_points=start_points, ... standardize=True, ... cluster_size_constraints=[2, 5, 8], ... score_each_iteration=True) >>> kmm.train(x=list(range(4)), training_frame=iris_h2o) >>> kmm.scoring_history() 
 - 
property estimate_k¶
- Whether to estimate the number of clusters (<=k) iteratively and deterministically. - Type: - bool, defaults to- False.- Examples
 - >>> iris = h2o.import_file("http://h2o-public-test-data.s3.amazonaws.com/smalldata/iris/iris_wheader.csv") >>> iris['class'] = iris['class'].asfactor() >>> predictors = iris.columns[:-1] >>> train, valid = iris.split_frame(ratios=[.8], seed=1234) >>> iris_kmeans = H2OKMeansEstimator(k=10, ... estimate_k=True, ... standardize=False, ... seed=1234) >>> iris_kmeans.train(x=predictors, ... training_frame=train, ... validation_frame=valid) >>> iris_kmeans.scoring_history() 
 - 
property export_checkpoints_dir¶
- Automatically export generated models to this directory. - Type: - str.- Examples
 - >>> import tempfile >>> from os import listdir >>> airlines = h2o.import_file("http://s3.amazonaws.com/h2o-public-test-data/smalldata/airlines/allyears2k_headers.zip", destination_frame="air.hex") >>> predictors = ["DayofMonth", "DayOfWeek"] >>> checkpoints_dir = tempfile.mkdtemp() >>> air_km = H2OKMeansEstimator(export_checkpoints_dir=checkpoints_dir, ... seed=1234) >>> air_km.train(x=predictors, training_frame=airlines) >>> len(listdir(checkpoints_dir)) 
 - 
property fold_assignment¶
- Cross-validation fold assignment scheme, if fold_column is not specified. The ‘Stratified’ option will stratify the folds based on the response variable, for classification problems. - Type: - Literal["auto", "random", "modulo", "stratified"], defaults to- "auto".- Examples
 - >>> ozone = h2o.import_file("http://s3.amazonaws.com/h2o-public-test-data/smalldata/glm_test/ozone.csv") >>> predictors = ["radiation","temperature","wind"] >>> train, valid = ozone.split_frame(ratios=[.8], seed=1234) >>> ozone_km = H2OKMeansEstimator(fold_assignment="Random", ... nfolds=5, ... seed=1234) >>> ozone_km.train(x=predictors, ... training_frame=train, ... validation_frame=valid) >>> ozone_km.scoring_history() 
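The "modulo" and "random" schemes can be pictured with a short pure-Python sketch. The helper names are hypothetical and the random generator is Python's, not H2O's, so actual fold indices will differ; the sketch only shows the shape of each assignment.

```python
import random


def modulo_folds(n_rows, nfolds):
    """Assign row i to fold i % nfolds, giving evenly sized, deterministic folds."""
    return [row % nfolds for row in range(n_rows)]


def random_folds(n_rows, nfolds, seed=1234):
    """Assign each row to a uniformly random fold; reproducible for a fixed seed."""
    rng = random.Random(seed)
    return [rng.randrange(nfolds) for _ in range(n_rows)]


print(modulo_folds(7, 3))
print(random_folds(7, 3))
```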
 - 
property fold_column¶
- Column with cross-validation fold index assignment per observation. - Type: - str.- Examples
 - >>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv") >>> predictors = ["displacement","power","weight","acceleration","year"] >>> fold_numbers = cars.kfold_column(n_folds=5, seed=1234) >>> fold_numbers.set_names(["fold_numbers"]) >>> cars = cars.cbind(fold_numbers) >>> print(cars['fold_numbers']) >>> cars_km = H2OKMeansEstimator(seed=1234) >>> cars_km.train(x=predictors, ... training_frame=cars, ... fold_column="fold_numbers") >>> cars_km.scoring_history() 
 - 
property ignore_const_cols¶
- Ignore constant columns. - Type: - bool, defaults to- True.- Examples
 - >>> cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv") >>> predictors = ["displacement","power","weight","acceleration","year","const_1","const_2"] >>> cars["const_1"] = 6 >>> cars["const_2"] = 7 >>> train, valid = cars.split_frame(ratios=[.8], seed=1234) >>> cars_km = H2OKMeansEstimator(ignore_const_cols=True, ... seed=1234) >>> cars_km.train(x=predictors, ... training_frame=train, ... validation_frame=valid) >>> cars_km.scoring_history() 
 - 
property ignored_columns¶
- Names of columns to ignore for training. - Type: - List[str].
 - 
property init¶
- Initialization mode - Type: - Literal["random", "plus_plus", "furthest", "user"], defaults to- "furthest".- Examples
 - >>> seeds = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/flow_examples/seeds_dataset.txt") >>> predictors = seeds.columns[0:7] >>> train, valid = seeds.split_frame(ratios=[.8], seed=1234) >>> seeds_km = H2OKMeansEstimator(k=3, ... init='Furthest', ... seed=1234) >>> seeds_km.train(x=predictors, ... training_frame=train, ... validation_frame=valid) >>> seeds_km.scoring_history() 
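As a rough picture of what a "furthest" initialization does, the sketch below starts from one randomly chosen point and repeatedly adds the point furthest from the centers chosen so far. This is an assumption about the general technique, written with hypothetical helper names; it is not taken from H2O's source.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def furthest_init(points, k, seed=1234):
    """Pick k initial centers: one at random, then the furthest remaining point each time."""
    rng = random.Random(seed)
    centers = [points[rng.randrange(len(points))]]
    while len(centers) < k:
        # Distance of a point to its nearest already-chosen center.
        next_center = max(
            points,
            key=lambda p: min(squared_distance(p, c) for c in centers),
        )
        centers.append(next_center)
    return centers


points = [[0.0, 0.0], [1.0, 0.0], [10.0, 0.0], [10.0, 1.0]]
centers = furthest_init(points, k=2)
print(centers)
```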
 - 
property k¶
- The maximum number of clusters. If estimate_k is disabled, the model will find k centroids; otherwise it will find up to k centroids. - Type: - int, defaults to- 1.- Examples
 - >>> seeds = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/flow_examples/seeds_dataset.txt") >>> predictors = seeds.columns[0:7] >>> train, valid = seeds.split_frame(ratios=[.8], seed=1234) >>> seeds_km = H2OKMeansEstimator(k=3, seed=1234) >>> seeds_km.train(x=predictors, ... training_frame=train, ... validation_frame=valid) >>> seeds_km.scoring_history() 
 - 
property keep_cross_validation_fold_assignment¶
- Whether to keep the cross-validation fold assignment. - Type: - bool, defaults to- False.- Examples
 - >>> ozone = h2o.import_file("http://s3.amazonaws.com/h2o-public-test-data/smalldata/glm_test/ozone.csv") >>> predictors = ["radiation","temperature","wind"] >>> train, valid = ozone.split_frame(ratios=[.8], seed=1234) >>> ozone_km = H2OKMeansEstimator(keep_cross_validation_fold_assignment=True, ... nfolds=5, ... seed=1234) >>> ozone_km.train(x=predictors, ... training_frame=train) >>> ozone_km.scoring_history() 
 - 
property keep_cross_validation_models¶
- Whether to keep the cross-validation models. - Type: - bool, defaults to- True.- Examples
 - >>> ozone = h2o.import_file("http://s3.amazonaws.com/h2o-public-test-data/smalldata/glm_test/ozone.csv") >>> predictors = ["radiation","temperature","wind"] >>> train, valid = ozone.split_frame(ratios=[.8], seed=1234) >>> ozone_km = H2OKMeansEstimator(keep_cross_validation_models=True, ... nfolds=5, ... seed=1234) >>> ozone_km.train(x=predictors, ... training_frame=train, ... validation_frame=valid) >>> ozone_km.scoring_history() 
 - 
property keep_cross_validation_predictions¶
- Whether to keep the predictions of the cross-validation models. - Type: - bool, defaults to- False.- Examples
 - >>> prostate = h2o.import_file("http://s3.amazonaws.com/h2o-public-test-data/smalldata/prostate/prostate.csv") >>> predictors = ["AGE", "RACE", "DPROS", "DCAPS", ... "PSA", "VOL", "GLEASON"] >>> train, valid = prostate.split_frame(ratios=[.8], seed=1234) >>> pros_km = H2OKMeansEstimator(keep_cross_validation_predictions=True, ... nfolds=5, ... seed=1234) >>> pros_km.train(x=predictors, ... training_frame=train, ... validation_frame=valid) >>> pros_km.scoring_history() 
 - 
property max_iterations¶
- Maximum training iterations (if estimate_k is enabled, this applies to each inner Lloyd's iteration) - Type: - int, defaults to- 10.- Examples
 - >>> benign = h2o.import_file("http://s3.amazonaws.com/h2o-public-test-data/smalldata/logreg/benign.csv") >>> predictors = ["AGMT","FNDX","HIGD","DEG","CHK", ... "AGP1","AGMN","LIV","AGLP"] >>> train, valid = benign.split_frame(ratios=[.8], seed=1234) >>> benign_km = H2OKMeansEstimator(max_iterations=50) >>> benign_km.train(x=predictors, ... training_frame=train, ... validation_frame=valid) >>> benign_km.scoring_history() 
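To show what max_iterations bounds, here is a minimal pure-Python sketch of Lloyd's iteration (assign each point to its nearest center, then recompute centers as cluster means). It is a textbook version with a hypothetical function name, not H2O's distributed implementation.

```python
def lloyd(points, centers, max_iterations=10):
    """Run at most max_iterations rounds of assign-then-update; stop early on convergence."""
    for _ in range(max_iterations):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(
                range(len(centers)),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centers[i])),
            )
            clusters[nearest].append(p)
        # New center is the column-wise mean; an empty cluster keeps its old center.
        new_centers = [
            [sum(col) / len(members) for col in zip(*members)] if members else list(center)
            for members, center in zip(clusters, centers)
        ]
        if new_centers == centers:
            break  # converged before reaching max_iterations
        centers = new_centers
    return centers


points = [[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]]
final_centers = lloyd(points, centers=[[1.0, 1.0], [9.0, 1.0]], max_iterations=10)
print(final_centers)
```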
 - 
property max_runtime_secs¶
- Maximum allowed runtime in seconds for model training. Use 0 to disable. - Type: - float, defaults to- 0.0.- Examples
 - >>> benign = h2o.import_file("http://s3.amazonaws.com/h2o-public-test-data/smalldata/logreg/benign.csv") >>> predictors = ["AGMT","FNDX","HIGD","DEG","CHK", ... "AGP1","AGMN","LIV","AGLP"] >>> train, valid = benign.split_frame(ratios=[.8], seed=1234) >>> benign_km = H2OKMeansEstimator(max_runtime_secs=10, ... seed=1234) >>> benign_km.train(x=predictors, ... training_frame=train, ... validation_frame=valid) >>> benign_km.scoring_history() 
 - 
property nfolds¶
- Number of folds for K-fold cross-validation (0 to disable or >= 2). - Type: - int, defaults to- 0.- Examples
 - >>> benign = h2o.import_file("http://s3.amazonaws.com/h2o-public-test-data/smalldata/logreg/benign.csv") >>> predictors = ["AGMT","FNDX","HIGD","DEG","CHK", ... "AGP1","AGMN","LIV","AGLP"] >>> train, valid = benign.split_frame(ratios=[.8], seed=1234) >>> benign_km = H2OKMeansEstimator(nfolds=5, seed=1234) >>> benign_km.train(x=predictors, ... training_frame=train, ... validation_frame=valid) >>> benign_km.scoring_history() 
 - 
property score_each_iteration¶
- Whether to score during each iteration of model training. - Type: - bool, defaults to- False.- Examples
 - >>> benign = h2o.import_file("http://s3.amazonaws.com/h2o-public-test-data/smalldata/logreg/benign.csv") >>> predictors = ["AGMT","FNDX","HIGD","DEG","CHK", ... "AGP1","AGMN","LIV","AGLP"] >>> train, valid = benign.split_frame(ratios=[.8], seed=1234) >>> benign_km = H2OKMeansEstimator(score_each_iteration=True, ... seed=1234) >>> benign_km.train(x=predictors, ... training_frame=train, ... validation_frame=valid) >>> benign_km.scoring_history() 
 - 
property seed¶
- RNG Seed - Type: - int, defaults to- -1.- Examples
 - >>> prostate = h2o.import_file("http://s3.amazonaws.com/h2o-public-test-data/smalldata/prostate/prostate.csv") >>> predictors = ["AGE", "RACE", "DPROS", "DCAPS", "PSA", "VOL", "GLEASON"] >>> train, valid = prostate.split_frame(ratios=[.8], seed=1234) >>> pros_w_seed = H2OKMeansEstimator(seed=1234) >>> pros_w_seed.train(x=predictors, ... training_frame=train, ... validation_frame=valid) >>> pros_wo_seed = H2OKMeansEstimator() >>> pros_wo_seed.train(x=predictors, ... training_frame=train, ... validation_frame=valid) >>> pros_w_seed.scoring_history() >>> pros_wo_seed.scoring_history() 
 - 
property standardize¶
- Standardize columns before computing distances - Type: - bool, defaults to- True.- Examples
 - >>> boston = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/gbm_test/BostonHousing.csv") >>> predictors = boston.columns[:-1] >>> boston['chas'] = boston['chas'].asfactor() >>> train, valid = boston.split_frame(ratios=[.8]) >>> boston_km = H2OKMeansEstimator(standardize=True) >>> boston_km.train(x=predictors, ... training_frame=train, ... validation_frame=valid) >>> boston_km.scoring_history() 
 - 
property training_frame¶
- Id of the training data frame. - Type: - Union[None, str, H2OFrame].- Examples
 - >>> prostate = h2o.import_file("http://s3.amazonaws.com/h2o-public-test-data/smalldata/prostate/prostate.csv") >>> predictors = ["AGE", "RACE", "DPROS", "DCAPS", ... "PSA", "VOL", "GLEASON"] >>> train, valid = prostate.split_frame(ratios=[.8], seed=1234) >>> pros_km = H2OKMeansEstimator(seed=1234) >>> pros_km.train(x=predictors, ... training_frame=train, ... validation_frame=valid) >>> pros_km.scoring_history() 
 - 
property user_points¶
- This option allows you to specify a dataframe, where each row represents an initial cluster center. The user-specified points must have the same number of columns as the training observations. The number of rows must equal the number of clusters. - Type: - Union[None, str, H2OFrame].- Examples
 - >>> iris = h2o.import_file("http://h2o-public-test-data.s3.amazonaws.com/smalldata/iris/iris_wheader.csv") >>> iris['class'] = iris['class'].asfactor() >>> predictors = iris.columns[:-1] >>> train, valid = iris.split_frame(ratios=[.8], seed=1234) >>> point1 = [4.9,3.0,1.4,0.2] >>> point2 = [5.6,2.5,3.9,1.1] >>> point3 = [6.5,3.0,5.2,2.0] >>> points = h2o.H2OFrame([point1, point2, point3]) >>> iris_km = H2OKMeansEstimator(k=3, ... user_points=points, ... seed=1234) >>> iris_km.train(x=predictors, ... training_frame=train, ... validation_frame=valid) >>> iris_km.tot_withinss(valid=True) 
 - 
property validation_frame¶
- Id of the validation data frame. - Type: - Union[None, str, H2OFrame].- Examples
 - >>> prostate = h2o.import_file("http://s3.amazonaws.com/h2o-public-test-data/smalldata/prostate/prostate.csv") >>> predictors = ["AGE", "RACE", "DPROS", "DCAPS", ... "PSA", "VOL", "GLEASON"] >>> train, valid = prostate.split_frame(ratios=[.8], seed=1234) >>> pros_km = H2OKMeansEstimator(seed=1234) >>> pros_km.train(x=predictors, ... training_frame=train, ... validation_frame=valid) >>> pros_km.scoring_history() 
 
H2OPrincipalComponentAnalysisEstimator¶
- 
class h2o.estimators.pca.H2OPrincipalComponentAnalysisEstimator(model_id=None, training_frame=None, validation_frame=None, ignored_columns=None, ignore_const_cols=True, score_each_iteration=False, transform='none', pca_method='gram_s_v_d', pca_impl=None, k=1, max_iterations=1000, use_all_factor_levels=False, compute_metrics=True, impute_missing=False, seed=-1, max_runtime_secs=0.0, export_checkpoints_dir=None)[source]¶
- Bases: - h2o.estimators.estimator_base.H2OEstimator- Principal Components Analysis - 
property compute_metrics¶
- Whether to compute metrics on the training data - Type: - bool, defaults to- True.- Examples
 - >>> prostate = h2o.import_file("http://s3.amazonaws.com/h2o-public-test-data/smalldata/prostate/prostate.csv.zip") >>> prostate['CAPSULE'] = prostate['CAPSULE'].asfactor() >>> prostate['RACE'] = prostate['RACE'].asfactor() >>> prostate['DCAPS'] = prostate['DCAPS'].asfactor() >>> prostate['DPROS'] = prostate['DPROS'].asfactor() >>> pros_pca = H2OPrincipalComponentAnalysisEstimator(compute_metrics=False) >>> pros_pca.train(x=prostate.names, training_frame=prostate) >>> pros_pca.show() 
 - 
property export_checkpoints_dir¶
- Automatically export generated models to this directory. - Type: - str.- Examples
 - >>> import tempfile >>> from os import listdir >>> prostate = h2o.import_file("http://s3.amazonaws.com/h2o-public-test-data/smalldata/prostate/prostate.csv.zip") >>> prostate['CAPSULE'] = prostate['CAPSULE'].asfactor() >>> prostate['RACE'] = prostate['RACE'].asfactor() >>> prostate['DCAPS'] = prostate['DCAPS'].asfactor() >>> prostate['DPROS'] = prostate['DPROS'].asfactor() >>> checkpoints_dir = tempfile.mkdtemp() >>> pros_pca = H2OPrincipalComponentAnalysisEstimator(impute_missing=True, ... export_checkpoints_dir=checkpoints_dir) >>> pros_pca.train(x=prostate.names, training_frame=prostate) >>> len(listdir(checkpoints_dir)) 
 - 
property ignore_const_cols¶
- Ignore constant columns. - Type: - bool, defaults to- True.- Examples
 - >>> prostate = h2o.import_file("http://s3.amazonaws.com/h2o-public-test-data/smalldata/prostate/prostate.csv.zip") >>> prostate['CAPSULE'] = prostate['CAPSULE'].asfactor() >>> prostate['RACE'] = prostate['RACE'].asfactor() >>> prostate['DCAPS'] = prostate['DCAPS'].asfactor() >>> prostate['DPROS'] = prostate['DPROS'].asfactor() >>> pros_pca = H2OPrincipalComponentAnalysisEstimator(ignore_const_cols=False) >>> pros_pca.train(x=prostate.names, training_frame=prostate) >>> pros_pca.show() 
 - 
property ignored_columns¶
- Names of columns to ignore for training. - Type: - List[str].
 - 
property impute_missing¶
- Whether to impute missing entries with the column mean - Type: - bool, defaults to- False.- Examples
 - >>> prostate = h2o.import_file("http://s3.amazonaws.com/h2o-public-test-data/smalldata/prostate/prostate.csv.zip") >>> prostate['CAPSULE'] = prostate['CAPSULE'].asfactor() >>> prostate['RACE'] = prostate['RACE'].asfactor() >>> prostate['DCAPS'] = prostate['DCAPS'].asfactor() >>> prostate['DPROS'] = prostate['DPROS'].asfactor() >>> pros_pca = H2OPrincipalComponentAnalysisEstimator(impute_missing=True) >>> pros_pca.train(x=prostate.names, training_frame=prostate) >>> pros_pca.show() 
 - 
init_for_pipeline()[source]¶
- Returns an H2OPCA object that implements the fit and transform methods, so that it can be used in an sklearn.Pipeline. All parameters defined in self.__params should be input parameters of the H2OPCA.__init__ method. - Returns
- H2OPCA object 
- Examples
 - >>> from sklearn.pipeline import Pipeline >>> from h2o.transforms.preprocessing import H2OScaler >>> from h2o.estimators import H2ORandomForestEstimator >>> iris = h2o.import_file("http://h2o-public-test-data.s3.amazonaws.com/smalldata/iris/iris_wheader.csv") >>> pipe = Pipeline([("standardize", H2OScaler()), ... ("pca", H2OPrincipalComponentAnalysisEstimator(k=2).init_for_pipeline()), ... ("rf", H2ORandomForestEstimator(seed=42,ntrees=5))]) >>> pipe.fit(iris[:4], iris[4]) 
 - 
property k¶
- Rank of matrix approximation - Type: - int, defaults to- 1.- Examples
 - >>> data = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/pca_test/SDSS_quasar.txt.zip") >>> data_pca = H2OPrincipalComponentAnalysisEstimator(k=-1, ... transform="standardize", ... pca_method="power", ... impute_missing=True, ... max_iterations=800) >>> data_pca.train(x=data.names, training_frame=data) >>> data_pca.show() 
 - 
property max_iterations¶
- Maximum training iterations - Type: - int, defaults to- 1000.- Examples
 - >>> data = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/pca_test/SDSS_quasar.txt.zip") >>> data_pca = H2OPrincipalComponentAnalysisEstimator(k=-1, ... transform="standardize", ... pca_method="power", ... impute_missing=True, ... max_iterations=800) >>> data_pca.train(x=data.names, training_frame=data) >>> data_pca.show() 
 - 
property max_runtime_secs¶
- Maximum allowed runtime in seconds for model training. Use 0 to disable. - Type: - float, defaults to- 0.0.- Examples
 - >>> data = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/pca_test/SDSS_quasar.txt.zip") >>> data_pca = H2OPrincipalComponentAnalysisEstimator(k=-1, ... transform="standardize", ... pca_method="power", ... impute_missing=True, ... max_iterations=800, ... max_runtime_secs=15) >>> data_pca.train(x=data.names, training_frame=data) >>> data_pca.show() 
 - 
property pca_impl¶
- Specify the implementation to use for computing PCA (via SVD or EVD): MTJ_EVD_DENSEMATRIX - eigenvalue decompositions for dense matrix using MTJ; MTJ_EVD_SYMMMATRIX - eigenvalue decompositions for symmetric matrix using MTJ; MTJ_SVD_DENSEMATRIX - singular-value decompositions for dense matrix using MTJ; JAMA - eigenvalue decompositions for dense matrix using JAMA. References: JAMA - http://math.nist.gov/javanumerics/jama/; MTJ - https://github.com/fommil/matrix-toolkits-java/ - Type: - Literal["mtj_evd_densematrix", "mtj_evd_symmmatrix", "mtj_svd_densematrix", "jama"].- Examples
 - >>> data = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/pca_test/SDSS_quasar.txt.zip") >>> data_pca = H2OPrincipalComponentAnalysisEstimator(k=3, ... pca_impl="jama", ... impute_missing=True, ... max_iterations=1200) >>> data_pca.train(x=data.names, training_frame=data) >>> data_pca.show() 
 - 
property pca_method¶
- Specify the algorithm to use for computing the principal components: GramSVD - uses a distributed computation of the Gram matrix, followed by a local SVD; Power - computes the SVD using the power iteration method (experimental); Randomized - uses randomized subspace iteration method; GLRM - fits a generalized low-rank model with L2 loss function and no regularization and solves for the SVD using local matrix algebra (experimental) - Type: - Literal["gram_s_v_d", "power", "randomized", "glrm"], defaults to- "gram_s_v_d".- Examples
 - >>> data = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/pca_test/SDSS_quasar.txt.zip") >>> data_pca = H2OPrincipalComponentAnalysisEstimator(k=-1, ... transform="standardize", ... pca_method="power", ... impute_missing=True, ... max_iterations=800) >>> data_pca.train(x=data.names, training_frame=data) >>> data_pca.show() 
 - 
property score_each_iteration¶
- Whether to score during each iteration of model training. - Type: - bool, defaults to- False.- Examples
 - >>> data = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/pca_test/SDSS_quasar.txt.zip") >>> data_pca = H2OPrincipalComponentAnalysisEstimator(k=3, ... score_each_iteration=True, ... seed=1234, ... impute_missing=True) >>> data_pca.train(x=data.names, training_frame=data) >>> data_pca.show() 
 - 
property seed¶
- RNG seed for initialization - Type: - int, defaults to- -1.- Examples
 - >>> data = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/pca_test/SDSS_quasar.txt.zip") >>> data_pca = H2OPrincipalComponentAnalysisEstimator(k=3, ... seed=1234, ... impute_missing=True) >>> data_pca.train(x=data.names, training_frame=data) >>> data_pca.show() 
 - 
property training_frame¶
- Id of the training data frame. - Type: - Union[None, str, H2OFrame].- Examples
 - >>> data = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/pca_test/SDSS_quasar.txt.zip") >>> data_pca = H2OPrincipalComponentAnalysisEstimator() >>> data_pca.train(x=data.names, training_frame=data) >>> data_pca.show() 
 - 
property transform¶
- Transformation of training data - Type: - Literal["none", "standardize", "normalize", "demean", "descale"], defaults to- "none".- Examples
 - >>> data = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/pca_test/SDSS_quasar.txt.zip") >>> data_pca = H2OPrincipalComponentAnalysisEstimator(k=-1, ... transform="standardize", ... pca_method="power", ... impute_missing=True, ... max_iterations=800) >>> data_pca.train(x=data.names, training_frame=data) >>> data_pca.show() 
 - 
property use_all_factor_levels¶
- Whether first factor level is included in each categorical expansion - Type: - bool, defaults to- False.- Examples
 - >>> data = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/pca_test/SDSS_quasar.txt.zip") >>> data_pca = H2OPrincipalComponentAnalysisEstimator(k=3, ... use_all_factor_levels=True, ... seed=1234) >>> data_pca.train(x=data.names, training_frame=data) >>> data_pca.show() 
 - 
property validation_frame¶
- Id of the validation data frame. - Type: - Union[None, str, H2OFrame].- Examples
 - >>> data = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/pca_test/SDSS_quasar.txt.zip") >>> train, valid = data.split_frame(ratios=[.8], seed=1234) >>> model_pca = H2OPrincipalComponentAnalysisEstimator(impute_missing=True) >>> model_pca.train(x=data.names, ... training_frame=train, ... validation_frame=valid) >>> model_pca.show() 
 
Miscellaneous¶
automl¶
H2OAutoML¶
- 
class h2o.automl.H2OAutoML(nfolds=-1, balance_classes=False, class_sampling_factors=None, max_after_balance_size=5.0, max_runtime_secs=None, max_runtime_secs_per_model=None, max_models=None, distribution='AUTO', stopping_metric='AUTO', stopping_tolerance=None, stopping_rounds=3, seed=None, project_name=None, exclude_algos=None, include_algos=None, exploitation_ratio=-1, modeling_plan=None, preprocessing=None, monotone_constraints=None, keep_cross_validation_predictions=False, keep_cross_validation_models=False, keep_cross_validation_fold_assignment=False, sort_metric='AUTO', custom_metric_func=None, export_checkpoints_dir=None, verbosity='warn', **kwargs)[source]¶
- Bases: - h2o.automl._base.H2OAutoMLBaseMixin,- h2o.base.Keyed- Automatic Machine Learning - The Automatic Machine Learning (AutoML) function automates the supervised machine learning model training process. It trains several models, cross-validated by default, by using the following available algorithms: - XGBoost 
- GBM (Gradient Boosting Machine) 
- GLM (Generalized Linear Model) 
- DRF (Distributed Random Forest) 
- XRT (eXtremely Randomized Trees) 
- DeepLearning (Fully Connected Deep Neural Network) 
 - It also applies hyperparameter optimization (HPO) to the following algorithms: - XGBoost 
- GBM 
- DeepLearning 
 - In some cases, there will not be enough time to complete all the algorithms, so some may be missing from the leaderboard. Finally, AutoML also trains several Stacked Ensemble models at various stages during the run. Mainly two kinds of Stacked Ensemble models are trained: - one of all available models at time t 
- one of only the best models of each kind at time t. 
 - Note that Stacked Ensemble models are trained only if there isn’t another stacked ensemble with the same base models. - Examples
 - >>> import h2o >>> from h2o.automl import H2OAutoML >>> h2o.init() >>> # Import a sample binary outcome train/test set into H2O >>> train = h2o.import_file("https://s3.amazonaws.com/erin-data/higgs/higgs_train_10k.csv") >>> test = h2o.import_file("https://s3.amazonaws.com/erin-data/higgs/higgs_test_5k.csv") >>> # Identify the response and set of predictors >>> y = "response" >>> x = list(train.columns) #if x is defined as all columns except the response, then x is not required >>> x.remove(y) >>> # For binary classification, response should be a factor >>> train[y] = train[y].asfactor() >>> test[y] = test[y].asfactor() >>> # Run AutoML for 30 seconds >>> aml = H2OAutoML(max_runtime_secs = 30) >>> aml.train(x = x, y = y, training_frame = train) >>> # Print Leaderboard (ranked by xval metrics) >>> aml.leaderboard >>> # (Optional) Evaluate performance on a test set >>> perf = aml.leader.model_performance(test) >>> perf.auc() - 
property balance_classes¶
- Specify whether to oversample the minority classes to balance the class distribution. This option can increase the data frame size. This option is only applicable for classification. If the oversampled size of the dataset exceeds the maximum size calculated using the max_after_balance_size parameter, then the majority classes will be undersampled to satisfy the size limit. Defaults to False.
 - Type: bool 
 - 
property class_sampling_factors¶
- Desired over/under-sampling ratios per class (in lexicographic order). If not specified, sampling factors will be automatically computed to obtain class balance during training. Requires balance_classes set to True.
 - 
property distribution¶
- Distribution function used by algorithms that support it; other algorithms use their defaults. Possible values: "AUTO", "bernoulli", "multinomial", "gaussian", "poisson", "gamma", "tweedie", "laplace", "quantile", "huber", "custom". For parameterized distributions, use the dictionary form to specify the parameter, e.g., dict(type="tweedie", tweedie_power=1.5). Defaults to "AUTO".
 - Type: Union[str, dict] 
 - 
property event_log¶
- Retrieve the backend event log from an H2OAutoML object - Returns
- an H2OFrame detailing the events that occurred during AutoML training. 
 
 - 
property exclude_algos¶
- List the algorithms to skip during the model-building phase. The full list of options is: - "DRF" (Random Forest and Extremely-Randomized Trees)
- "GLM"
- "XGBoost"
- "GBM"
- "DeepLearning"
- "StackedEnsemble"
 - Defaults to None, which means that all appropriate H2O algorithms will be used, if the search stopping criteria allow. Optional. Usage example: exclude_algos = ["GLM", "DeepLearning", "DRF"] 
 - 
property exploitation_ratio¶
- The budget ratio (between 0 and 1) dedicated to the exploitation (vs exploration) phase. By default, the exploitation phase is 0 (disabled) as this is still experimental; to activate it, it is recommended to try a ratio around 0.1. Note that the current exploitation phase only tries to fine-tune the best XGBoost and the best GBM found during exploration.
 - 
property export_checkpoints_dir¶
- Path to a directory where every model will be stored in binary form. 
 - 
property include_algos¶
- List the algorithms to restrict to during the model-building phase. This can't be used in combination with the exclude_algos param. Defaults to None, which means that all appropriate H2O algorithms will be used, if the search stopping criteria allow. Optional. Usage example: include_algos = ["GLM", "DeepLearning", "DRF"] 
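The rule that include_algos and exclude_algos cannot be combined can be checked before an AutoML object is built. The following is a minimal sketch in plain Python; check_algo_filters is a hypothetical helper introduced for illustration and is not part of the H2O API.

```python
def check_algo_filters(include_algos=None, exclude_algos=None):
    """Hypothetical helper (not an H2O function): reject the combination
    of include_algos and exclude_algos, which the documentation rules out."""
    if include_algos is not None and exclude_algos is not None:
        raise ValueError("Use either include_algos or exclude_algos, not both.")
    return {"include_algos": include_algos, "exclude_algos": exclude_algos}


# Restrict the run to three algorithm families.
filters = check_algo_filters(include_algos=["GLM", "DeepLearning", "DRF"])
```

The returned dictionary uses the same keyword names as the H2OAutoML constructor, so it could be unpacked into it as keyword arguments.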
 - 
property keep_cross_validation_fold_assignment¶
- Whether to keep fold assignments in the models. Deleting them will save memory in the H2O cluster. Defaults to False.
 - 
property keep_cross_validation_models¶
- Whether to keep the cross-validated models. Keeping cross-validation models may consume significantly more memory in the H2O cluster. Defaults to False.
 - 
property keep_cross_validation_predictions¶
- Whether to keep the cross-validation predictions. This needs to be set to True if running the same AutoML object for repeated runs, because CV predictions are required to build additional Stacked Ensemble models in AutoML. Defaults to False.
 - 
property key¶
- Returns
- the unique key representing the object on the backend 
 
 - 
property leader¶
- Retrieve the top model from an H2OAutoML object - Returns
- an H2O model 
- Examples
 - >>> # Set up an H2OAutoML object >>> aml = H2OAutoML(max_runtime_secs=30) >>> # Launch an AutoML run >>> aml.train(y=y, training_frame=train) >>> # Get the best model in the AutoML Leaderboard >>> aml.leader >>> >>> # Get AutoML object by `project_name` >>> get_aml = h2o.automl.get_automl(aml.project_name) >>> # Get the best model in the AutoML Leaderboard >>> get_aml.leader 
 - 
property leaderboard¶
- Retrieve the leaderboard from an H2OAutoML object - Returns
- an H2OFrame with model ids in the first column and evaluation metric in the second column sorted by the evaluation metric 
- Examples
 - >>> # Set up an H2OAutoML object >>> aml = H2OAutoML(max_runtime_secs=30) >>> # Launch an AutoML run >>> aml.train(y=y, training_frame=train) >>> # Get the AutoML Leaderboard >>> aml.leaderboard >>> >>> # Get AutoML object by `project_name` >>> get_aml = h2o.automl.get_automl(aml.project_name) >>> # Get the AutoML Leaderboard >>> get_aml.leaderboard 
 - 
property max_after_balance_size¶
- Maximum relative size of the training data after balancing class counts (can be less than 1.0). Requires balance_classes. Defaults to 5.0.
 - Type: float 
 - 
property max_models¶
- Specify the maximum number of models to build in an AutoML run, excluding the Stacked Ensemble models. Defaults to None (disabled: no limitation). Always set this parameter to ensure AutoML reproducibility: all models are then trained until convergence and none is constrained by a time budget.
 - Type: int 
 - 
property max_runtime_secs¶
- Specify the maximum time that the AutoML process will run for. If both max_runtime_secs and max_models are specified, then the AutoML run will stop as soon as it hits either of these limits. If neither max_runtime_secs nor max_models is specified, then max_runtime_secs dynamically defaults to 3600 seconds (1 hour). Otherwise, defaults to 0 (no limit).
 - Type: int 
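The defaulting rule for max_runtime_secs described above can be written out as a small function. This is a sketch of the documented behaviour only, not H2O's internal code; effective_max_runtime_secs is a hypothetical name.

```python
def effective_max_runtime_secs(max_runtime_secs=None, max_models=None):
    """Hypothetical helper mirroring the documented default:
    3600 seconds when neither limit is given, otherwise 0 (no limit)
    unless max_runtime_secs is set explicitly."""
    if max_runtime_secs is None and max_models is None:
        return 3600  # dynamic default of one hour
    if max_runtime_secs is None:
        return 0  # only max_models limits the run
    return max_runtime_secs


print(effective_max_runtime_secs())                     # neither limit set
print(effective_max_runtime_secs(max_models=10))        # only max_models set
print(effective_max_runtime_secs(max_runtime_secs=30))  # explicit time budget
```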
 - 
property max_runtime_secs_per_model¶
- Controls the max time the AutoML run will dedicate to each individual model. Defaults to 0 (disabled: no time limit). Note that models constrained by a time budget are not guaranteed to be reproducible.
 - Type: int 
 - 
property modeling_plan¶
- List of modeling steps to be used by the AutoML engine (they may not all get executed, depending on other constraints). Defaults to None (Expert usage only).
 - 
property modeling_steps¶
- Expose the modeling steps effectively used by the AutoML run. This executed plan can be directly reinjected as the modeling_plan property of a new AutoML instance to improve reproducibility across AutoML versions. - Returns
- a list of dictionaries representing the effective modeling plan. 
 
 - 
property monotone_constraints¶
- A mapping that represents monotonic constraints. Use +1 to enforce an increasing constraint and -1 to specify a decreasing constraint.
 - 
property nfolds¶
- Specify a value >= 2 for the number of folds for k-fold cross-validation for the models in the AutoML run, or specify -1 (default) to let AutoML choose what it will do. If the data is big enough (depending on the cluster resources), it will create a blending frame and will not do cross-validation. Otherwise, it will use 5-fold cross-validation. 
 - Type: int 
 - 
predict(test_data)[source]¶
- Predict on a dataset. - Parameters
- test_data (H2OFrame) – Data on which to make predictions. 
- Returns
- A new H2OFrame of predictions. 
- Examples
 - >>> # Set up an H2OAutoML object >>> aml = H2OAutoML(max_runtime_secs=30) >>> # Launch an H2OAutoML run >>> aml.train(y=y, training_frame=train) >>> # Predict with top model from AutoML Leaderboard on a H2OFrame called 'test' >>> aml.predict(test) >>> >>> # Get AutoML object by `project_name` >>> get_aml = h2o.automl.get_automl(aml.project_name) >>> # Predict with top model from AutoML Leaderboard on a H2OFrame called 'test' >>> get_aml.predict(test) 
 - 
property preprocessing¶
- List of preprocessing steps to run. Only ["target_encoding"] is currently supported. Experimental.
 - 
property project_name¶
- Character string to identify an AutoML project. Defaults to None, which means a project name will be auto-generated based on the training frame ID. More models can be trained on an existing AutoML project by specifying the same project name in multiple calls to the AutoML function (as long as the same training frame, or a sample, is used in subsequent runs).
 - Type: str 
 - 
property seed¶
- Set a seed for reproducibility. AutoML can only guarantee reproducibility if max_models or early stopping is used, because max_runtime_secs is resource limited, meaning that if the resources are not the same between runs, AutoML may be able to train more models on one run vs another. In addition, H2O Deep Learning models are not reproducible by default for performance reasons, so exclude_algos must contain DeepLearning. Defaults to None.
 - Type: int 
 - 
property sort_metric¶
- Metric to sort the leaderboard by at the end of an AutoML run. For binomial classification, select from the following options: - "auc"
- "aucpr"
- "logloss"
- "mean_per_class_error"
- "rmse"
- "mse"
 - For multinomial classification, select from the following options: - "mean_per_class_error"
- "logloss"
- "rmse"
- "mse"
 - For regression, select from the following options: - "deviance"
- "rmse"
- "mse"
- "mae"
- "rmsle"
 - Defaults to "AUTO" (this translates to "auc" for binomial classification, "mean_per_class_error" for multinomial classification, and "deviance" for regression).
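How "AUTO" resolves for each problem type can be expressed as a lookup table. The sketch below only restates the defaults listed above; resolve_sort_metric and the problem-type labels are hypothetical names chosen for illustration, not part of the H2O API.

```python
# Documented "AUTO" defaults, keyed by an illustrative problem-type label.
AUTO_SORT_METRIC = {
    "binomial": "auc",
    "multinomial": "mean_per_class_error",
    "regression": "deviance",
}


def resolve_sort_metric(sort_metric, problem_type):
    """Hypothetical helper: return the metric the leaderboard would be sorted by."""
    if sort_metric.upper() == "AUTO":
        return AUTO_SORT_METRIC[problem_type]
    return sort_metric  # an explicit choice is used as given


print(resolve_sort_metric("AUTO", "binomial"))
print(resolve_sort_metric("logloss", "multinomial"))
```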
 - 
property stopping_metric¶
- Specifies the metric to use for early stopping.
- The available options are: - "AUTO" (this defaults to "logloss" for classification, "deviance" for regression)
- "deviance"
- "logloss"
- "mse"
- "rmse"
- "mae"
- "rmsle"
- "auc"
- "aucpr"
- "lift_top_group"
- "misclassification"
- "mean_per_class_error"
- "r2"
 - Defaults to - "AUTO".
 - Type: str 
 - 
property stopping_rounds¶
- Stop training new models in the AutoML run when the option selected for stopping_metric doesn't improve for the specified number of models, based on a simple moving average. To disable this feature, set it to 0. Defaults to 3 and must be a non-negative integer.
 - Type: int 
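To make the moving-average idea concrete, here is a deliberately simplified sketch for a metric where lower is better. It compares the mean of the last k scores with the mean of the k scores before them; H2O's own check may differ in detail, and should_stop is a hypothetical helper, not an H2O function.

```python
def should_stop(scores, stopping_rounds, stopping_tolerance):
    """Simplified early-stopping check (assumption: lower scores are better).

    Stops when the mean of the last k scores has not improved on the mean
    of the preceding k scores by at least the relative tolerance.
    """
    k = stopping_rounds
    if k == 0 or len(scores) < 2 * k:
        return False  # feature disabled, or not enough history yet
    previous = sum(scores[-2 * k:-k]) / k
    recent = sum(scores[-k:]) / k
    if previous == 0:
        return recent >= previous  # a zero score cannot be improved on
    improvement = (previous - recent) / abs(previous)
    return improvement < stopping_tolerance


flat = [1.0, 1.0, 1.0, 1.0]       # no improvement between the two windows
improving = [1.0, 0.8, 0.6, 0.4]  # clear improvement between the two windows
print(should_stop(flat, 2, 0.001), should_stop(improving, 2, 0.001))
```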
 - 
property stopping_tolerance¶
- Specify the relative tolerance for the metric-based stopping criterion to stop a grid search and the training of individual models within the AutoML run. Defaults to 0.001 if the dataset has at least 1 million rows; otherwise it defaults to a value determined by the size of the dataset and the non-NA rate, computed as 1/sqrt(nrows * non-NA-rate).
 - Type: float 
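The default tolerance described above can be worked through numerically. This sketch follows the documented rule literally (0.001 for at least 1 million rows, otherwise 1/sqrt(nrows * non-NA-rate)); default_stopping_tolerance is a hypothetical name, and any additional clamping that H2O may apply internally is not modelled here.

```python
import math


def default_stopping_tolerance(nrows, non_na_rate=1.0):
    """Hypothetical helper implementing the documented default tolerance."""
    if nrows >= 1_000_000:
        return 0.001
    return 1.0 / math.sqrt(nrows * non_na_rate)


print(default_stopping_tolerance(2_000_000))      # large dataset
print(default_stopping_tolerance(10_000))         # 10,000 rows, no missing values
print(default_stopping_tolerance(40_000, 0.25))   # 40,000 rows, 25% non-NA
```

Note that for a dataset of exactly 1 million rows with no missing values, the formula 1/sqrt(1,000,000) also evaluates to 0.001, so the two branches meet at the threshold.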
 - 
train(x=None, y=None, training_frame=None, fold_column=None, weights_column=None, validation_frame=None, leaderboard_frame=None, blending_frame=None)[source]¶
- Begins an AutoML task, a background task that automatically builds a number of models with various algorithms and tracks their performance in a leaderboard. At any point in the process you may use H2O’s performance or prediction functions on the resulting models. - Parameters
- x – A list of column names or indices indicating the predictor columns. 
- y – An index or a column name indicating the response column. 
- fold_column – The name or index of the column in training_frame that holds per-row fold assignments. 
- weights_column – The name or index of the column in training_frame that holds per-row weights. 
- training_frame – The H2OFrame having the columns indicated by x and y (as well as any additional columns specified by fold_column or weights_column). 
- validation_frame – H2OFrame with validation data. This argument is ignored unless the user sets nfolds = 0. If cross-validation is turned off, then a validation frame can be specified and used for early stopping of individual models and early stopping of the grid searches. By default and when nfolds > 1, cross-validation metrics will be used for early stopping and thus validation_frame will be ignored. 
- leaderboard_frame – H2OFrame with test data for scoring the leaderboard. This is optional and if this is set to None (the default), then cross-validation metrics will be used to generate the leaderboard rankings instead. 
- blending_frame – H2OFrame used to train the metalearning algorithm in Stacked Ensembles (instead of relying on cross-validated predicted values). This is optional, but when provided, it is also recommended to disable cross validation by setting nfolds=0 and to provide a leaderboard frame for scoring purposes. 
 
- Returns
- An H2OAutoML object. 
- Examples
 - >>> # Set up an H2OAutoML object >>> aml = H2OAutoML(max_runtime_secs=30) >>> # Launch an AutoML run >>> aml.train(y=y, training_frame=train) 
 - 
property training_info¶
- Expose the name/value columns of event_log as a simple dictionary, for example start_epoch, stop_epoch, … See event_log() to obtain a description of those key/value pairs. - Returns
- a dictionary with event_log['name'] column as keys and event_log['value'] column as values. 
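The shape of the returned dictionary can be illustrated without a running cluster. In the sketch below, the two lists stand in for the 'name' and 'value' columns of event_log; the values are made up for illustration, skipping rows with an empty name is an assumption, and to_training_info is a hypothetical helper rather than an H2O function.

```python
def to_training_info(names, values):
    """Hypothetical helper: pair 'name' and 'value' entries into a dict,
    skipping rows that carry no name (an assumption for this sketch)."""
    return {name: value for name, value in zip(names, values) if name}


# Made-up stand-ins for the event_log 'name' and 'value' columns.
names = ["start_epoch", "", "stop_epoch"]
values = ["1600000000", "", "1600000030"]

info = to_training_info(names, values)
print(info)
```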
 
 
H2OEstimator¶
- 
class h2o.estimators.estimator_base.H2OEstimator[source]¶
- Bases: - h2o.model.model_base.ModelBase- Base class for H2O Estimators. - H2O Estimators implement the following methods for model construction: - start(): Top-level user-facing API for asynchronous model build.
- join(): Top-level user-facing API for blocking on async model build.
- train(): Top-level user-facing API for model building.
- fit(): Used by scikit-learn.
 - Because H2OEstimator instances are instances of ModelBase, these objects can use the H2O model API. - 
fit(X, y=None, **params)[source]¶
- Fit an H2O model as part of a scikit-learn pipeline or grid search. - A warning will be issued if a caller other than sklearn attempts to use this method. 
 - 
get_params(deep=True)[source]¶
- Obtain parameters for this estimator. - Used primarily for sklearn Pipelines and sklearn grid search. - Parameters
- deep – If True, return parameters of all sub-objects that are estimators. 
- Returns
- A dict of parameters 
 
 - 
set_params(**parms)[source]¶
- Used by sklearn for updating parameters during grid search. - Parameters
- parms – A dictionary of parameters that will be set on this model. 
- Returns
- self, the current estimator object with the parameters all set as desired. 
 
 - 
start(x, y=None, training_frame=None, offset_column=None, fold_column=None, weights_column=None, validation_frame=None, **params)[source]¶
- Train the model asynchronously (to block for results, call join()). - Parameters
- x – A list of column names or indices indicating the predictor columns. 
- y – An index or a column name indicating the response column. 
- training_frame (H2OFrame) – The H2OFrame having the columns indicated by x and y (as well as any additional columns specified by fold, offset, and weights). 
- offset_column – The name or index of the column in training_frame that holds the offsets. 
- fold_column – The name or index of the column in training_frame that holds the per-row fold assignments. 
- weights_column – The name or index of the column in training_frame that holds the per-row weights. 
- validation_frame – H2OFrame with validation data to be scored on while training. 
 
 
 - 
train(x=None, y=None, training_frame=None, offset_column=None, fold_column=None, weights_column=None, validation_frame=None, max_runtime_secs=None, ignored_columns=None, model_id=None, verbose=False)[source]¶
- Train the H2O model. - Parameters
- x – A list of column names or indices indicating the predictor columns. 
- y – An index or a column name indicating the response column. 
- training_frame (H2OFrame) – The H2OFrame having the columns indicated by x and y (as well as any additional columns specified by fold, offset, and weights). 
- offset_column – The name or index of the column in training_frame that holds the offsets. 
- fold_column – The name or index of the column in training_frame that holds the per-row fold assignments. 
- weights_column – The name or index of the column in training_frame that holds the per-row weights. 
- validation_frame – H2OFrame with validation data to be scored on while training. 
- max_runtime_secs (float) – Maximum allowed runtime in seconds for model training. Use 0 to disable. 
- verbose (bool) – Print scoring history to stdout. Defaults to False. 
 
 
 - 
train_segments(x=None, y=None, training_frame=None, offset_column=None, fold_column=None, weights_column=None, validation_frame=None, max_runtime_secs=None, ignored_columns=None, segments=None, segment_models_id=None, parallelism=1, verbose=False)[source]¶
- Trains H2O model for each segment (subpopulation) of the training dataset. - Parameters
- x – A list of column names or indices indicating the predictor columns. 
- y – An index or a column name indicating the response column. 
- training_frame (H2OFrame) – The H2OFrame having the columns indicated by x and y (as well as any additional columns specified by fold, offset, and weights). 
- offset_column – The name or index of the column in training_frame that holds the offsets. 
- fold_column – The name or index of the column in training_frame that holds the per-row fold assignments. 
- weights_column – The name or index of the column in training_frame that holds the per-row weights. 
- validation_frame – H2OFrame with validation data to be scored on while training. 
- max_runtime_secs (float) – Maximum allowed runtime in seconds for each model training. Use 0 to disable. Please note that regardless of how this parameter is set, a model will be built for each input segment. This parameter only affects individual model training. 
- segments – A list of columns to segment-by. H2O will group the training (and validation) dataset by the segment-by columns and train a separate model for each segment (group of rows). As an alternative to providing a list of columns, users can also supply an explicit enumeration of segments to build the models for. This enumeration needs to be represented as H2OFrame. 
- segment_models_id – Identifier for the returned collection of Segment Models. If not specified it will be automatically generated. 
- parallelism – Level of parallelism for bulk segment-model building: the maximum number of models each H2O node will build in parallel. 
- verbose (bool) – Enable to print additional information during model building. Defaults to False. 
 
- Examples
 - >>> response = "survived" >>> titanic = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/gbm_test/titanic.csv") >>> titanic[response] = titanic[response].asfactor() >>> predictors = ["survived","name","sex","age","sibsp","parch","ticket","fare","cabin"] >>> train, valid = titanic.split_frame(ratios=[.8], seed=1234) >>> from h2o.estimators.gbm import H2OGradientBoostingEstimator >>> titanic_gbm = H2OGradientBoostingEstimator(seed=1234) >>> titanic_models = titanic_gbm.train_segments(segments=["pclass"], ... x=predictors, ... y=response, ... training_frame=train, ... validation_frame=valid) >>> titanic_models.as_frame() 
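The grouping that `segments` describes can be pictured in plain Python. The sketch below is not H2O code: it uses hypothetical in-memory rows and a stand-in `fit` function (the mean of the response) to show one "model" being built per distinct combination of segment-by values.

```python
from collections import defaultdict

def train_per_segment(rows, segment_cols, fit):
    """Group rows by the segment-by columns and fit one model per group."""
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[col] for col in segment_cols)
        groups[key].append(row)
    # One fitted "model" per segment, keyed by the segment values.
    return {key: fit(group) for key, group in groups.items()}

# Hypothetical toy data; `mean_survival` is a stand-in for real training.
rows = [
    {"pclass": 1, "survived": 1},
    {"pclass": 1, "survived": 0},
    {"pclass": 3, "survived": 0},
]

def mean_survival(group):
    return sum(r["survived"] for r in group) / len(group)

segment_models = train_per_segment(rows, ["pclass"], mean_survival)
```

With two distinct `pclass` values in the toy data, the result holds two entries, one per segment.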
 
H2OSingularValueDecompositionEstimator¶
- 
class h2o.estimators.svd.H2OSingularValueDecompositionEstimator(model_id=None, training_frame=None, validation_frame=None, ignored_columns=None, ignore_const_cols=True, score_each_iteration=False, transform='none', svd_method='gram_s_v_d', nv=1, max_iterations=1000, seed=-1, keep_u=True, u_name=None, use_all_factor_levels=True, max_runtime_secs=0.0, export_checkpoints_dir=None)[source]¶
- Bases: - h2o.estimators.estimator_base.H2OEstimator- Singular Value Decomposition - 
property export_checkpoints_dir¶
- Automatically export generated models to this directory. - Type: - str.- Examples
 - >>> import tempfile >>> from os import listdir >>> arrests = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/pca_test/USArrests.csv") >>> checkpoints_dir = tempfile.mkdtemp() >>> fit_h2o = H2OSingularValueDecompositionEstimator(export_checkpoints_dir=checkpoints_dir, ... seed=-5) >>> fit_h2o.train(x=list(range(4)), training_frame=arrests) >>> len(listdir(checkpoints_dir)) 
 - 
property ignore_const_cols¶
- Ignore constant columns. - Type: - bool, defaults to- True.- Examples
 - >>> arrests = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/pca_test/USArrests.csv") >>> fit_h2o = H2OSingularValueDecompositionEstimator(ignore_const_cols=False, ... nv=4) >>> fit_h2o.train(x=list(range(4)), training_frame=arrests) >>> fit_h2o 
 - 
property ignored_columns¶
- Names of columns to ignore for training. - Type: - List[str].
 - 
init_for_pipeline()[source]¶
- Returns an H2OSVD object that implements the fit and transform methods, so it can be used in an sklearn.Pipeline. All parameters defined in self.__params should be input parameters of the H2OSVD.__init__ method. - Returns
- H2OSVD object 
- Examples
 - >>> from h2o.transforms.preprocessing import H2OScaler >>> from h2o.estimators import H2ORandomForestEstimator >>> from h2o.estimators import H2OSingularValueDecompositionEstimator >>> from sklearn.pipeline import Pipeline >>> arrests = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/pca_test/USArrests.csv") >>> pipe = Pipeline([("standardize", H2OScaler()), ... ("svd", H2OSingularValueDecompositionEstimator(nv=3).init_for_pipeline()), ... ("rf", H2ORandomForestEstimator(seed=42,ntrees=50))]) >>> pipe.fit(arrests[1:], arrests[0]) 
 - 
property keep_u¶
- Save left singular vectors? - Type: - bool, defaults to- True.- Examples
 - >>> arrests = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/pca_test/USArrests.csv") >>> fit_h2o = H2OSingularValueDecompositionEstimator(keep_u=False) >>> fit_h2o.train(x=list(range(4)), training_frame=arrests) >>> fit_h2o 
 - 
property max_iterations¶
- Maximum iterations - Type: - int, defaults to- 1000.- Examples
 - >>> arrests = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/pca_test/USArrests.csv") >>> fit_h2o = H2OSingularValueDecompositionEstimator(nv=4, ... transform="standardize", ... max_iterations=2000) >>> fit_h2o.train(x=list(range(4)), training_frame=arrests) >>> fit_h2o 
 - 
property max_runtime_secs¶
- Maximum allowed runtime in seconds for model training. Use 0 to disable. - Type: - float, defaults to- 0.0.- Examples
 - >>> arrests = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/pca_test/USArrests.csv") >>> fit_h2o = H2OSingularValueDecompositionEstimator(nv=4, ... transform="standardize", ... max_runtime_secs=25) >>> fit_h2o.train(x=list(range(4)), training_frame=arrests) >>> fit_h2o 
 - 
property nv¶
- Number of right singular vectors - Type: - int, defaults to- 1.- Examples
 - >>> arrests = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/pca_test/USArrests.csv") >>> fit_h2o = H2OSingularValueDecompositionEstimator(nv=4, ... transform="standardize", ... max_iterations=2000) >>> fit_h2o.train(x=list(range(4)), training_frame=arrests) >>> fit_h2o 
 - 
property score_each_iteration¶
- Whether to score during each iteration of model training. - Type: - bool, defaults to- False.- Examples
 - >>> arrests = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/pca_test/USArrests.csv") >>> fit_h2o = H2OSingularValueDecompositionEstimator(nv=4, ... score_each_iteration=True) >>> fit_h2o.train(x=list(range(4)), training_frame=arrests) >>> fit_h2o 
 - 
property seed¶
- RNG seed for k-means++ initialization - Type: - int, defaults to- -1.- Examples
 - >>> arrests = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/pca_test/USArrests.csv") >>> fit_h2o = H2OSingularValueDecompositionEstimator(nv=4, seed=-3) >>> fit_h2o.train(x=list(range(4)), training_frame=arrests) >>> fit_h2o 
 - 
property svd_method¶
- Method for computing SVD (Caution: Randomized is currently experimental and unstable) - Type: - Literal["gram_s_v_d", "power", "randomized"], defaults to- "gram_s_v_d".- Examples
 - >>> arrests = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/pca_test/USArrests.csv") >>> fit_h2o = H2OSingularValueDecompositionEstimator(svd_method="power") >>> fit_h2o.train(x=list(range(4)), training_frame=arrests) >>> fit_h2o 
 - 
property training_frame¶
- Id of the training data frame. - Type: - Union[None, str, H2OFrame].- Examples
 - >>> arrests = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/pca_test/USArrests.csv") >>> fit_h2o = H2OSingularValueDecompositionEstimator() >>> fit_h2o.train(x=list(range(4)), training_frame=arrests) >>> fit_h2o 
 - 
property transform¶
- Transformation of training data - Type: - Literal["none", "standardize", "normalize", "demean", "descale"], defaults to- "none".- Examples
 - >>> arrests = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/pca_test/USArrests.csv") >>> fit_h2o = H2OSingularValueDecompositionEstimator(nv=4, ... transform="standardize", ... max_iterations=2000) >>> fit_h2o.train(x=list(range(4)), training_frame=arrests) >>> fit_h2o 
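As a rough illustration of what the `transform` names usually mean, the sketch below applies each option to a single numeric column using only the standard library. It is not H2O's implementation. Two points are assumptions: the use of the sample standard deviation, and the reading of "normalize" as "subtract the mean, divide by the range". Check both against H2O before relying on them.

```python
from statistics import mean, stdev

def transform_column(values, transform="none"):
    """Apply one of the named transformations to a single numeric column."""
    mu = mean(values)
    sigma = stdev(values)               # sample standard deviation (assumed)
    spread = max(values) - min(values)  # range of the column
    if transform == "none":
        return list(values)
    if transform == "demean":
        return [v - mu for v in values]
    if transform == "descale":
        return [v / sigma for v in values]
    if transform == "standardize":
        return [(v - mu) / sigma for v in values]
    if transform == "normalize":
        # Assumption: centre on the mean, scale by the range.
        return [(v - mu) / spread for v in values]
    raise ValueError("unknown transform: " + transform)

column = [2.0, 4.0, 6.0]
demeaned = transform_column(column, "demean")
standardized = transform_column(column, "standardize")
normalized = transform_column(column, "normalize")
```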
 - 
property u_name¶
- Frame key to save left singular vectors - Type: - str.- Examples
 - >>> arrests = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/pca_test/USArrests.csv") >>> fit_h2o = H2OSingularValueDecompositionEstimator(u_name="fit_h2o") >>> fit_h2o.train(x=list(range(4)), training_frame=arrests) >>> fit_h2o.u_name >>> fit_h2o 
 - 
property use_all_factor_levels¶
- Whether first factor level is included in each categorical expansion - Type: - bool, defaults to- True.- Examples
 - >>> arrests = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/pca_test/USArrests.csv") >>> fit_h2o = H2OSingularValueDecompositionEstimator(use_all_factor_levels=False) >>> fit_h2o.train(x=list(range(4)), training_frame=arrests) >>> fit_h2o 
 - 
property validation_frame¶
- Id of the validation data frame. - Type: - Union[None, str, H2OFrame].- Examples
 - >>> arrests = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/pca_test/USArrests.csv") >>> train, valid = arrests.split_frame(ratios=[.8]) >>> fit_h2o = H2OSingularValueDecompositionEstimator() >>> fit_h2o.train(x=list(range(4)), ... training_frame=train, ... validation_frame=valid) >>> fit_h2o 
 
H2OWord2vecEstimator¶
- 
class h2o.estimators.word2vec.H2OWord2vecEstimator(model_id=None, training_frame=None, min_word_freq=5, word_model='skip_gram', norm_model='hsm', vec_size=100, window_size=5, sent_sample_rate=0.001, init_learning_rate=0.025, epochs=5, pre_trained=None, max_runtime_secs=0.0, export_checkpoints_dir=None)[source]¶
- Bases: - h2o.estimators.estimator_base.H2OEstimator- Word2Vec - 
property epochs¶
- Number of training iterations to run - Type: - int, defaults to- 5.- Examples
 - >>> job_titles = h2o.import_file(("https://s3.amazonaws.com/h2o-public-test-data/smalldata/craigslistJobTitles.csv"), ... col_names = ["category", "jobtitle"], ... col_types = ["string", "string"], ... header = 1) >>> words = job_titles.tokenize(" ") >>> w2v_model = H2OWord2vecEstimator(sent_sample_rate = 0.0, epochs = 10) >>> w2v_model.train(training_frame=words) >>> synonyms = w2v_model.find_synonyms("teacher", count = 5) >>> print(synonyms) >>> >>> w2v_model2 = H2OWord2vecEstimator(sent_sample_rate = 0.0, epochs = 1) >>> w2v_model2.train(training_frame=words) >>> synonyms2 = w2v_model2.find_synonyms("teacher", 3) >>> print(synonyms2) 
 - 
property export_checkpoints_dir¶
- Automatically export generated models to this directory. - Type: - str.- Examples
 - >>> import tempfile >>> from os import listdir >>> job_titles = h2o.import_file(("https://s3.amazonaws.com/h2o-public-test-data/smalldata/craigslistJobTitles.csv"), ... col_names = ["category", "jobtitle"], ... col_types = ["string", "string"], ... header = 1) >>> checkpoints_dir = tempfile.mkdtemp() >>> words = job_titles.tokenize(" ") >>> w2v_model = H2OWord2vecEstimator(epochs=1, ... max_runtime_secs=10, ... export_checkpoints_dir=checkpoints_dir) >>> w2v_model.train(training_frame=words) >>> len(listdir(checkpoints_dir)) 
 - 
static from_external(external=<class 'h2o.frame.H2OFrame'>)[source]¶
- Creates new H2OWord2vecEstimator based on an external model. - Parameters
- external – H2OFrame with an external model 
- Returns
- H2OWord2vecEstimator instance representing the external model 
- Examples
 - >>> words = h2o.create_frame(rows=10, cols=1, ... string_fraction=1.0, ... missing_fraction=0.0) >>> embeddings = h2o.create_frame(rows=10, cols=100, ... real_fraction=1.0, ... missing_fraction=0.0) >>> word_embeddings = words.cbind(embeddings) >>> w2v_model = H2OWord2vecEstimator.from_external(external=word_embeddings) 
 - 
property init_learning_rate¶
- Set the starting learning rate - Type: - float, defaults to- 0.025.- Examples
 - >>> job_titles = h2o.import_file(("https://s3.amazonaws.com/h2o-public-test-data/smalldata/craigslistJobTitles.csv"), ... col_names = ["category", "jobtitle"], ... col_types = ["string", "string"], ... header = 1) >>> words = job_titles.tokenize(" ") >>> w2v_model = H2OWord2vecEstimator(epochs=3, init_learning_rate=0.05) >>> w2v_model.train(training_frame=words) >>> synonyms = w2v_model.find_synonyms("assistant", 3) >>> print(synonyms) 
 - 
property max_runtime_secs¶
- Maximum allowed runtime in seconds for model training. Use 0 to disable. - Type: - float, defaults to- 0.0.- Examples
 - >>> job_titles = h2o.import_file(("https://s3.amazonaws.com/h2o-public-test-data/smalldata/craigslistJobTitles.csv"), ... col_names = ["category", "jobtitle"], ... col_types = ["string", "string"], ... header = 1) >>> words = job_titles.tokenize(" ") >>> w2v_model = H2OWord2vecEstimator(epochs=1, max_runtime_secs=10) >>> w2v_model.train(training_frame=words) >>> synonyms = w2v_model.find_synonyms("tutor", 3) >>> print(synonyms) 
 - 
property min_word_freq¶
- Discard words that appear fewer than this many times. - Type: - int, defaults to- 5.- Examples
 - >>> job_titles = h2o.import_file(("https://s3.amazonaws.com/h2o-public-test-data/smalldata/craigslistJobTitles.csv"), ... col_names = ["category", "jobtitle"], ... col_types = ["string", "string"], ... header = 1) >>> words = job_titles.tokenize(" ") >>> w2v_model = H2OWord2vecEstimator(epochs=1, min_word_freq=4) >>> w2v_model.train(training_frame=words) >>> synonyms = w2v_model.find_synonyms("teacher", 3) >>> print(synonyms) 
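The effect of a minimum word frequency can be sketched without H2O. The helper below is hypothetical and works on a plain list of tokens: it counts occurrences and keeps only words that appear at least `min_word_freq` times.

```python
from collections import Counter

def prune_vocabulary(tokens, min_word_freq=5):
    """Return the set of words that occur at least min_word_freq times."""
    counts = Counter(tokens)
    return {word for word, n in counts.items() if n >= min_word_freq}

# Hypothetical token stream: "tutor" appears only 4 times and is dropped.
tokens = ["teacher"] * 5 + ["tutor"] * 4 + ["driver"] * 6
vocab = prune_vocabulary(tokens, min_word_freq=5)
```

Lowering the threshold to 4, as in the example above, would keep "tutor" in this toy vocabulary.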
 - 
property norm_model¶
- Use Hierarchical Softmax - Type: - Literal["hsm"], defaults to- "hsm".- Examples
 - >>> job_titles = h2o.import_file(("https://s3.amazonaws.com/h2o-public-test-data/smalldata/craigslistJobTitles.csv"), ... col_names = ["category", "jobtitle"], ... col_types = ["string", "string"], ... header = 1) >>> words = job_titles.tokenize(" ") >>> w2v_model = H2OWord2vecEstimator(epochs=1, norm_model="hsm") >>> w2v_model.train(training_frame=words) >>> synonyms = w2v_model.find_synonyms("teacher", 3) >>> print(synonyms) 
 - 
property pre_trained¶
- Id of a data frame that contains a pre-trained (external) word2vec model - Type: - Union[None, str, H2OFrame].- Examples
 - >>> words = h2o.create_frame(rows=1000,cols=1, ... string_fraction=1.0, ... missing_fraction=0.0) >>> embeddings = h2o.create_frame(rows=1000,cols=100, ... real_fraction=1.0, ... missing_fraction=0.0) >>> word_embeddings = words.cbind(embeddings) >>> w2v_model = H2OWord2vecEstimator(pre_trained=word_embeddings) >>> w2v_model.train(training_frame=word_embeddings) >>> model_id = w2v_model.model_id >>> model = h2o.get_model(model_id) 
 - 
property sent_sample_rate¶
- Set threshold for occurrence of words. Those that appear with higher frequency in the training data will be randomly down-sampled; useful range is (0, 1e-5). - Type: - float, defaults to- 0.001.- Examples
 - >>> job_titles = h2o.import_file(("https://s3.amazonaws.com/h2o-public-test-data/smalldata/craigslistJobTitles.csv"), ... col_names = ["category", "jobtitle"], ... col_types = ["string", "string"], ... header = 1) >>> words = job_titles.tokenize(" ") >>> w2v_model = H2OWord2vecEstimator(epochs=1, sent_sample_rate=0.01) >>> w2v_model.train(training_frame=words) >>> synonyms = w2v_model.find_synonyms("teacher", 3) >>> print(synonyms) 
 - 
property training_frame¶
- Id of the training data frame. - Type: - Union[None, str, H2OFrame].- Examples
 - >>> job_titles = h2o.import_file(("https://s3.amazonaws.com/h2o-public-test-data/smalldata/craigslistJobTitles.csv"), ... col_names = ["category", "jobtitle"], ... col_types = ["string", "string"], ... header = 1) >>> words = job_titles.tokenize(" ") >>> w2v_model = H2OWord2vecEstimator() >>> w2v_model.train(training_frame=words) >>> synonyms = w2v_model.find_synonyms("tutor", 3) >>> print(synonyms) 
 - 
property vec_size¶
- Set size of word vectors - Type: - int, defaults to- 100.- Examples
 - >>> job_titles = h2o.import_file(("https://s3.amazonaws.com/h2o-public-test-data/smalldata/craigslistJobTitles.csv"), ... col_names = ["category", "jobtitle"], ... col_types = ["string", "string"], ... header = 1) >>> words = job_titles.tokenize(" ") >>> w2v_model = H2OWord2vecEstimator(epochs=3, vec_size=50) >>> w2v_model.train(training_frame=words) >>> synonyms = w2v_model.find_synonyms("tutor", 3) >>> print(synonyms) 
 - 
property window_size¶
- Set max skip length between words - Type: - int, defaults to- 5.- Examples
 - >>> job_titles = h2o.import_file(("https://s3.amazonaws.com/h2o-public-test-data/smalldata/craigslistJobTitles.csv"), ... col_names = ["category", "jobtitle"], ... col_types = ["string", "string"], ... header = 1) >>> words = job_titles.tokenize(" ") >>> w2v_model = H2OWord2vecEstimator(epochs=3, window_size=2) >>> w2v_model.train(training_frame=words) >>> synonyms = w2v_model.find_synonyms("teacher", 3) >>> print(synonyms) 
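To make "max skip length" concrete, the sketch below lists the (center, context) pairs a skip-gram model would see when every word within `window_size` positions counts as context. It is a simplified illustration with a hypothetical helper, not H2O's training loop. Real implementations may differ in details such as randomly shrinking the window.

```python
def skip_gram_pairs(tokens, window_size=5):
    """List (center, context) pairs within window_size positions of each word."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window_size)
        hi = min(len(tokens), i + window_size + 1)
        for j in range(lo, hi):
            if j != i:  # a word is not its own context
                pairs.append((center, tokens[j]))
    return pairs

pairs = skip_gram_pairs(["senior", "math", "teacher", "wanted"], window_size=1)
```

A smaller window restricts context to immediate neighbours, while a larger one pairs words that sit further apart in the same title.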
 - 
property word_model¶
- The word model to use (SkipGram or CBOW) - Type: - Literal["skip_gram", "cbow"], defaults to- "skip_gram".- Examples
 - >>> job_titles = h2o.import_file(("https://s3.amazonaws.com/h2o-public-test-data/smalldata/craigslistJobTitles.csv"), ... col_names = ["category", "jobtitle"], ... col_types = ["string", "string"], ... header = 1) >>> words = job_titles.tokenize(" ") >>> w2v_model = H2OWord2vecEstimator(epochs=3, word_model="skip_gram") >>> w2v_model.train(training_frame=words) >>> synonyms = w2v_model.find_synonyms("assistant", 3) >>> print(synonyms) 
 
H2OGridSearch¶
- 
class h2o.grid.H2OGridSearch(model, hyper_params, grid_id=None, search_criteria=None, export_checkpoints_dir=None, recovery_dir=None, parallelism=1)[source]¶
- Bases: - h2o.grid.grid_search.H2OGridSearch- Grid Search of a Hyper-Parameter Space for a Model - Examples - >>> from h2o.grid.grid_search import H2OGridSearch >>> from h2o.estimators.glm import H2OGeneralizedLinearEstimator >>> hyper_parameters = {'alpha': [0.01,0.5], 'lambda': [1e-5,1e-6]} >>> gs = H2OGridSearch(H2OGeneralizedLinearEstimator(family='binomial'), ... hyper_parameters) >>> training_data = h2o.import_file("smalldata/logreg/benign.csv") >>> gs.train(x=list(range(3)) + list(range(4,11)), y=3, training_frame=training_data) >>> gs.show() - 
aic(train=False, valid=False, xval=False)[source]¶
- Get the AIC(s). - If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”. - Parameters
- train (bool) – If train is True, then return the AIC value for the training data. 
- valid (bool) – If valid is True, then return the AIC value for the validation data. 
- xval (bool) – If xval is True, then return the AIC value for the cross validation data. 
 
- Returns
- The AIC. 
- Examples
 - >>> from h2o.grid.grid_search import H2OGridSearch >>> from h2o.estimators import H2OGeneralizedLinearEstimator >>> prostate = h2o.import_file("http://s3.amazonaws.com/h2o-public-test-data/smalldata/prostate/prostate.csv.zip") >>> prostate[2] = prostate[2].asfactor() >>> prostate[4] = prostate[4].asfactor() >>> prostate[5] = prostate[5].asfactor() >>> prostate[8] = prostate[8].asfactor() >>> predictors = ["AGE","RACE","DPROS","DCAPS","PSA","VOL","GLEASON"] >>> response = "CAPSULE" >>> hyper_params = {'alpha': [0.01,0.5], ... 'lambda': [1e-5,1e-6]} >>> gs = H2OGridSearch(H2OGeneralizedLinearEstimator(family='binomial'), ... hyper_params) >>> gs.train(x=predictors, y=response, training_frame=prostate) >>> gs.aic() 
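The `train`/`valid`/`xval` convention shared by the metric accessors can be summarised in a few lines. The helper below is a hypothetical stand-in, not H2O's code: it takes a dictionary of already-computed values and applies the rule described above, with a single requested flag assumed to return that one value.

```python
def select_metric(metrics, train=False, valid=False, xval=False):
    """Pick metric values following the train/valid/xval flag convention."""
    flags = (("train", train), ("valid", valid), ("xval", xval))
    requested = [name for name, flag in flags if flag]
    if not requested:
        return metrics["train"]         # all False: training metric
    if len(requested) == 1:
        return metrics[requested[0]]    # assumed: one flag gives one value
    return {name: metrics[name] for name in requested}

# Hypothetical metric values for one model.
values = {"train": 101.2, "valid": 108.9, "xval": 110.4}
default_value = select_metric(values)
both = select_metric(values, train=True, xval=True)
```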
 - 
auc(train=False, valid=False, xval=False)[source]¶
- Get the AUC(s). - If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”. - Parameters
- train (bool) – If train is True, then return the AUC value for the training data. 
- valid (bool) – If valid is True, then return the AUC value for the validation data. 
- xval (bool) – If xval is True, then return the AUC value for the cross validation data. 
 
- Returns
- The AUC. 
- Examples
 - >>> from h2o.estimators import H2OGradientBoostingEstimator >>> from h2o.grid.grid_search import H2OGridSearch >>> data = h2o.import_file("https://s3.amazonaws.com/erin-data/higgs/higgs_train_10k.csv") >>> test = h2o.import_file("https://s3.amazonaws.com/erin-data/higgs/higgs_test_5k.csv") >>> x = data.columns >>> y = "response" >>> x.remove(y) >>> data[y] = data[y].asfactor() >>> test[y] = test[y].asfactor() >>> ss = data.split_frame(seed = 1) >>> train = ss[0] >>> valid = ss[1] >>> gbm_params1 = {'learn_rate': [0.01, 0.1], ... 'max_depth': [3, 5, 9], ... 'sample_rate': [0.8, 1.0], ... 'col_sample_rate': [0.2, 0.5, 1.0]} >>> gbm_grid1 = H2OGridSearch(model=H2OGradientBoostingEstimator, ... grid_id='gbm_grid1', ... hyper_params=gbm_params1) >>> gbm_grid1.train(x=x, y=y, ... training_frame=train, ... validation_frame=valid, ... ntrees=100, ... seed=1) >>> gbm_gridperf1 = gbm_grid1.get_grid(sort_by='auc', decreasing=True) >>> best_gbm1 = gbm_gridperf1.models[0] >>> best_gbm_perf1 = best_gbm1.model_performance(test) >>> best_gbm_perf1.auc() 
 - 
aucpr(train=False, valid=False, xval=False)[source]¶
- Get the AUCPR (area under the precision-recall curve). - If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”. - Parameters
- train (bool) – If train is True, then return the aucpr value for the training data. 
- valid (bool) – If valid is True, then return the aucpr value for the validation data. 
- xval (bool) – If xval is True, then return the aucpr value for the cross validation data. 
 
- Returns
- The AUCPR for the models in this grid. 
 
 - 
biases(vector_id=0)[source]¶
- Return the frame for the respective bias vector. - Parameters
- vector_id – an integer, ranging from 0 to number of layers, that specifies the bias vector to return. 
- Returns
- an H2OFrame which represents the bias vector identified by vector_id 
- Examples
 - >>> from h2o.estimators import H2ODeepLearningEstimator >>> iris = h2o.import_file("http://h2o-public-test-data.s3.amazonaws.com/smalldata/iris/iris.csv") >>> hh = H2ODeepLearningEstimator(hidden=[], ... loss="CrossEntropy", ... export_weights_and_biases=True) >>> hh.train(x=list(range(4)), y=4, training_frame=iris) >>> hh.biases(0) 
 - 
catoffsets()[source]¶
- Categorical offsets for one-hot encoding - Examples
 - >>> from h2o.estimators import H2ODeepLearningEstimator >>> iris = h2o.import_file("http://h2o-public-test-data.s3.amazonaws.com/smalldata/iris/iris.csv") >>> hh = H2ODeepLearningEstimator(hidden=[], ... loss="CrossEntropy", ... export_weights_and_biases=True) >>> hh.train(x=list(range(4)), y=4, training_frame=iris) >>> hh.catoffsets() 
 - 
coef()[source]¶
- Return the coefficients that can be applied to the non-standardized data. - Note: standardize = True by default. If set to False, then coef() returns the coefficients that are fit directly. - Examples
 - >>> from h2o.grid.grid_search import H2OGridSearch >>> from h2o.estimators import H2OGeneralizedLinearEstimator >>> training_data = h2o.import_file("https://h2o-public-test-data.s3.amazonaws.com/smalldata/logreg/benign.csv") >>> hyper_parameters = {'alpha': [0.01,0.5], ... 'lambda': [1e-5,1e-6]} >>> gs = H2OGridSearch(H2OGeneralizedLinearEstimator(family='binomial'), ... hyper_parameters) >>> gs.train(x=list(range(3)) + list(range(4,11)), y=3, training_frame=training_data) >>> gs.coef() 
 - 
coef_norm()[source]¶
- Return coefficients fitted on the standardized data (requires standardize = True, which is on by default). These coefficients can be used to evaluate variable importance. - Examples
 - >>> from h2o.grid.grid_search import H2OGridSearch >>> from h2o.estimators import H2OGeneralizedLinearEstimator >>> training_data = h2o.import_file("https://h2o-public-test-data.s3.amazonaws.com/smalldata/logreg/benign.csv") >>> hyper_parameters = {'alpha': [0.01,0.5], ... 'lambda': [1e-5,1e-6]} >>> gs = H2OGridSearch(H2OGeneralizedLinearEstimator(family='binomial'), ... hyper_parameters) >>> gs.train(x=list(range(3)) + list(range(4,11)), y=3, training_frame=training_data) >>> gs.coef_norm() 
 - 
deepfeatures(test_data, layer)[source]¶
- Obtain a hidden layer’s details on a dataset. - Parameters
- test_data – Data to create a feature space on. 
- layer (int) – Index of the hidden layer. 
 
- Returns
- A dictionary of hidden layer details for each model. 
- Examples
 - >>> from h2o.estimators import H2OAutoEncoderEstimator >>> resp = 784 >>> nfeatures = 20 >>> train = h2o.import_file("http://s3.amazonaws.com/h2o-public-test-data/bigdata/laptop/mnist/train.csv.gz") >>> train[resp] = train[resp].asfactor() >>> test = h2o.import_file("http://s3.amazonaws.com/h2o-public-test-data/bigdata/laptop/mnist/test.csv.gz") >>> test[resp] = test[resp].asfactor() >>> sid = train[0].runif(0) >>> train_unsup = train[sid >= 0.5] >>> train_unsup.pop(resp) >>> train_sup = train[sid < 0.5] >>> ae_model = H2OAutoEncoderEstimator(activation="Tanh", ... hidden=[nfeatures], ... model_id="ae_model", ... epochs=1, ... ignore_const_cols=False, ... reproducible=True, ... seed=1234) >>> ae_model.train(list(range(resp)), training_frame=train_unsup) >>> ae_model.deepfeatures(train_sup[0:resp], 0) 
 - 
property failed_params¶
- Return a list of failed parameters. - Examples - >>> from h2o.grid.grid_search import H2OGridSearch >>> from h2o.estimators.glm import H2OGeneralizedLinearEstimator >>> training_data = h2o.import_file("https://h2o-public-test-data.s3.amazonaws.com/smalldata/logreg/benign.csv") >>> hyper_parameters = {'alpha': [0.01,0.5], ... 'lambda': [1e-5,1e-6], ... 'beta_epsilon': [0.05]} >>> gs = H2OGridSearch(H2OGeneralizedLinearEstimator(family='binomial'), ... hyper_parameters) >>> gs.train(x=list(range(3)) + list(range(4,11)), y=3, training_frame=training_data) >>> gs.failed_params 
 - 
get_grid(sort_by=None, decreasing=None)[source]¶
- Retrieve an H2OGridSearch instance. - Optionally specify a metric by which to sort models and a sort order. Note that if neither cross-validation nor a validation frame is used in the grid search, then the training metrics will display in the “get grid” output. If a validation frame is passed to the grid, and nfolds = 0, then the validation metrics will display. However, if nfolds > 1, then cross-validation metrics will display even if a validation frame is provided. - Parameters
- sort_by (str) – A metric by which to sort the models in the grid space. Choices are: - "logloss",- "residual_deviance",- "mse",- "auc",- "r2",- "accuracy",- "precision",- "recall",- "f1", etc.
- decreasing (bool) – Sort the models in decreasing order of metric if true, otherwise sort in increasing order (default). 
 
- Returns
- A new H2OGridSearch instance optionally sorted on the specified metric. 
- Examples
 - >>> from h2o.estimators import H2OGeneralizedLinearEstimator >>> from h2o.grid.grid_search import H2OGridSearch >>> benign = h2o.import_file("http://s3.amazonaws.com/h2o-public-test-data/smalldata/logreg/benign.csv") >>> y = 3 >>> x = [4,5,6,7,8,9,10,11] >>> hyper_params = {'alpha': [0.01,0.3,0.5], ... 'lambda': [1e-5, 1e-6, 1e-7]} >>> gs = H2OGridSearch(H2OGeneralizedLinearEstimator(family='binomial'), ... hyper_params) >>> gs.train(x=x,y=y, training_frame=benign) >>> gs.get_grid(sort_by='F1', decreasing=True) 
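The two rules described for `get_grid` (which metrics are displayed, and how models are ordered) can be written down as a small sketch. Both helpers are hypothetical and operate on plain Python values rather than an H2O grid. They only restate the documented behaviour.

```python
def metric_source(nfolds=0, has_validation_frame=False):
    """Which metrics the grid output shows, per the rule described above."""
    if nfolds > 1:
        return "xval"    # cross-validation wins even with a validation frame
    if has_validation_frame:
        return "valid"
    return "train"

def sort_models(scores, decreasing=False):
    """Order model ids by metric value; increasing unless decreasing=True."""
    return sorted(scores, key=scores.get, reverse=decreasing)

# Hypothetical model ids and F1 values.
f1_by_model = {"model_1": 0.70, "model_2": 0.90, "model_3": 0.80}
best_first = sort_models(f1_by_model, decreasing=True)
```

For metrics where larger is better, such as AUC or F1, `decreasing=True` puts the best model first. For error-style metrics such as logloss, the default increasing order does.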
 - 
get_hyperparams(id, display=True)[source]¶
- Get the hyperparameters of a model explored by grid search. - Parameters
- id (str) – The model id of the model with hyperparameters of interest. 
- display (bool) – Flag to indicate whether to display the hyperparameter names. 
 
- Returns
- A list of the hyperparameters for the specified model. 
- Examples
 - >>> from h2o.estimators import H2OGeneralizedLinearEstimator >>> from h2o.grid.grid_search import H2OGridSearch >>> benign = h2o.import_file("http://s3.amazonaws.com/h2o-public-test-data/smalldata/logreg/benign.csv") >>> y = 3 >>> x = [4,5,6,7,8,9,10,11] >>> hyper_params = {'alpha': [0.01,0.3,0.5], ... 'lambda': [1e-5, 1e-6, 1e-7]} >>> gs = H2OGridSearch(H2OGeneralizedLinearEstimator(family='binomial'), ... hyper_params) >>> gs.train(x=x,y=y, training_frame=benign) >>> best_model_id = gs.get_grid(sort_by='F1', ... decreasing=True).model_ids[0] >>> gs.get_hyperparams(best_model_id) 
 - 
get_hyperparams_dict(id, display=True)[source]¶
- Derive and return the model parameters used to train the particular grid search model. - Parameters
- id (str) – The model id of the model with hyperparameters of interest. 
- display (bool) – Flag to indicate whether to display the hyperparameter names. 
 
- Returns
- A dict of model parameters derived from the hyper-parameters used to train this particular model. 
- Examples
 - >>> from h2o.estimators import H2OGeneralizedLinearEstimator >>> from h2o.grid.grid_search import H2OGridSearch >>> benign = h2o.import_file("http://s3.amazonaws.com/h2o-public-test-data/smalldata/logreg/benign.csv") >>> y = 3 >>> x = [4,5,6,7,8,9,10,11] >>> hyper_params = {'alpha': [0.01,0.3,0.5], ... 'lambda': [1e-5, 1e-6, 1e-7]} >>> gs = H2OGridSearch(H2OGeneralizedLinearEstimator(family='binomial'), ... hyper_params) >>> gs.train(x=x,y=y, training_frame=benign) >>> best_model_id = gs.get_grid(sort_by='F1', ... decreasing=True).model_ids[0] >>> gs.get_hyperparams_dict(best_model_id) 
 - 
get_xval_models(key=None)[source]¶
- Return a Model object. - Parameters
- key (str) – If None, return all cross-validated models; otherwise return the model specified by the key. 
- Returns
- A model or a list of models. 
- Examples
 - >>> from h2o.estimators import H2OGradientBoostingEstimator >>> fr = h2o.import_file("http://s3.amazonaws.com/h2o-public-test-data/smalldata/logreg/prostate_train.csv") >>> m = H2OGradientBoostingEstimator(nfolds=10, ... ntrees=10, ... keep_cross_validation_models=True) >>> m.train(x=list(range(2,fr.ncol)), y=1, training_frame=fr) >>> m.get_xval_models() 
 - 
gini(train=False, valid=False, xval=False)[source]¶
- Get the Gini Coefficient(s). - If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”. - Parameters
- train (bool) – If train is True, then return the Gini Coefficient value for the training data. 
- valid (bool) – If valid is True, then return the Gini Coefficient value for the validation data. 
- xval (bool) – If xval is True, then return the Gini Coefficient value for the cross validation data. 
 
- Returns
- The Gini Coefficient for the models in this grid. 
- Examples
 - >>> from h2o.estimators import H2OGeneralizedLinearEstimator >>> from h2o.grid.grid_search import H2OGridSearch >>> benign = h2o.import_file("http://s3.amazonaws.com/h2o-public-test-data/smalldata/logreg/benign.csv") >>> y = 3 >>> x = [4,5,6,7,8,9,10,11] >>> hyper_params = {'alpha': [0.01,0.3,0.5], ... 'lambda': [1e-5, 1e-6, 1e-7]} >>> gs = H2OGridSearch(H2OGeneralizedLinearEstimator(family='binomial'), ... hyper_params) >>> gs.train(x=x,y=y, training_frame=benign) >>> gs.gini() 
 - 
property grid_id¶
- A key that identifies this grid search object in H2O. - Examples
 - >>> from h2o.grid.grid_search import H2OGridSearch >>> from h2o.estimators.glm import H2OGeneralizedLinearEstimator >>> training_data = h2o.import_file("https://h2o-public-test-data.s3.amazonaws.com/smalldata/logreg/benign.csv") >>> hyper_parameters = {'alpha': [0.01,0.5], ... 'lambda': [1e-5,1e-6]} >>> gs = H2OGridSearch(H2OGeneralizedLinearEstimator(family='binomial'), ... hyper_parameters) >>> gs.train(x=list(range(3))+list(range(4,11)), y=3, training_frame=training_data) >>> gs.grid_id 
 - 
property hyper_names¶
- Return the hyperparameter names. - Examples
 - >>> from h2o.grid.grid_search import H2OGridSearch >>> from h2o.estimators.glm import H2OGeneralizedLinearEstimator >>> training_data = h2o.import_file("https://h2o-public-test-data.s3.amazonaws.com/smalldata/logreg/benign.csv") >>> hyper_parameters = {'alpha': [0.01,0.5], ... 'lambda': [1e-5,1e-6]} >>> gs = H2OGridSearch(H2OGeneralizedLinearEstimator(family='binomial'), ... hyper_parameters) >>> gs.train(x=list(range(3))+list(range(4,11)), y=3, training_frame=training_data) >>> gs.hyper_names 
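To make the role of the hyperparameter names concrete, the sketch below enumerates the combinations that an exhaustive (Cartesian) search over a dictionary like `hyper_parameters` would visit. It is plain Python, `cartesian_grid` is a hypothetical helper, and the sorted ordering of names is a choice made here for reproducibility, not a statement about the order H2O uses.

```python
from itertools import product


def cartesian_grid(hyper_params):
    """Yield one dict per combination of hyperparameter values."""
    names = sorted(hyper_params)  # fixed order so output is reproducible
    for values in product(*(hyper_params[name] for name in names)):
        yield dict(zip(names, values))


hyper_parameters = {'alpha': [0.01, 0.5], 'lambda': [1e-5, 1e-6]}
combos = list(cartesian_grid(hyper_parameters))
for combo in combos:
    print(combo)
```

With two values for each of two hyperparameters, the enumeration contains four combinations, one per candidate model.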
 - 
is_cross_validated()[source]¶
- Return True if the model was cross-validated. - Examples
 - >>> from h2o.estimators import H2OGeneralizedLinearEstimator >>> from h2o.grid.grid_search import H2OGridSearch >>> benign = h2o.import_file("http://s3.amazonaws.com/h2o-public-test-data/smalldata/logreg/benign.csv") >>> y = 3 >>> x = [4,5,6,7,8,9,10,11] >>> hyper_params = {'alpha': [0.01,0.3,0.5], ... 'lambda': [1e-5, 1e-6, 1e-7]} >>> gs = H2OGridSearch(H2OGeneralizedLinearEstimator(family='binomial'), ... hyper_params) >>> gs.train(x=x,y=y, training_frame=benign) >>> gs.is_cross_validated() 
 - 
join()[source]¶
- Wait until grid finishes computing. - Examples
 - >>> from h2o.estimators import H2ODeepLearningEstimator >>> from h2o.grid.grid_search import H2OGridSearch >>> insurance = h2o.import_file("http://s3.amazonaws.com/h2o-public-test-data/smalldata/glm_test/insurance.csv") >>> insurance["offset"] = insurance["Holders"].log() >>> insurance["Group"] = insurance["Group"].asfactor() >>> insurance["Age"] = insurance["Age"].asfactor() >>> insurance["District"] = insurance["District"].asfactor() >>> hyper_params = {'huber_alpha': [0.2,0.5], ... 'quantile_alpha': [0.2,0.6]} >>> gs = H2OGridSearch(H2ODeepLearningEstimator(epochs=5), hyper_params) >>> gs.start(x=list(range(3)),y="Claims", training_frame=insurance) >>> gs.join() 
 - 
property key¶
- Returns
- the unique key representing the object on the backend 
 
 - 
logloss(train=False, valid=False, xval=False)[source]¶
- Get the Log Loss(es). - If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”. - Parameters
- train (bool) – If train is True, then return the Log Loss value for the training data. 
- valid (bool) – If valid is True, then return the Log Loss value for the validation data. 
- xval (bool) – If xval is True, then return the Log Loss value for the cross validation data. 
 
- Returns
- The Log Loss for this binomial model. 
- Examples
 - >>> from h2o.estimators import H2OGeneralizedLinearEstimator >>> from h2o.grid.grid_search import H2OGridSearch >>> benign = h2o.import_file("http://s3.amazonaws.com/h2o-public-test-data/smalldata/logreg/benign.csv") >>> y = 3 >>> x = [4,5,6,7,8,9,10,11] >>> hyper_params = {'alpha': [0.01,0.3,0.5], ... 'lambda': [1e-5, 1e-6, 1e-7]} >>> gs = H2OGridSearch(H2OGeneralizedLinearEstimator(family='binomial'), ... hyper_params) >>> gs.train(x=x,y=y, training_frame=benign) >>> gs.logloss() 
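As a reference point for the value `logloss()` reports, the textbook binomial log loss is the mean negative log-likelihood of the true labels under the predicted probabilities. The sketch below is plain Python; `binomial_logloss` and its `eps` clipping constant are assumptions introduced for illustration and are not part of the H2O API.

```python
import math


def binomial_logloss(labels, probs, eps=1e-15):
    """Mean negative log-likelihood for 0/1 labels and predicted P(y=1)."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(labels)


# An uninformative model that always predicts 0.5 scores ln(2).
loss = binomial_logloss([1, 0], [0.5, 0.5])
print(loss)
```

Lower values are better; confident wrong predictions are penalized most heavily.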
 - 
mean_residual_deviance(train=False, valid=False, xval=False)[source]¶
- Get the Mean Residual Deviance(s). - If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”. - Parameters
- train (bool) – If train is True, then return the Mean Residual Deviance value for the training data. 
- valid (bool) – If valid is True, then return the Mean Residual Deviance value for the validation data. 
- xval (bool) – If xval is True, then return the Mean Residual Deviance value for the cross validation data. 
 
- Returns
- The Mean Residual Deviance for this regression model. 
- Examples
 - >>> from h2o.estimators import H2ODeepLearningEstimator >>> from h2o.grid.grid_search import H2OGridSearch >>> insurance = h2o.import_file("http://s3.amazonaws.com/h2o-public-test-data/smalldata/glm_test/insurance.csv") >>> insurance["offset"] = insurance["Holders"].log() >>> insurance["Group"] = insurance["Group"].asfactor() >>> insurance["Age"] = insurance["Age"].asfactor() >>> insurance["District"] = insurance["District"].asfactor() >>> hyper_params = {'huber_alpha': [0.2,0.5], ... 'quantile_alpha': [0.2,0.6]} >>> gs = H2OGridSearch(H2ODeepLearningEstimator(epochs=5), ... hyper_params) >>> gs.train(x=list(range(3)),y="Claims", training_frame=insurance) >>> gs.mean_residual_deviance() 
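The `train`/`valid`/`xval` convention shared by the metric accessors above can be sketched as a small selection function. This is plain Python written from the description in the text; `select_metric` is a hypothetical name, and how a grid combines the values of its individual models is not modeled here.

```python
def select_metric(values, train=False, valid=False, xval=False):
    """Pick metric values following the documented flag convention.

    values: dict with keys "train", "valid" and "xval".
    """
    flags = (("train", train), ("valid", valid), ("xval", xval))
    requested = [name for name, flag in flags if flag]
    if not requested:
        return values["train"]          # default: training metric
    if len(requested) == 1:
        return values[requested[0]]     # single flag: bare value
    return {name: values[name] for name in requested}  # several flags: dict


metrics = {"train": 0.10, "valid": 0.12, "xval": 0.13}
print(select_metric(metrics))
print(select_metric(metrics, valid=True))
print(select_metric(metrics, train=True, xval=True))
```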
 - 
property model_ids¶
- Returns model ids. - Examples
 - >>> from h2o.grid.grid_search import H2OGridSearch >>> from h2o.estimators.glm import H2OGeneralizedLinearEstimator >>> training_data = h2o.import_file("https://h2o-public-test-data.s3.amazonaws.com/smalldata/logreg/benign.csv") >>> hyper_parameters = {'alpha': [0.01,0.5], ... 'lambda': [1e-5,1e-6]} >>> gs = H2OGridSearch(H2OGeneralizedLinearEstimator(family='binomial'), ... hyper_parameters) >>> gs.train(x=list(range(3))+list(range(4,11)), y=3, training_frame=training_data) >>> gs.model_ids 
 - 
model_performance(test_data=None, train=False, valid=False, xval=False)[source]¶
- Generate model metrics for this model on test_data. - Parameters
- test_data – Data set against which model metrics are computed. The train, valid and xval arguments are all ignored if test_data is not None. 
- train – Report the training metrics for the model. 
- valid – Report the validation metrics for the model. 
- xval – Report the cross-validation metrics for the model. 
 
- Returns
- An instance of MetricsBase or one of its subclasses.
- Examples
 - >>> from h2o.estimators import H2OGradientBoostingEstimator >>> from h2o.grid.grid_search import H2OGridSearch >>> data = h2o.import_file("https://s3.amazonaws.com/erin-data/higgs/higgs_train_10k.csv") >>> test = h2o.import_file("https://s3.amazonaws.com/erin-data/higgs/higgs_test_5k.csv") >>> x = data.columns >>> y = "response" >>> x.remove(y) >>> data[y] = data[y].asfactor() >>> test[y] = test[y].asfactor() >>> ss = data.split_frame(seed = 1) >>> train = ss[0] >>> valid = ss[1] >>> gbm_params1 = {'learn_rate': [0.01, 0.1], ... 'max_depth': [3, 5, 9], ... 'sample_rate': [0.8, 1.0], ... 'col_sample_rate': [0.2, 0.5, 1.0]} >>> gbm_grid1 = H2OGridSearch(model=H2OGradientBoostingEstimator, ... grid_id='gbm_grid1', ... hyper_params=gbm_params1) >>> gbm_grid1.train(x=x, y=y, ... training_frame=train, ... validation_frame=valid, ... ntrees=100, ... seed=1) >>> gbm_gridperf1 = gbm_grid1.get_grid(sort_by='auc', decreasing=True) >>> best_gbm1 = gbm_gridperf1.models[0] >>> best_gbm1.model_performance(test) 
 - 
mse(train=False, valid=False, xval=False)[source]¶
- Get the MSE(s). - If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”. - Parameters
- train (bool) – If train is True, then return the MSE value for the training data. 
- valid (bool) – If valid is True, then return the MSE value for the validation data. 
- xval (bool) – If xval is True, then return the MSE value for the cross validation data. 
 
- Returns
- The MSE for this regression model. 
- Examples
 - >>> from h2o.estimators import H2ODeepLearningEstimator >>> from h2o.grid.grid_search import H2OGridSearch >>> insurance = h2o.import_file("http://s3.amazonaws.com/h2o-public-test-data/smalldata/glm_test/insurance.csv") >>> insurance["offset"] = insurance["Holders"].log() >>> insurance["Group"] = insurance["Group"].asfactor() >>> insurance["Age"] = insurance["Age"].asfactor() >>> insurance["District"] = insurance["District"].asfactor() >>> hyper_params = {'huber_alpha': [0.2,0.5], ... 'quantile_alpha': [0.2,0.6]} >>> gs = H2OGridSearch(H2ODeepLearningEstimator(epochs=5), ... hyper_params) >>> gs.train(x=list(range(3)),y="Claims", training_frame=insurance) >>> gs.mse() 
 - 
normmul()[source]¶
- Normalization/Standardization multipliers for numeric predictors. - Examples
 - >>> from h2o.estimators import H2ODeepLearningEstimator >>> from h2o.grid.grid_search import H2OGridSearch >>> insurance = h2o.import_file("http://s3.amazonaws.com/h2o-public-test-data/smalldata/glm_test/insurance.csv") >>> insurance["offset"] = insurance["Holders"].log() >>> insurance["Group"] = insurance["Group"].asfactor() >>> insurance["Age"] = insurance["Age"].asfactor() >>> insurance["District"] = insurance["District"].asfactor() >>> hyper_params = {'huber_alpha': [0.2,0.5], ... 'quantile_alpha': [0.2,0.6]} >>> gs = H2OGridSearch(H2ODeepLearningEstimator(epochs=5), ... hyper_params) >>> gs.train(x=list(range(3)),y="Claims", training_frame=insurance) >>> gs.normmul() 
 - 
normsub()[source]¶
- Normalization/Standardization offsets for numeric predictors. - Examples
 - >>> from h2o.estimators import H2ODeepLearningEstimator >>> from h2o.grid.grid_search import H2OGridSearch >>> insurance = h2o.import_file("http://s3.amazonaws.com/h2o-public-test-data/smalldata/glm_test/insurance.csv") >>> insurance["offset"] = insurance["Holders"].log() >>> insurance["Group"] = insurance["Group"].asfactor() >>> insurance["Age"] = insurance["Age"].asfactor() >>> insurance["District"] = insurance["District"].asfactor() >>> hyper_params = {'huber_alpha': [0.2,0.5], ... 'quantile_alpha': [0.2,0.6]} >>> gs = H2OGridSearch(H2ODeepLearningEstimator(epochs=5), ... hyper_params) >>> gs.train(x=list(range(3)),y="Claims", training_frame=insurance) >>> gs.normsub() 
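One plausible reading of the multipliers and offsets returned by `normmul()` and `normsub()`, stated here as an assumption rather than confirmed H2O behavior, is that a numeric predictor is standardized as `(x - offset) * multiplier`, with the offset equal to the column mean and the multiplier equal to the reciprocal of its standard deviation. The helpers below are hypothetical and use only the Python standard library.

```python
import statistics


def standardization_params(column):
    """Return (offset, multiplier) under the assumed convention."""
    offset = statistics.mean(column)
    multiplier = 1.0 / statistics.stdev(column)  # sample standard deviation
    return offset, multiplier


def standardize(column, offset, multiplier):
    """Apply (x - offset) * multiplier to every value."""
    return [(x - offset) * multiplier for x in column]


col = [2.0, 4.0, 6.0]  # toy column, not from insurance.csv
offset, mult = standardization_params(col)
z = standardize(col, offset, mult)
print(offset, mult, z)
```

Storing a multiplier rather than a divisor lets the transformation be applied with one subtraction and one multiplication per value.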
 - 
null_degrees_of_freedom(train=False, valid=False, xval=False)[source]¶
- Retrieve the null degrees of freedom if this model has the attribute, or None otherwise. - Parameters
- train (bool) – Get the null dof for the training set. If both train and valid are False, then train is selected by default. 
- valid (bool) – Get the null dof for the validation set. If both train and valid are True, then train is selected by default. 
- xval (bool) – Get the null dof for the cross-validated models. 
 
- Returns
- the null dof, or None if it is not present. 
- Examples
 - >>> from h2o.estimators import H2OGeneralizedLinearEstimator >>> from h2o.grid.grid_search import H2OGridSearch >>> benign = h2o.import_file("http://s3.amazonaws.com/h2o-public-test-data/smalldata/logreg/benign.csv") >>> y = 3 >>> x = [4,5,6,7,8,9,10,11] >>> hyper_params = {'alpha': [0.01,0.3,0.5], ... 'lambda': [1e-5, 1e-6, 1e-7]} >>> gs = H2OGridSearch(H2OGeneralizedLinearEstimator(family='binomial'), ... hyper_params) >>> gs.train(x=x,y=y, training_frame=benign) >>> gs.null_degrees_of_freedom() 
 - 
null_deviance(train=False, valid=False, xval=False)[source]¶
- Retrieve the null deviance if this model has the attribute, or None otherwise. - Parameters
- train (bool) – Get the null deviance for the training set. If both train and valid are False, then train is selected by default. 
- valid (bool) – Get the null deviance for the validation set. If both train and valid are True, then train is selected by default. 
- xval (bool) – Get the null deviance for the cross-validated models. 
 
- Returns
- the null deviance, or None if it is not present. 
- Examples
 - >>> from h2o.estimators import H2OGeneralizedLinearEstimator >>> from h2o.grid.grid_search import H2OGridSearch >>> benign = h2o.import_file("http://s3.amazonaws.com/h2o-public-test-data/smalldata/logreg/benign.csv") >>> y = 3 >>> x = [4,5,6,7,8,9,10,11] >>> hyper_params = {'alpha': [0.01,0.3,0.5], ... 'lambda': [1e-5, 1e-6, 1e-7]} >>> gs = H2OGridSearch(H2OGeneralizedLinearEstimator(family='binomial'), ... hyper_params) >>> gs.train(x=x,y=y, training_frame=benign) >>> gs.null_deviance() 
 - 
pareto_front(test_frame, x_metric=None, y_metric=None, **kwargs)[source]¶
- Create a Pareto front and plot it. The Pareto front contains models that are optimal in the sense that, for each model in the front, no other model is better in both criteria. For example, this can be useful in picking models that are fast to predict and at the same time have high accuracy. For generic data.frames/H2OFrames input, the task is assumed to be minimization for both metrics. - Parameters
- test_frame – a frame used to generate the metrics 
- x_metric – metric present in the leaderboard 
- y_metric – metric present in the leaderboard 
- kwargs – key, value mappings. Other keyword arguments are passed through to h2o.explanation.pareto_front().
 
- Returns
- object that contains the resulting figure (can be accessed using result.figure())
- Examples
 - >>> import h2o >>> from h2o.automl import H2OAutoML >>> from h2o.estimators import H2OGradientBoostingEstimator >>> from h2o.grid import H2OGridSearch >>> >>> h2o.connect() >>> >>> # Import the prostate dataset into H2O: >>> df = h2o.import_file("h2o://prostate.csv") >>> >>> # Set the response >>> response = "CAPSULE" >>> df[response] = df[response].asfactor() >>> >>> # Split the dataset into a train and test set: >>> train, test = df.split_frame([0.8]) >>> >>> gbm_params1 = {'learn_rate': [0.01, 0.1], ... 'max_depth': [3, 5, 9]} >>> grid = H2OGridSearch(model=H2OGradientBoostingEstimator, ... hyper_params=gbm_params1) >>> grid.train(y=response, training_frame=train) >>> >>> # Create the Pareto front >>> pf = grid.pareto_front(test) >>> pf.figure() # get the Pareto front plot >>> pf # H2OFrame containing the Pareto front subset of the leaderboard 
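The selection rule behind a Pareto front can be illustrated without H2O. The sketch below uses the standard dominance definition for two metrics that are both minimized: a point is dropped if some other point is no worse in both metrics and strictly better in at least one. The function name `pareto_front` mirrors the method only for readability; it is a standalone helper written for this illustration, and the data are made up.

```python
def pareto_front(points):
    """Return the non-dominated (x, y) points, both metrics minimized."""
    front = []
    for i, (xi, yi) in enumerate(points):
        dominated = any(
            (xj <= xi and yj <= yi) and (xj < xi or yj < yi)
            for j, (xj, yj) in enumerate(points)
            if j != i
        )
        if not dominated:
            front.append((xi, yi))
    return sorted(front)


# (prediction time, error) pairs for five hypothetical models.
models = [(1.0, 0.30), (2.0, 0.20), (3.0, 0.25), (4.0, 0.10), (1.5, 0.35)]
front = pareto_front(models)
print(front)
```

Here the models at (3.0, 0.25) and (1.5, 0.35) are dominated, so three models remain on the front. The pairwise check is quadratic in the number of models, which is usually fine for grid-sized leaderboards.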
 - 
pprint_coef()[source]¶
- Pretty print the coefficients table (includes normalized coefficients). - Examples
 - >>> from h2o.estimators import H2OGeneralizedLinearEstimator >>> from h2o.grid.grid_search import H2OGridSearch >>> benign = h2o.import_file("http://s3.amazonaws.com/h2o-public-test-data/smalldata/logreg/benign.csv") >>> y = 3 >>> x = [4,5,6,7,8,9,10,11] >>> hyper_params = {'alpha': [0.01,0.3,0.5], ... 'lambda': [1e-5, 1e-6, 1e-7]} >>> gs = H2OGridSearch(H2OGeneralizedLinearEstimator(family='binomial'), ... hyper_params) >>> gs.train(x=x,y=y, training_frame=benign) >>> gs.pprint_coef() 
 - 
predict(test_data)[source]¶
- Predict on a dataset. - Parameters
- test_data (H2OFrame) – Data to be predicted on. 
- Returns
- H2OFrame filled with predictions. 
- Examples
 - >>> from h2o.estimators import H2OGeneralizedLinearEstimator >>> from h2o.grid.grid_search import H2OGridSearch >>> benign = h2o.import_file("http://s3.amazonaws.com/h2o-public-test-data/smalldata/logreg/benign.csv") >>> y = 3 >>> x = [4,5,6,7,8,9,10,11] >>> hyper_params = {'alpha': [0.01,0.3,0.5], ... 'lambda': [1e-5, 1e-6, 1e-7]} >>> gs = H2OGridSearch(H2OGeneralizedLinearEstimator(family='binomial'), ... hyper_params) >>> gs.train(x=x,y=y, training_frame=benign) >>> gs.predict(benign) 
 - 
r2(train=False, valid=False, xval=False)[source]¶
- Return the R^2 for this regression model. - The R^2 value is defined to be 1 - MSE/var, where var is computed as sigma^2. - If all are False (default), then return the training metric value. If more than one option is set to True, then return a dictionary of metrics where the keys are “train”, “valid”, and “xval”. - Parameters
- train (bool) – If train is True, then return the R^2 value for the training data. 
- valid (bool) – If valid is True, then return the R^2 value for the validation data. 
- xval (bool) – If xval is True, then return the R^2 value for the cross validation data. 
 
- Returns
- The R^2 for this regression model. 
- Examples
 - >>> from h2o.estimators import H2ODeepLearningEstimator >>> from h2o.grid.grid_search import H2OGridSearch >>> insurance = h2o.import_file("http://s3.amazonaws.com/h2o-public-test-data/smalldata/glm_test/insurance.csv") >>> insurance["offset"] = insurance["Holders"].log() >>> insurance["Group"] = insurance["Group"].asfactor() >>> insurance["Age"] = insurance["Age"].asfactor() >>> insurance["District"] = insurance["District"].asfactor() >>> hyper_params = {'huber_alpha': [0.2,0.5], ... 'quantile_alpha': [0.2,0.6]} >>> gs = H2OGridSearch(H2ODeepLearningEstimator(epochs=5), ... hyper_params) >>> gs.train(x=list(range(3)),y="Claims", training_frame=insurance) >>> gs.r2() 
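The definition R^2 = 1 - MSE/var can be worked through by hand. The sketch below is plain Python with hypothetical helper names and toy data; it assumes `var` is the population variance of the observed response (dividing by n), which is an interpretation of "sigma^2" rather than something the text spells out.

```python
def mse(actual, predicted):
    """Mean squared error between observed and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def r2(actual, predicted):
    """R^2 = 1 - MSE / var, with var the population variance of actual."""
    mean = sum(actual) / len(actual)
    var = sum((a - mean) ** 2 for a in actual) / len(actual)
    return 1.0 - mse(actual, predicted) / var


actual = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.0, 3.0, 3.5]
print(mse(actual, predicted), r2(actual, predicted))
```

For these values the MSE is 0.125 and the variance is 1.25, so R^2 is 0.9. A model that always predicts the mean has an MSE equal to the variance and therefore an R^2 of 0.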
 - 
residual_degrees_of_freedom(train=False, valid=False, xval=False)[source]¶
- Retrieve the residual degrees of freedom if this model has the attribute, or None otherwise. - Parameters
- train (bool) – Get the residual dof for the training set. If both train and valid are False, then train is selected by default. 
- valid (bool) – Get the residual dof for the validation set. If both train and valid are True, then train is selected by default. 
- xval (bool) – Get the residual dof for the cross-validated models. 
 
- Returns
- the residual degrees of freedom, or None if they are not present. 
- Examples
 - >>> from h2o.estimators import H2OGeneralizedLinearEstimator >>> from h2o.grid.grid_search import H2OGridSearch >>> benign = h2o.import_file("http://s3.amazonaws.com/h2o-public-test-data/smalldata/logreg/benign.csv") >>> y = 3 >>> x = [4,5,6,7,8,9,10,11] >>> hyper_params = {'alpha': [0.01,0.3,0.5], ... 'lambda': [1e-5, 1e-6, 1e-7]} >>> gs = H2OGridSearch(H2OGeneralizedLinearEstimator(family='binomial'), ... hyper_params) >>> gs.train(x=x,y=y, training_frame=benign) >>> gs.residual_degrees_of_freedom() 
 - 
residual_deviance(train=False, valid=False, xval=False)[source]¶
- Retrieve the residual deviance if this model has the attribute, or None otherwise. - Parameters
- train (bool) – Get the residual deviance for the training set. If both train and valid are False, then train is selected by default. 
- valid (bool) – Get the residual deviance for the validation set. If both train and valid are True, then train is selected by default. 
- xval (bool) – Get the residual deviance for the cross-validated models. 
 
- Returns
- the residual deviance, or None if it is not present. 
- Examples
 - >>> from h2o.estimators import H2OGeneralizedLinearEstimator >>> from h2o.grid.grid_search import H2OGridSearch >>> benign = h2o.import_file("http://s3.amazonaws.com/h2o-public-test-data/smalldata/logreg/benign.csv") >>> y = 3 >>> x = [4,5,6,7,8,9,10,11] >>> hyper_params = {'alpha': [0.01,0.3,0.5], ... 'lambda': [1e-5, 1e-6, 1e-7]} >>> gs = H2OGridSearch(H2OGeneralizedLinearEstimator(family='binomial'), ... hyper_params) >>> gs.train(x=x,y=y, training_frame=benign) >>> gs.residual_deviance() 
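For orientation, the textbook quantities behind the null and residual deviance accessors can be computed by hand for a binomial model. The sketch below is plain Python with hypothetical helpers and toy data; it assumes the usual definitions (deviance as -2 times the log-likelihood, a null model that predicts the observed positive rate for every row, null degrees of freedom n - 1, residual degrees of freedom n minus the number of fitted coefficients including the intercept). Whether H2O counts coefficients in exactly this way is not confirmed here.

```python
import math


def binomial_deviance(labels, probs):
    """-2 * log-likelihood for 0/1 labels; probs must lie strictly in (0, 1)."""
    return -2.0 * sum(
        y * math.log(p) + (1 - y) * math.log(1.0 - p)
        for y, p in zip(labels, probs)
    )


def null_deviance(labels):
    """Deviance of an intercept-only model; needs both classes present."""
    p0 = sum(labels) / len(labels)
    return binomial_deviance(labels, [p0] * len(labels))


def degrees_of_freedom(n_rows, n_coefficients_with_intercept):
    """Return (null dof, residual dof) under the textbook convention."""
    return n_rows - 1, n_rows - n_coefficients_with_intercept


labels = [1, 0, 1, 0]
fitted = [0.8, 0.2, 0.7, 0.3]  # made-up fitted probabilities
null_dev = null_deviance(labels)
resid_dev = binomial_deviance(labels, fitted)
null_dof, resid_dof = degrees_of_freedom(len(labels), 2)
print(null_dev, resid_dev, null_dof, resid_dof)
```

A useful model has a residual deviance clearly below its null deviance; the gap is what the predictors add over the intercept alone.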
 - 
respmul()[source]¶
- Normalization/Standardization multipliers for numeric response. - Examples
 - >>> from h2o.estimators import H2ODeepLearningEstimator >>> from h2o.grid.grid_search import H2OGridSearch >>> insurance = h2o.import_file("http://s3.amazonaws.com/h2o-public-test-data/smalldata/glm_test/insurance.csv") >>> insurance["offset"] = insurance["Holders"].log() >>> insurance["Group"] = insurance["Group"].asfactor() >>> insurance["Age"] = insurance["Age"].asfactor() >>> insurance["District"] = insurance["District"].asfactor() >>> hyper_params = {'huber_alpha': [0.2,0.5], ... 'quantile_alpha': [0.2,0.6]} >>> gs = H2OGridSearch(H2ODeepLearningEstimator(epochs=5), ... hyper_params) >>> gs.train(x=list(range(3)),y="Claims", training_frame=insurance) >>> gs.respmul() 
 - 
respsub()[source]¶
- Normalization/Standardization offsets for numeric response. - Examples
 - >>> from h2o.estimators import H2ODeepLearningEstimator >>> from h2o.grid.grid_search import H2OGridSearch >>> insurance = h2o.import_file("http://s3.amazonaws.com/h2o-public-test-data/smalldata/glm_test/insurance.csv") >>> insurance["offset"] = insurance["Holders"].log() >>> insurance["Group"] = insurance["Group"].asfactor() >>> insurance["Age"] = insurance["Age"].asfactor() >>> insurance["District"] = insurance["District"].asfactor() >>> hyper_params = {'huber_alpha': [0.2,0.5], ... 'quantile_alpha': [0.2,0.6]} >>> gs = H2OGridSearch(H2ODeepLearningEstimator(epochs=5), ... hyper_params) >>> gs.train(x=list(range(3)),y="Claims", training_frame=insurance) >>> gs.respsub() 
 - 
resume(recovery_dir=None, **kwargs)[source]¶
- Resume previously stopped grid training. - Parameters
- recovery_dir – When specified, the grid and all necessary data (frames, models) will be saved to this directory (use HDFS or other distributed file-system). Should the cluster crash during training, the grid can be reloaded from this directory via h2o.load_grid, and training can be resumed.
 
 - 
scoring_history()[source]¶
- Retrieve model scoring history. - Returns
- Score history (H2OTwoDimTable) 
- Examples
 - >>> from h2o.estimators import H2ODeepLearningEstimator >>> from h2o.grid.grid_search import H2OGridSearch >>> insurance = h2o.import_file("http://s3.amazonaws.com/h2o-public-test-data/smalldata/glm_test/insurance.csv") >>> insurance["offset"] = insurance["Holders"].log() >>> insurance["Group"] = insurance["Group"].asfactor() >>> insurance["Age"] = insurance["Age"].asfactor() >>> insurance["District"] = insurance["District"].asfactor() >>> hyper_params = {'huber_alpha': [0.2,0.5], ... 'quantile_alpha': [0.2,0.6]} >>> gs = H2OGridSearch(H2ODeepLearningEstimator(epochs=5), ... hyper_params) >>> gs.train(x=list(range(3)),y="Claims", training_frame=insurance) >>> gs.scoring_history() 
 - 
show(verbosity=None, fmt=None)[source]¶
- Renders all models in the grid, sorted by performance metric. - Examples
 - >>> from h2o.estimators import H2ODeepLearningEstimator >>> from h2o.grid.grid_search import H2OGridSearch >>> insurance = h2o.import_file("http://s3.amazonaws.com/h2o-public-test-data/smalldata/glm_test/insurance.csv") >>> insurance["offset"] = insurance["Holders"].log() >>> insurance["Group"] = insurance["Group"].asfactor() >>> insurance["Age"] = insurance["Age"].asfactor() >>> insurance["District"] = insurance["District"].asfactor() >>> hyper_params = {'huber_alpha': [0.2,0.5], ... 'quantile_alpha': [0.2,0.6]} >>> gs = H2OGridSearch(H2ODeepLearningEstimator(epochs=5), ... hyper_params) >>> gs.train(x=list(range(3)),y="Claims", training_frame=insurance) >>> gs.show() 
 - 
show_summary()[source]¶
- Renders a detailed summary of the explored models. - Examples
 - >>> from h2o.estimators import H2ODeepLearningEstimator >>> from h2o.grid.grid_search import H2OGridSearch >>> insurance = h2o.import_file("http://s3.amazonaws.com/h2o-public-test-data/smalldata/glm_test/insurance.csv") >>> insurance["offset"] = insurance["Holders"].log() >>> insurance["Group"] = insurance["Group"].asfactor() >>> insurance["Age"] = insurance["Age"].asfactor() >>> insurance["District"] = insurance["District"].asfactor() >>> hyper_params = {'huber_alpha': [0.2,0.5], ... 'quantile_alpha': [0.2,0.6]} >>> gs = H2OGridSearch(H2ODeepLearningEstimator(epochs=5), ... hyper_params) >>> gs.train(x=list(range(3)),y="Claims", training_frame=insurance) >>> gs.show_summary() 
 - 
sort_by(metric, increasing=True)[source]¶
- Deprecated since 2016-12-12; use grid.get_grid() instead. 
 - 
sorted_metric_table(use_pandas=True)[source]¶
- Retrieve summary table of an H2O Grid Search. - Parameters
- use_pandas – if True and if pandas is available, return the table as a Pandas DataFrame 
- Returns
- The summary table as an H2OTwoDimTable (or a Pandas DataFrame if use_pandas is True). 
- Examples
 - >>> from h2o.estimators import H2ODeepLearningEstimator >>> from h2o.grid.grid_search import H2OGridSearch >>> insurance = h2o.import_file("http://s3.amazonaws.com/h2o-public-test-data/smalldata/glm_test/insurance.csv") >>> insurance["offset"] = insurance["Holders"].log() >>> insurance["Group"] = insurance["Group"].asfactor() >>> insurance["Age"] = insurance["Age"].asfactor() >>> insurance["District"] = insurance["District"].asfactor() >>> hyper_params = {'huber_alpha': [0.2,0.5], ... 'quantile_alpha': [0.2,0.6]} >>> gs = H2OGridSearch(H2ODeepLearningEstimator(epochs=5), ... hyper_params) >>> gs.train(x=list(range(3)),y="Claims", training_frame=insurance) >>> gs.sorted_metric_table() 
 - 
start(x, y=None, training_frame=None, offset_column=None, fold_column=None, weights_column=None, validation_frame=None, **params)[source]¶
- Asynchronous model build by specifying the predictor columns, response column, and any additional frame-specific values. - To block for results, call join(). - Parameters
- x – A list of column names or indices indicating the predictor columns. 
- y – An index or a column name indicating the response column. 
- training_frame – The H2OFrame having the columns indicated by x and y (as well as any additional columns specified by fold, offset, and weights). 
- offset_column – The name or index of the column in training_frame that holds the offsets. 
- fold_column – The name or index of the column in training_frame that holds the per-row fold assignments. 
- weights_column – The name or index of the column in training_frame that holds the per-row weights. 
- validation_frame – H2OFrame with validation data to be scored on while training. 
 
- Examples
 - >>> from h2o.estimators import H2ODeepLearningEstimator >>> from h2o.grid.grid_search import H2OGridSearch >>> insurance = h2o.import_file("http://s3.amazonaws.com/h2o-public-test-data/smalldata/glm_test/insurance.csv") >>> insurance["offset"] = insurance["Holders"].log() >>> insurance["Group"] = insurance["Group"].asfactor() >>> insurance["Age"] = insurance["Age"].asfactor() >>> insurance["District"] = insurance["District"].asfactor() >>> hyper_params = {'huber_alpha': [0.2,0.5], ... 'quantile_alpha': [0.2,0.6]} >>> gs = H2OGridSearch(H2ODeepLearningEstimator(epochs=5), hyper_params) >>> gs.start(x=list(range(3)),y="Claims", training_frame=insurance) >>> gs.join() 
 - 
train(x=None, y=None, training_frame=None, offset_column=None, fold_column=None, weights_column=None, validation_frame=None, **params)[source]¶
- Train the model synchronously (i.e. do not return until the model finishes training). - To train asynchronously, call start(). - Parameters
- x – A list of column names or indices indicating the predictor columns. 
- y – An index or a column name indicating the response column. 
- training_frame – The H2OFrame having the columns indicated by x and y (as well as any additional columns specified by fold, offset, and weights). 
- offset_column – The name or index of the column in training_frame that holds the offsets. 
- fold_column – The name or index of the column in training_frame that holds the per-row fold assignments. 
- weights_column – The name or index of the column in training_frame that holds the per-row weights. 
- validation_frame – H2OFrame with validation data to be scored on while training. 
 
- Examples
 - >>> from h2o.estimators import H2ODeepLearningEstimator >>> from h2o.grid.grid_search import H2OGridSearch >>> insurance = h2o.import_file("http://s3.amazonaws.com/h2o-public-test-data/smalldata/glm_test/insurance.csv") >>> insurance["offset"] = insurance["Holders"].log() >>> insurance["Group"] = insurance["Group"].asfactor() >>> insurance["Age"] = insurance["Age"].asfactor() >>> insurance["District"] = insurance["District"].asfactor() >>> hyper_params = {'huber_alpha': [0.2,0.5], ... 'quantile_alpha': [0.2,0.6]} >>> gs = H2OGridSearch(H2ODeepLearningEstimator(epochs=5), ... hyper_params) >>> gs.train(x=list(range(3)),y="Claims", training_frame=insurance) 
 - 
varimp(use_pandas=False)[source]¶
- Return the variable importances as a list/pandas DataFrame. - Parameters
- use_pandas (bool) – If True, then the variable importances will be returned as a pandas data frame. 
- Returns
- A dictionary of lists or Pandas DataFrame instances. 
- Examples
 - >>> from h2o.estimators import H2ODeepLearningEstimator >>> from h2o.grid.grid_search import H2OGridSearch >>> insurance = h2o.import_file("http://s3.amazonaws.com/h2o-public-test-data/smalldata/glm_test/insurance.csv") >>> insurance["offset"] = insurance["Holders"].log() >>> insurance["Group"] = insurance["Group"].asfactor() >>> insurance["Age"] = insurance["Age"].asfactor() >>> insurance["District"] = insurance["District"].asfactor() >>> hyper_params = {'huber_alpha': [0.2,0.5], ... 'quantile_alpha': [0.2,0.6]} >>> gs = H2OGridSearch(H2ODeepLearningEstimator(epochs=5), ... hyper_params) >>> gs.train(x=list(range(3)),y="Claims", training_frame=insurance) >>> gs.varimp(use_pandas=True) 
 - 
weights(matrix_id=0)[source]¶
- Return the frame for the respective weight matrix. - Parameters
- matrix_id (int) – An integer, ranging from 0 to the number of layers, that specifies the weight matrix to return. 
- Returns
- an H2OFrame which represents the weight matrix identified by matrix_id 
- Examples
 - >>> from h2o.estimators import H2ODeepLearningEstimator >>> iris = h2o.import_file("http://h2o-public-test-data.s3.amazonaws.com/smalldata/iris/iris.csv") >>> hh = H2ODeepLearningEstimator(hidden=[], ... loss="CrossEntropy", ... export_weights_and_biases=True) >>> hh.train(x=list(range(4)), y=4, training_frame=iris) >>> hh.weights(0) 
 - 
xval_keys()[source]¶
- Model keys for the cross-validated model. - Examples
 - >>> from h2o.estimators import H2OGeneralizedLinearEstimator >>> from h2o.grid.grid_search import H2OGridSearch >>> benign = h2o.import_file("http://s3.amazonaws.com/h2o-public-test-data/smalldata/logreg/benign.csv") >>> y = 3 >>> x = [4,5,6,7,8,9,10,11] >>> hyper_params = {'alpha': [0.01,0.3,0.5], ... 'lambda': [1e-5, 1e-6, 1e-7]} >>> gs = H2OGridSearch(H2OGeneralizedLinearEstimator(family='binomial'), ... hyper_params) >>> gs.train(x=x,y=y, training_frame=benign) >>> gs.xval_keys() 
 
-