3.18.0.9

link

  • Available in: GLM
  • Hyperparameter: no

Description

GLM problems consist of three main components:

  • A random component \(f\) for the dependent variable \(y\): The density function \(f(y;\theta,\phi)\) has a probability distribution from the exponential family parametrized by \(\theta\) and \(\phi\). This removes the restriction on the distribution of the error and allows for non-homogeneity of the variance with respect to the mean vector.
  • A systematic component (linear model) \(\eta\): \(\eta = X\beta\), where \(X\) is the matrix of all observation vectors \(x_i\).
  • A link function \(g\): \(E(y) = \mu = g^{-1}(\eta)\) relates the expected value of the response \(\mu\) to the linear component \(\eta\). The link function can be any monotonic differentiable function. This relaxes the constraints on the additivity of the covariates, and it allows the response to belong to a restricted range of values depending on the chosen transformation \(g\).
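
As a rough illustration of how the link function ties \(\eta\) to \(\mu\), the sketch below evaluates \(\mu = g^{-1}(\eta)\) for a few of the links listed on this page, using their standard closed forms. The names `LINKS`, `linear_predictor`, and `expected_response` are hypothetical and chosen for this example only; they are not part of the H2O API, and this is not how H2O computes predictions internally.

```python
import math

# Illustrative only: each entry is (g, g_inverse) for a standard link.
# g maps the mean mu to the linear predictor eta; g_inverse maps eta back to mu.
LINKS = {
    "identity": (lambda mu: mu, lambda eta: eta),
    "logit": (lambda mu: math.log(mu / (1.0 - mu)),
              lambda eta: 1.0 / (1.0 + math.exp(-eta))),
    "log": (math.log, math.exp),
    "inverse": (lambda mu: 1.0 / mu, lambda eta: 1.0 / eta),
}


def linear_predictor(x, beta):
    """Systematic component for one observation: eta = x . beta."""
    return sum(xi * bi for xi, bi in zip(x, beta))


def expected_response(x, beta, link="identity"):
    """Expected response mu = g^{-1}(eta) under the chosen link."""
    _, g_inverse = LINKS[link]
    return g_inverse(linear_predictor(x, beta))


# Hypothetical observation and coefficients, chosen so that eta = 0.
x = [1.0, 2.0]
beta = [0.5, -0.25]
print(expected_response(x, beta, "logit"))  # 0.5: the mean stays inside (0, 1)
print(expected_response(x, beta, "log"))    # 1.0: the mean stays positive
```

The same \(\eta\) yields a different \(\mu\) under each link, which is how the choice of \(g\) restricts the response to a range such as \((0, 1)\) or the positive reals.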

Accordingly, in order to specify a GLM problem, you must choose a family function \(f\), link function \(g\), and any parameters needed to train the model.

H2O’s GLM supports the following link functions: Family_Default, Identity, Logit, Log, Inverse, Tweedie, Ologit, Oprobit, and Ologlog.

The following table describes the allowed Family/Link combinations.

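Rows are families and columns are link functions. X marks an allowed combination; - marks a combination that is not allowed.

Family         Family_Default  Identity  Logit  Log  Inverse  Tweedie  Ologit  Oprobit  Ologlog
Binomial       X               -         X      -    -        -        -       -        -
Quasibinomial  X               -         X      -    -        -        -       -        -
Multinomial    X               -         -      -    -        -        -       -        -
Ordinal        X               -         -      -    -        -        X       X        X
Gaussian       X               X         -      X    X        -        -       -        -
Poisson        X               X         -      X    -        -        -       -        -
Gamma          X               X         -      X    X        -        -       -        -
Tweedie        X               -         -      -    -        X        -       -        -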
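
The allowed combinations in the table above can also be written down as a lookup, which may be handy for checking a family/link pair before a model is built. This is a minimal sketch: `ALLOWED_LINKS` and `is_allowed_link` are hypothetical names introduced here, not part of the H2O API, and H2O performs its own validation when the model is trained.

```python
# Illustrative lookup transcribed from the Family/Link table on this page.
# Keys and values are lowercase to match the spelling used in the examples below.
ALLOWED_LINKS = {
    "binomial": {"family_default", "logit"},
    "quasibinomial": {"family_default", "logit"},
    "multinomial": {"family_default"},
    "ordinal": {"family_default", "ologit", "oprobit", "ologlog"},
    "gaussian": {"family_default", "identity", "log", "inverse"},
    "poisson": {"family_default", "identity", "log"},
    "gamma": {"family_default", "identity", "log", "inverse"},
    "tweedie": {"family_default", "tweedie"},
}


def is_allowed_link(family, link="family_default"):
    """Return True if the table lists `link` as valid for `family`."""
    return link.lower() in ALLOWED_LINKS.get(family.lower(), set())


print(is_allowed_link("gaussian", "log"))       # True
print(is_allowed_link("binomial", "identity"))  # False
```

An unknown family name simply returns False here; a stricter helper could raise an error instead.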

Refer to the Links section for more information.

Related Parameters

  • family

Example

# R version of the example
library(h2o)
h2o.init()

# import the iris dataset:
# this dataset is used to classify the type of iris plant
# the original dataset can be found at https://archive.ics.uci.edu/ml/datasets/Iris
iris <- h2o.importFile("http://h2o-public-test-data.s3.amazonaws.com/smalldata/iris/iris_wheader.csv")

# convert response column to a factor
iris['class'] <- as.factor(iris['class'])

# set the predictor names and the response column name
predictors <- colnames(iris)[-length(iris)]
response <- 'class'

# split into train and validation
iris.splits <- h2o.splitFrame(data = iris, ratios = .8)
train <- iris.splits[[1]]
valid <- iris.splits[[2]]

# try using the `link` parameter:
iris_glm <- h2o.glm(x = predictors, y = response, family = 'multinomial', link = 'family_default',
                   training_frame = train, validation_frame = valid)

# print the logloss for the validation data
print(h2o.logloss(iris_glm, valid = TRUE))

# Python version of the same example
import h2o
from h2o.estimators.glm import H2OGeneralizedLinearEstimator
h2o.init()

# import the iris dataset:
# this dataset is used to classify the type of iris plant
# the original dataset can be found at https://archive.ics.uci.edu/ml/datasets/Iris
iris = h2o.import_file("http://h2o-public-test-data.s3.amazonaws.com/smalldata/iris/iris_wheader.csv")

# convert response column to a factor
iris['class'] = iris['class'].asfactor()

# set the predictor names and the response column name
predictors = iris.columns[:-1]
response = 'class'

# split into train and validation sets
train, valid = iris.split_frame(ratios = [.8])

# try using the `link` parameter:
# Initialize and train a GLM
iris_glm = H2OGeneralizedLinearEstimator(family = 'multinomial', link = 'family_default')
iris_glm.train(x = predictors, y = response, training_frame = train, validation_frame = valid)

# print the logloss for the validation data
print(iris_glm.logloss(valid = True))

© Copyright 2016-2017 H2O.ai. Last updated on May 11, 2018.