``link``
--------

- Available in: GLM
- Hyperparameter: no

Description
~~~~~~~~~~~

GLM problems consist of three main components:

- A random component :math:`f` for the dependent variable :math:`y`: The density function :math:`f(y;\theta,\phi)` follows a probability distribution from the exponential family, parametrized by :math:`\theta` and :math:`\phi`. This removes the restriction on the distribution of the error and allows the variance to be non-homogeneous with respect to the mean vector.
- A systematic component (linear model) :math:`\eta`: :math:`\eta = X\beta`, where :math:`X` is the matrix of all observation vectors :math:`x_i`.
- A link function :math:`g`: :math:`E(y) = \mu = g^{-1}(\eta)` relates the expected value of the response :math:`\mu` to the linear component :math:`\eta`. The link function can be any monotonic differentiable function. This relaxes the constraint that the covariates act additively on the response, and it allows the response to be restricted to a range of values that depends on the chosen transformation :math:`g`.

Accordingly, to specify a GLM problem, you must choose a family function :math:`f`, a link function :math:`g`, and any parameters needed to train the model.

H2O's GLM supports the following link functions: Family_Default, Identity, Logit, Log, Inverse, and Tweedie. The following table describes the allowed Family/Link combinations.

+----------------+-------------------------------------------------------------+
| **Family**     | **Link Function**                                           |
+----------------+----------------+----------+-------+-----+---------+---------+
|                | Family_Default | Identity | Logit | Log | Inverse | Tweedie |
+----------------+----------------+----------+-------+-----+---------+---------+
| Binomial       | X              |          | X     |     |         |         |
+----------------+----------------+----------+-------+-----+---------+---------+
| Quasibinomial  | X              |          | X     |     |         |         |
+----------------+----------------+----------+-------+-----+---------+---------+
| Multinomial    | X              |          |       |     |         |         |
+----------------+----------------+----------+-------+-----+---------+---------+
| Gaussian       | X              | X        |       | X   | X       |         |
+----------------+----------------+----------+-------+-----+---------+---------+
| Poisson        | X              | X        |       | X   |         |         |
+----------------+----------------+----------+-------+-----+---------+---------+
| Gamma          | X              | X        |       | X   | X       |         |
+----------------+----------------+----------+-------+-----+---------+---------+
| Tweedie        | X              |          |       |     |         | X       |
+----------------+----------------+----------+-------+-----+---------+---------+

Refer to the `Links <../glm.html#links>`__ section for more information.

Related Parameters
~~~~~~~~~~~~~~~~~~

- `family `__

Example
~~~~~~~

.. example-code::
   .. code-block:: r

      library(h2o)
      h2o.init()

      # import the iris dataset:
      # this dataset is used to classify the type of iris plant
      # the original dataset can be found at https://archive.ics.uci.edu/ml/datasets/Iris
      iris <- h2o.importFile("http://h2o-public-test-data.s3.amazonaws.com/smalldata/iris/iris_wheader.csv")

      # convert response column to a factor
      iris['class'] <- as.factor(iris['class'])

      # set the predictor names and the response column name
      predictors <- colnames(iris)[-length(iris)]
      response <- 'class'

      # split into train and validation
      iris.splits <- h2o.splitFrame(data = iris, ratios = .8)
      train <- iris.splits[[1]]
      valid <- iris.splits[[2]]

      # try using the `link` parameter:
      iris_glm <- h2o.glm(x = predictors, y = response, family = 'multinomial',
                          link = 'family_default', training_frame = train,
                          validation_frame = valid)

      # print the logloss for the validation data
      print(h2o.logloss(iris_glm, valid = TRUE))

   .. code-block:: python

      import h2o
      from h2o.estimators.glm import H2OGeneralizedLinearEstimator
      h2o.init()

      # import the iris dataset:
      # this dataset is used to classify the type of iris plant
      # the original dataset can be found at https://archive.ics.uci.edu/ml/datasets/Iris
      iris = h2o.import_file("http://h2o-public-test-data.s3.amazonaws.com/smalldata/iris/iris_wheader.csv")

      # convert response column to a factor
      iris['class'] = iris['class'].asfactor()

      # set the predictor names and the response column name
      predictors = iris.columns[:-1]
      response = 'class'

      # split into train and validation sets
      train, valid = iris.split_frame(ratios = [.8])

      # try using the `link` parameter:
      # Initialize and train a GLM
      iris_glm = H2OGeneralizedLinearEstimator(family = 'multinomial', link = 'family_default')
      iris_glm.train(x = predictors, y = response, training_frame = train, validation_frame = valid)

      # print the logloss for the validation data
      iris_glm.logloss(valid = True)
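
Outside of H2O, the relationship :math:`\mu = g^{-1}(\eta)` and the Family/Link combinations from the table above can be sketched in plain Python. This is a minimal illustration assuming the standard textbook definitions of the identity, logit, log, and inverse links; ``LINKS``, ``ALLOWED``, ``is_allowed``, and ``predict_mean`` are hypothetical helpers, not part of the H2O API, and H2O's internal implementation may differ. The Tweedie link is left out because it depends on an additional power parameter.

.. code-block:: python

   import math

   # Hypothetical helper: textbook link functions as (g, g_inverse) pairs.
   # g maps the mean mu to the linear predictor eta; g_inverse maps eta back to mu.
   # Illustrative only; this is not H2O's internal implementation.
   LINKS = {
       "identity": (lambda mu: mu, lambda eta: eta),
       "logit": (lambda mu: math.log(mu / (1.0 - mu)),
                 lambda eta: 1.0 / (1.0 + math.exp(-eta))),
       "log": (math.log, math.exp),
       "inverse": (lambda mu: 1.0 / mu, lambda eta: 1.0 / eta),
   }

   # Hypothetical helper: allowed family/link combinations,
   # transcribed from the Family/Link table.
   ALLOWED = {
       "binomial": {"family_default", "logit"},
       "quasibinomial": {"family_default", "logit"},
       "multinomial": {"family_default"},
       "gaussian": {"family_default", "identity", "log", "inverse"},
       "poisson": {"family_default", "identity", "log"},
       "gamma": {"family_default", "identity", "log", "inverse"},
       "tweedie": {"family_default", "tweedie"},
   }


   def is_allowed(family, link):
       """Return True if the table lists this family/link combination."""
       return link.lower() in ALLOWED.get(family.lower(), set())


   def predict_mean(link, x, beta):
       """Compute mu = g^-1(eta) with eta = x . beta for a single observation."""
       eta = sum(xi * bi for xi, bi in zip(x, beta))
       _, g_inverse = LINKS[link]
       return g_inverse(eta)


   if __name__ == "__main__":
       print(is_allowed("Poisson", "log"))
       print(is_allowed("Poisson", "inverse"))
       print(predict_mean("log", [1.0, 2.0], [0.5, 0.25]))

Running the script prints ``True``, ``False``, and approximately 2.718 (that is, :math:`e^{1}`, since :math:`\eta = 1.0 \cdot 0.5 + 2.0 \cdot 0.25 = 1`), which illustrates how a log link keeps the predicted mean positive.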