`link`

Available in: GLM
Hyperparameter: no

Description

GLM problems consist of three main components:

A random component $f$ for the dependent variable $y$: The density function $f(y;\theta,\phi)$ belongs to the exponential family and is parametrized by $\theta$ and $\phi$. This removes the restriction on the distribution of the error and allows for non-homogeneity of the variance with respect to the mean vector.
A systematic component (linear model) $\eta$: $\eta = X\beta$, where $X$ is the matrix of all observation vectors $x_i$.
A link function $g$: $E(y) = \mu = g^{-1}(\eta)$ relates the expected value of the response $\mu$ to the linear component $\eta$. The link function can be any monotonic differentiable function. This relaxes the constraints on the additivity of the covariates, and it allows the response to belong to a restricted range of values depending on the chosen transformation $g$.
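
To make the relationship between $\eta$, $g$, and $\mu$ concrete, the following is a minimal Python sketch of a few common link functions and their inverses. It uses textbook definitions only and is not H2O's internal implementation. The names `LINKS`, `linear_predictor`, and `predict_mean` are hypothetical helpers introduced for illustration, and the Tweedie link is omitted.

```python
import math

# Textbook link functions g and their inverses g^{-1}, stored as (g, g_inv).
# Illustrative only: this is an assumption-laden sketch, not H2O's code.
LINKS = {
    "identity": (lambda mu: mu, lambda eta: eta),
    "log": (math.log, math.exp),
    "logit": (
        lambda mu: math.log(mu / (1.0 - mu)),
        lambda eta: 1.0 / (1.0 + math.exp(-eta)),
    ),
    "inverse": (lambda mu: 1.0 / mu, lambda eta: 1.0 / eta),
}


def linear_predictor(x, beta):
    """Systematic component for one observation: eta = x . beta."""
    return sum(xi * bi for xi, bi in zip(x, beta))


def predict_mean(x, beta, link):
    """Expected response: mu = g^{-1}(eta)."""
    _, g_inv = LINKS[link]
    return g_inv(linear_predictor(x, beta))


if __name__ == "__main__":
    x = [1.0, 2.0]          # one observation (first entry acts as an intercept term)
    beta = [0.5, -0.25]     # hypothetical coefficients
    for name in LINKS:
        print(name, predict_mean(x, beta, name) if name != "inverse" else "eta = 0, undefined")
```

With these hypothetical coefficients the linear predictor is $\eta = 0$, so the logit link maps it to a mean of 0.5 while the log link maps it to 1. The inverse link is undefined at $\eta = 0$, which illustrates how the choice of $g$ restricts the range of valid values.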

Accordingly, to specify a GLM problem, you must choose a family function $f$, a link function $g$, and any parameters needed to train the model.

H2O’s GLM supports the following link functions: Family_Default, Identity, Logit, Log, Inverse, and Tweedie.

The following table describes the allowed Family/Link combinations.

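| Family \ Link Function | Family_Default | Identity | Logit | Log | Inverse | Tweedie |
|---|---|---|---|---|---|---|
| Binomial | X | | X | | | |
| Quasibinomial | X | | X | | | |
| Multinomial | X | | | | | |
| Gaussian | X | X | | X | X | |
| Poisson | X | X | | X | | |
| Gamma | X | X | | X | X | |
| Tweedie | X | | | | | X |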
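
The combinations in the table above can be encoded as a simple lookup, which may be handy for validating a family/link pair before training. This is a minimal sketch: `ALLOWED_LINKS` and `is_allowed` are hypothetical names, not part of the H2O API, and the lowercase spellings are an assumption modeled on the `'family_default'` value used in the example on this page.

```python
# Allowed family/link combinations, transcribed from the table above.
# ALLOWED_LINKS and is_allowed are illustrative helpers, not H2O functions.
ALLOWED_LINKS = {
    "binomial": {"family_default", "logit"},
    "quasibinomial": {"family_default", "logit"},
    "multinomial": {"family_default"},
    "gaussian": {"family_default", "identity", "log", "inverse"},
    "poisson": {"family_default", "identity", "log"},
    "gamma": {"family_default", "identity", "log", "inverse"},
    "tweedie": {"family_default", "tweedie"},
}


def is_allowed(family, link):
    """Return True if the table lists `link` as valid for `family`."""
    return link.lower() in ALLOWED_LINKS.get(family.lower(), set())


if __name__ == "__main__":
    print(is_allowed("gaussian", "log"))    # listed in the table
    print(is_allowed("binomial", "log"))    # not listed in the table
```

Every family accepts `family_default`, so that value is a safe choice when you do not need a specific transformation.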

Refer to the Links section for more information.

Example


library(h2o)
h2o.init()

# import the iris dataset:
# this dataset is used to classify the type of iris plant
# the original dataset can be found at https://archive.ics.uci.edu/ml/datasets/Iris
iris <- h2o.importFile("http://h2o-public-test-data.s3.amazonaws.com/smalldata/iris/iris_wheader.csv")

# convert response column to a factor
iris['class'] <- as.factor(iris['class'])

# set the predictor names and the response column name
predictors <- colnames(iris)[-length(iris)]
response <- 'class'

# split into train and validation
iris.splits <- h2o.splitFrame(data = iris, ratios = .8)
train <- iris.splits[[1]]
valid <- iris.splits[[2]]

# try using the `link` parameter:
iris_glm <- h2o.glm(x = predictors, y = response, family = 'multinomial', link = 'family_default',
                   training_frame = train, validation_frame = valid)

# print the logloss for the validation data
print(h2o.logloss(iris_glm, valid = TRUE))
