solver

  • Available in: GLM
  • Hyperparameter: no

Description

The solver option allows you to specify the solver method to use in GLM. When specifying a solver, the optimal solver depends on the data properties and prior information regarding the variables (if available). In general, the data are considered sparse if the ratio of zeros to non-zeros in the input matrix is greater than 10. The solution is sparse when only a subset of the original set of variables is intended to be kept in the model. In a dense solution, all predictors have non-zero coefficients in the final model.

In GLM, you can specify one of the following solvers:

  • IRLSM: Iteratively Reweighted Least Squares Method
  • L_BFGS: Limited-memory Broyden-Fletcher-Goldfarb-Shanno algorithm
  • COORDINATE_DESCENT: Coordinate Decent
  • COORDINATE_DESCENT_NAIVE: Coordinate Decent Naive
  • AUTO: Sets the solver based on given data and parameters (default)
  • GRADIENT_DESCENT_LH: Gradient Descent Likelihood (available for Ordinal family only; default for Ordinal family)
  • GRADIENT_DESCENT_SQERR: Gradient Descent Squared Error (available for Ordinal family only)

Detailed information about each of these options is available in the Solvers section. The bullets below describe GLM chooses the solver when solver=AUTO:

  • If there are more than 5k active predictors, GLM uses L_BFGS.
  • If family=multinomial and alpha=0 (ridge or no penalty), GLM uses L_BFGS.
  • If lambda search is enabled, GLM uses COORDINATE_DESCENT.
  • If your data has upper/lower bounds and no proximal penlaty, GLM uses COORDINATE_DESCENT.
  • If none above is true, then GLM defaults to IRLSM. This is because COORDINATE_DESCENT works much better with lambda search.

Below are some general guidelines to follow when specifying a solver.

  • L_BFGS works much better for L2-only multininomial and if you have too many active predictors.
  • You must use IRLSM if you have p-values.
  • IRLSM and COORDINATE_DESCENT share the same path (i.e., they both compute the same gram matrix), they just solve it differently.
  • Use COORDINATE_DESCENT if you have less than 5000 predictors and L1 penalty.
  • COORDINATE_DESCENT performs better when lambda_search is enabled. Also with bounds, it tends to get a higher accuracy.
  • Use GRADIENT_DESCENT_LH or GRADIENT_DESCENT_SQERR when family=ordinal. With GRADIENT_DESCENT_LH, the model parameters are adjusted by minimizing the loss function; with GRADIENT_DESCENT_SQERR, the model parameters are adjusted using the loss function.

Example

library(h2o)
h2o.init()
# import the boston dataset:
# this dataset looks at features of the boston suburbs and predicts median housing prices
# the original dataset can be found at https://archive.ics.uci.edu/ml/datasets/Housing
boston <- h2o.importFile("https://s3.amazonaws.com/h2o-public-test-data/smalldata/gbm_test/BostonHousing.csv")

# set the predictor names and the response column name
predictors <- colnames(boston)[1:13]
# set the response column to "medv", the median value of owner-occupied homes in $1000's
response <- "medv"

# convert the chas column to a factor (chas = Charles River dummy variable (= 1 if tract bounds river; 0 otherwise))
boston["chas"] <- as.factor(boston["chas"])

# split into train and validation sets
boston.splits <- h2o.splitFrame(data =  boston, ratios = .8)
train <- boston.splits[[1]]
valid <- boston.splits[[2]]

# try using the `solver` parameter:
boston_glm <- h2o.glm(x = predictors, y = response, training_frame = train,
                      validation_frame = valid,
                      solver = 'IRLSM')

# print the mse for the validation data
print(h2o.mse(boston_glm, valid=TRUE))
import h2o
from h2o.estimators.glm import H2OGeneralizedLinearEstimator
h2o.init()

# import the boston dataset:
# this dataset looks at features of the boston suburbs and predicts median housing prices
# the original dataset can be found at https://archive.ics.uci.edu/ml/datasets/Housing
boston = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/gbm_test/BostonHousing.csv")

# set the predictor names and the response column name
predictors = boston.columns[:-1]
# set the response column to "medv", the median value of owner-occupied homes in $1000's
response = "medv"

# convert the chas column to a factor (chas = Charles River dummy variable (= 1 if tract bounds river; 0 otherwise))
boston['chas'] = boston['chas'].asfactor()

# split into train and validation sets
train, valid = boston.split_frame(ratios = [.8])

# try using the `solver` parameter:
# initialize the estimator then train the model
boston_glm = H2OGeneralizedLinearEstimator(solver = 'irlsm')
boston_glm.train(x = predictors, y = response, training_frame = train, validation_frame = valid)

# print the mse for the validation data
print(boston_glm.mse(valid=True))