``lambda_min_ratio`` -------------------- - Available in: GLM - Hyperparameter: no Description ~~~~~~~~~~~ This option is used to specify the minimum value of lambda to use for lambda search (specified as a ratio of lambda_max) ``lambda_min_ratio`` works in conjunction with ``nlambdas``. The sequence of the :math:`\lambda` values is automatically generated as an exponentially decreasing sequence. It ranges from :math:`\lambda_{max}` (the smallest :math:`\lambda` so that the solution is a model with all 0s) to :math:`\lambda_{min} =` ``lambda_min_ratio`` :math:`\times` :math:`\lambda_{max}`. H2O computes :math:`\lambda` models sequentially and in decreasing order, warm-starting the model (using the previous solutin as the initial prediction) for :math:`\lambda_k` with the solution for :math:`\lambda_{k-1}`. By warm-starting the models, we get better performance. Typically models for subsequent :math:`\lambda` values are close to each other, so only a few iterations per :math:`\lambda` are needed (two or three). This also achieves greater numerical stability because models with a higher penalty are easier to compute. This method starts with an easy problem and then continues to make small adjustments. **Note**: ``lambda_min_ratio`` and ``nlambdas`` also specify the relative distance of any two lambdas in the sequence. This is important when applying recursive strong rules, which are only effective if the neighboring lambdas are "close" to each other. The default for ``lambda_min_ratio`` depends on the dataset (the number of rows/number of columns ratio). The default is 1e-4 if the number of rows > than the number of columns; otherwise, the default is 1e-2 if number of rows is <= the number of columns. Related Parameters ~~~~~~~~~~~~~~~~~~ - `lambda `__ - `lambda_search `__ - `nlambdas `__ Example ~~~~~~~ .. example-code:: .. code-block:: r library(h2o) h2o.init() # import the boston dataset: # this dataset looks at features of the boston suburbs and predicts median housing prices # the original dataset can be found at https://archive.ics.uci.edu/ml/datasets/Housing boston <- h2o.importFile("https://s3.amazonaws.com/h2o-public-test-data/smalldata/gbm_test/BostonHousing.csv") # set the predictor names and the response column name predictors <- colnames(boston)[1:13] # set the response column to "medv", the median value of owner-occupied homes in $1000's response <- "medv" # convert the chas column to a factor (chas = Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)) boston["chas"] <- as.factor(boston["chas"]) # split into train and validation sets boston.splits <- h2o.splitFrame(data = boston, ratios = .8) train <- boston.splits[[1]] valid <- boston.splits[[2]] # try using the `lambda_min_ratio` parameter: # train your model, where you specify the lambda_min_ratio boston_glm <- h2o.glm(x = predictors, y = response, training_frame = train, validation_frame = valid, lambda_min_ratio = .0001) # print the mse for the validation data print(h2o.mse(boston_glm, valid=TRUE)) .. code-block:: python import h2o from h2o.estimators.glm import H2OGeneralizedLinearEstimator h2o.init() # import the boston dataset: # this dataset looks at features of the boston suburbs and predicts median housing prices # the original dataset can be found at https://archive.ics.uci.edu/ml/datasets/Housing boston = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/gbm_test/BostonHousing.csv") # set the predictor names and the response column name predictors = boston.columns[:-1] # set the response column to "medv", the median value of owner-occupied homes in $1000's response = "medv" # convert the chas column to a factor (chas = Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)) boston['chas'] = boston['chas'].asfactor() # split into train and validation sets train, valid = boston.split_frame(ratios = [.8]) # try using the `lambda_min_ratio` parameter: # initialize the estimator then train the model boston_glm = H2OGeneralizedLinearEstimator(lambda_min_ratio = .0001) boston_glm.train(x = predictors, y = response, training_frame = train, validation_frame = valid) # print the mse for the validation data print(boston_glm.mse(valid=True))