laplace
- Available in: Naïve-Bayes
- Hyperparameter: yes
Description
This option specifies the Laplace smoothing factor used when estimating the conditional probability of each predictor given the response. If Laplace smoothing is disabled (laplace = 0), Naive Bayes predicts a probability of 0 for any row in the test set that contains a previously unseen categorical level. If Laplace smoothing is enabled (for example, laplace = 1), the model can make predictions for rows that include previously unseen categorical levels.
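The zero-probability problem comes from the fact that Naive Bayes multiplies the per-feature conditional probabilities, so a single zero factor wipes out the whole score. The sketch below illustrates this with hypothetical counts; it is not H2O code. It assumes a class with 10 training rows and three categorical features of 4 levels each, and it generalizes the smoothing to `(count + laplace) / (count_y + laplace * k)`. That scaling for values other than 1 is standard additive smoothing and is an assumption here, not a statement about H2O's internals.

```python
import math

def cond_prob(count_xy, count_y, k, laplace):
    # Estimate of P(x_j = level | y) from training counts.
    # laplace = 0 gives the plain maximum likelihood estimate count_xy / count_y.
    # Assumption: for general laplace values the denominator grows by laplace * k.
    return (count_xy + laplace) / (count_y + laplace * k)

def score(prior, factors):
    # Unnormalized Naive Bayes score: P(y) times the product of per-feature conditionals.
    return prior * math.prod(factors)

# Hypothetical counts: count_y = 10 rows of class y, each feature has k = 4 levels.
# The test row's level for the third feature never occurred with y in training.
count_y, k = 10, 4
counts_xy = [6, 3, 0]

for laplace in (0, 1):
    factors = [cond_prob(c, count_y, k, laplace) for c in counts_xy]
    print(laplace, score(0.5, factors))
```

With laplace = 0 the score is exactly 0 because the third factor is 0/10; with laplace = 1 that factor becomes 1/14 and the score stays positive.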
With laplace = 1, Laplace smoothing adjusts the maximum likelihood estimates by adding 1 to the numerator and k to the denominator, so that categorical levels absent from the training set still receive a nonzero probability:
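$$\phi_{j|y=1} = \frac{\sum_{i=1}^{m} 1\{x_j^{(i)} = 1 \cap y^{(i)} = 1\} + 1}{\sum_{i=1}^{m} 1\{y^{(i)} = 1\} + k}$$

$$\phi_{j|y=0} = \frac{\sum_{i=1}^{m} 1\{x_j^{(i)} = 1 \cap y^{(i)} = 0\} + 1}{\sum_{i=1}^{m} 1\{y^{(i)} = 0\} + k}$$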
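Here $x^{(i)}$ is the feature vector of the $i$-th training row, $y^{(i)}$ is its response value, $m$ is the number of training rows, and $k$ is the number of possible levels of feature $j$. Because each of the $k$ levels gains 1 in its numerator, adding $k$ to the denominator balances those additions and keeps the smoothed estimates summing to 1.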
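The formulas above can be sketched for a single categorical feature as follows. This is an illustrative implementation, not H2O's: the column values, the class labels, and the level list are hypothetical, and the indicator $x_j^{(i)} = 1$ is generalized to "row $i$ has a given level" so that one function covers every level of the feature.

```python
from collections import Counter

def smoothed_conditionals(x, y, target_class, levels):
    """Laplace-smoothed estimates of P(x = level | y = target_class):
    (count(x = level and y = target_class) + 1) / (count(y = target_class) + k)."""
    k = len(levels)
    n_y = sum(1 for yi in y if yi == target_class)
    # Counter returns 0 for levels that never co-occur with target_class.
    joint = Counter(xi for xi, yi in zip(x, y) if yi == target_class)
    return {level: (joint[level] + 1) / (n_y + k) for level in levels}

# Hypothetical training column and binary response.
x = ["a", "a", "b", "a", "b", "c"]
y = [1, 1, 1, 0, 0, 0]
levels = ["a", "b", "c", "d"]  # assumption: "d" is a valid level absent from training

phi1 = smoothed_conditionals(x, y, 1, levels)
print(phi1)
print(sum(phi1.values()))
```

For class 1 there are 3 rows and k = 4, so level "a" gets (2 + 1) / 7 and the unseen level "d" gets 1 / 7 instead of 0; the four estimates sum to 1 (up to floating-point rounding).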
Laplace smoothing should be used with care; it is generally intended to allow predictions for rare events. As prediction data becomes increasingly distinct from training data, new models should be trained when possible to account for a broader set of possible feature values.
This value must be >=0 and defaults to 0.
Example
library(h2o)
h2o.init()
# Import the prostate dataset:
prostate <- h2o.importFile("http://s3.amazonaws.com/h2o-public-test-data/smalldata/prostate/prostate.csv.zip")
# Convert CAPSULE, RACE, DCAPS, and DPROS to categorical
prostate$CAPSULE <- as.factor(prostate$CAPSULE)
prostate$RACE <- as.factor(prostate$RACE)
prostate$DCAPS <- as.factor(prostate$DCAPS)
prostate$DPROS <- as.factor(prostate$DPROS)
# Train Naive Bayes with predictors in columns 3:9 and response CAPSULE (column 2),
# using Laplace smoothing
prostate.nb <- h2o.naiveBayes(x = 3:9, y = 2, training_frame = prostate, laplace = 1)
print(prostate.nb)
# Predict on training data
prostate.pred <- predict(prostate.nb, prostate)
print(head(prostate.pred))