impute_missing
- Available in: PCA
- Hyperparameter: no
Description
When impute_missing is disabled, rows that contain NA/missing values are removed before training, so the model may be built from fewer rows than the dataset contains. If this is not the desired behavior, enable the impute_missing option to replace missing entries in each column with that column's mean value.
This option defaults to False.
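The difference between the two behaviors can be illustrated with a small, self-contained Python sketch that does not use H2O. `drop_incomplete_rows` mimics what happens when impute_missing is False (every row containing a missing value is discarded), and `impute_column_means` mimics impute_missing=True (each missing entry is replaced by its column's mean). Both function names are hypothetical, missing values are assumed to be represented as `None`, and the mean is assumed to be computed over each column's non-missing values. This illustrates the idea only; it is not H2O's implementation.

```python
from statistics import mean
from typing import List, Optional

Row = List[Optional[float]]


def drop_incomplete_rows(rows: List[Row]) -> List[Row]:
    """Hypothetical helper: keep only rows with no missing (None) entries."""
    return [row for row in rows if all(v is not None for v in row)]


def impute_column_means(rows: List[Row]) -> List[List[float]]:
    """Hypothetical helper: replace each None with the mean of its column's observed values."""
    n_cols = len(rows[0])
    col_means = []
    for j in range(n_cols):
        observed = [row[j] for row in rows if row[j] is not None]
        # statistics.mean raises StatisticsError if a column has no observed values.
        col_means.append(mean(observed))
    # Build new rows so the input table is left unchanged.
    return [[col_means[j] if v is None else v for j, v in enumerate(row)] for row in rows]


data = [
    [1.0, 10.0],
    [2.0, None],
    [None, 30.0],
    [6.0, 50.0],
]

print(len(drop_incomplete_rows(data)))  # 2
print(impute_column_means(data))  # [[1.0, 10.0], [2.0, 30.0], [3.0, 30.0], [6.0, 50.0]]
```

With four input rows, dropping incomplete rows keeps only two, while mean imputation keeps all four at the cost of inserting values that were never observed.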
Example
library(h2o)
h2o.init()

# Load the Birds dataset
birds.hex <- h2o.importFile("https://s3.amazonaws.com/h2o-public-test-data/smalldata/pca_test/birds.csv")

# Train with impute_missing enabled
birds.pca <- h2o.prcomp(training_frame = birds.hex, transform = "STANDARDIZE",
                        k = 3, pca_method = "Power", use_all_factor_levels = TRUE,
                        impute_missing = TRUE)

# View the importance of components
birds.pca@model$importance
# Importance of components:
#                             pc1      pc2      pc3
# Standard deviation     1.496991 1.351000 1.014182
# Proportion of Variance 0.289987 0.236184 0.133098
# Cumulative Proportion  0.289987 0.526171 0.659269

# View the eigenvectors
birds.pca@model$eigenvectors
# Rotation:
#                   pc1       pc2       pc3
# patch.Ref1a  0.007207  0.007449  0.001161
# patch.Ref1b -0.003090  0.011257 -0.001066
# patch.Ref1c  0.002962  0.008850 -0.000264
# patch.Ref1d -0.001295  0.011003  0.000501
# patch.Ref1e  0.006559  0.006904 -0.001206
# ---
#                 pc1       pc2       pc3
# S          0.463591 -0.053410  0.184799
# year      -0.055934  0.009691 -0.968635
# area       0.533375 -0.289381 -0.130338
# log.area.  0.583966 -0.262287 -0.089582
# ENN       -0.270615 -0.573900  0.038835
# log.ENN.  -0.231368 -0.640231  0.026325

# Train again without imputing missing values
birds2.pca <- h2o.prcomp(training_frame = birds.hex, transform = "STANDARDIZE",
                         k = 3, pca_method = "Power", use_all_factor_levels = TRUE,
                         impute_missing = FALSE)
# Warning message:
# In doTryCatch(return(expr), name, parentenv, handler) :
#   _train: Dataset used may contain fewer number of rows due to removal of rows
#   with NA/missing values. If this is not desirable, set impute_missing argument
#   in pca call to TRUE/True/true/... depending on the client language.

# View the importance of components
birds2.pca@model$importance
# Importance of components:
#                             pc1      pc2      pc3
# Standard deviation     1.546397 1.348276 1.055239
# Proportion of Variance 0.300269 0.228258 0.139820
# Cumulative Proportion  0.300269 0.528527 0.668347

# View the eigenvectors
birds2.pca@model$eigenvectors
# Rotation:
#                   pc1       pc2       pc3
# patch.Ref1a  0.009848 -0.005947 -0.001061
# patch.Ref1b -0.001628 -0.014739 -0.001007
# patch.Ref1c  0.004994 -0.009486 -0.000523
# patch.Ref1d  0.000117 -0.004400 -0.004917
# patch.Ref1e  0.003627 -0.001467 -0.004268
# ---
#                 pc1       pc2       pc3
# S          0.515048  0.226915 -0.123136
# year      -0.066269 -0.069526  0.971250
# area       0.414050  0.344332  0.149339
# log.area.  0.497313  0.363609  0.131261
# ENN       -0.390235  0.545631 -0.007944
# log.ENN.  -0.345665  0.562834 -0.002092
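The rows of each importance table are related in the standard PCA way: a component's Proportion of Variance is its variance (the squared Standard deviation) divided by the total variance, and Cumulative Proportion is the running sum of Proportion of Variance. The Python sketch below checks both relationships against the values printed for the two models above. It only redoes arithmetic on the copied 6-decimal numbers and does not call H2O; the helper `implied_total_variance` is hypothetical.

```python
import math
from itertools import accumulate

# Values copied from the two importance tables printed above (6 decimal places).
importance = {
    "impute_missing=TRUE": {
        "sd": [1.496991, 1.351000, 1.014182],
        "prop": [0.289987, 0.236184, 0.133098],
        "cum": [0.289987, 0.526171, 0.659269],
    },
    "impute_missing=FALSE": {
        "sd": [1.546397, 1.348276, 1.055239],
        "prop": [0.300269, 0.228258, 0.139820],
        "cum": [0.300269, 0.528527, 0.668347],
    },
}


def implied_total_variance(sd, prop):
    """Hypothetical helper: total variance implied by each component, sd_i**2 / prop_i."""
    return [s ** 2 / p for s, p in zip(sd, prop)]


for label, table in importance.items():
    totals = implied_total_variance(table["sd"], table["prop"])
    running = list(accumulate(table["prop"]))
    # The tolerances absorb the rounding of the printed output.
    same_total = all(math.isclose(t, totals[0], rel_tol=1e-4) for t in totals)
    cum_ok = all(math.isclose(r, c, abs_tol=1e-5) for r, c in zip(running, table["cum"]))
    print(f"{label}: implied total variance {totals[0]:.3f}, "
          f"consistent proportions: {same_total}, cumulative matches: {cum_ok}")
```

Within each model, every component should imply the same total variance up to rounding; the two models imply different totals because one was fit on mean-imputed data and the other only on the complete rows.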