pca_method
- Available in: PCA
- Hyperparameter: no
Description
Use the pca_method parameter to specify the algorithm to use for computing the principal components. Available options include:
- GramSVD: Uses a distributed computation of the Gram matrix, followed by a local SVD using the JAMA package
- Power: Computes the SVD using the power iteration method (experimental)
- Randomized: Uses randomized subspace iteration method
- GLRM: Fits a generalized low-rank model with an L2 loss function and no regularization, then solves for the SVD using local matrix algebra (experimental)
Note: For pca_method = Randomized, the algorithm must deal with matrices of size m by k and n by k, where:
- m is the number of rows,
- n is the expanded column size, and
- k is the number of eigenvectors desired.
As a result, there is no advantage to be gained by trying to find the eigenvectors of the matrix transpose. In other words, when using PCA with wide datasets, users should not choose the Randomized method.
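The Example below covers the Power and GLRM options; for completeness, here is a minimal sketch of the other two methods on the same birds dataset. The parameter values simply mirror the example, and the object names (birds.gram, birds.rand) are illustrative only.
library(h2o)
h2o.init()
# Load the same birds dataset used in the example below
birds.hex <- h2o.importFile("https://s3.amazonaws.com/h2o-public-test-data/smalldata/pca_test/birds.csv")
# GramSVD: distributed Gram matrix computation followed by a local SVD
birds.gram <- h2o.prcomp(training_frame = birds.hex, transform = "STANDARDIZE",
                         k = 3, pca_method = "GramSVD",
                         use_all_factor_levels = TRUE, impute_missing = TRUE)
# Randomized subspace iteration (avoid for wide datasets, per the note above)
birds.rand <- h2o.prcomp(training_frame = birds.hex, transform = "STANDARDIZE",
                         k = 3, pca_method = "Randomized",
                         use_all_factor_levels = TRUE, impute_missing = TRUE)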
Example
library(h2o)
h2o.init()
# Load the Birds dataset
birds.hex <- h2o.importFile("https://s3.amazonaws.com/h2o-public-test-data/smalldata/pca_test/birds.csv")
# Train using the Power pca_method
birds.pca <- h2o.prcomp(training_frame = birds.hex, transform = "STANDARDIZE",
                        k = 3, pca_method = "Power", use_all_factor_levels = TRUE,
                        impute_missing = TRUE)
# View the importance of components
birds.pca@model$importance
Importance of components:
                            pc1      pc2      pc3
Standard deviation     1.496991 1.351000 1.014182
Proportion of Variance 0.289987 0.236184 0.133098
Cumulative Proportion  0.289987 0.526171 0.659269
# View the eigenvectors
birds.pca@model$eigenvectors
Rotation:
                  pc1       pc2       pc3
patch.Ref1a  0.007207  0.007449  0.001161
patch.Ref1b -0.003090  0.011257 -0.001066
patch.Ref1c  0.002962  0.008850 -0.000264
patch.Ref1d -0.001295  0.011003  0.000501
patch.Ref1e  0.006559  0.006904 -0.001206
---
                  pc1       pc2       pc3
S            0.463591 -0.053410  0.184799
year        -0.055934  0.009691 -0.968635
area         0.533375 -0.289381 -0.130338
log.area.    0.583966 -0.262287 -0.089582
ENN         -0.270615 -0.573900  0.038835
log.ENN.    -0.231368 -0.640231  0.026325
# Train again using the GLRM pca_method
birds2.pca <- h2o.prcomp(training_frame = birds.hex, transform = "STANDARDIZE",
                         k = 3, pca_method = "GLRM", use_all_factor_levels = TRUE,
                         impute_missing = TRUE)
# View the importance of components
birds2.pca@model$importance
Importance of components:
                            pc1      pc2      pc3
Standard deviation     2.659459 0.700971 0.404706
Proportion of Variance 0.915223 0.063583 0.021194
Cumulative Proportion  0.915223 0.978806 1.000000
# View the eigenvectors
birds2.pca@model$eigenvectors
Rotation:
                  pc1       pc2       pc3
patch.Ref1a -0.092008  0.030110 -0.018916
patch.Ref1b -0.107461  0.040519  0.076546
patch.Ref1c -0.103785  0.059700  0.016164
patch.Ref1d -0.105764  0.044823  0.062234
patch.Ref1e -0.102115  0.058994 -0.037536
---
                  pc1       pc2       pc3
S            0.003558  0.111264 -0.422437
year         0.000008 -0.004418  0.032813
area         0.004551  0.049496 -0.444745
log.area.    0.002756  0.066183 -0.453866
ENN          0.013259 -0.274711 -0.053960
log.ENN.     0.009517 -0.282830 -0.107461
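Once a PCA model is trained, its rows can be scored against the retained components. The following is a minimal sketch that assumes the birds.pca model from the example above; h2o.predict on a PCA model is expected to return one column per retained principal component.
# Project the original rows onto the three retained components
# (the resulting H2OFrame holds the per-row component scores)
birds.scores <- h2o.predict(birds.pca, newdata = birds.hex)
head(birds.scores)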