transform
- Available in: GLRM, PCA, Aggregator
- Hyperparameter: yes
Description
Use the transform parameter to specify the transformation method applied to the training data. Available options include:
- None: Do not perform any transformations on the data.
- Standardize: Subtracts the mean of each variable and then divides by its standard deviation.
- Normalize: Scales all numeric variables to the range [0,1].
- Demean: The mean of each variable is subtracted from each observation, resulting in a mean of zero. Note that it is not always advisable to demean the data if estimating the Moving Average parameter is of primary interest.
- Descale: Divides by the standard deviation of each column.
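The options above can be sketched for a single numeric column in plain Python. This is only an illustration of the arithmetic described in the list, not H2O's implementation: the helper name apply_transform is hypothetical, the sample standard deviation is an assumption, and min-max scaling for Normalize is inferred from the [0,1] description, so H2O's internal computation may differ.

```python
from statistics import mean, stdev


def apply_transform(values, method="NONE"):
    """Apply one transform option to a single numeric column.

    Hypothetical helper for illustration only; it is not part of H2O.
    """
    method = method.upper()
    if method == "NONE":
        # Leave the data unchanged.
        return list(values)

    mu = mean(values)
    sd = stdev(values)  # assumption: sample standard deviation
    lo, hi = min(values), max(values)

    if method == "STANDARDIZE":
        # Subtract the mean, then divide by the standard deviation.
        return [(v - mu) / sd for v in values]
    if method == "NORMALIZE":
        # Assumption: min-max scaling into [0, 1].
        return [(v - lo) / (hi - lo) for v in values]
    if method == "DEMEAN":
        # Subtract the mean only.
        return [v - mu for v in values]
    if method == "DESCALE":
        # Divide by the standard deviation only.
        return [v / sd for v in values]
    raise ValueError(f"Unknown transform: {method}")


if __name__ == "__main__":
    column = [2.0, 4.0, 6.0]
    for option in ("NONE", "STANDARDIZE", "NORMALIZE", "DEMEAN", "DESCALE"):
        print(option, apply_transform(column, option))
```

For the column [2.0, 4.0, 6.0], whose mean is 4 and sample standard deviation is 2, Standardize gives [-1.0, 0.0, 1.0], Normalize gives [0.0, 0.5, 1.0], Demean gives [-2.0, 0.0, 2.0], and Descale gives [1.0, 2.0, 3.0]. The sketch does not guard against constant columns, where the standard deviation or the range is zero.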
In PCA and GLRM, this value defaults to None.
In Aggregator, this value defaults to Normalize.
Example
library(h2o)
h2o.init()
# Load the Birds dataset
birds.hex <- h2o.importFile("https://s3.amazonaws.com/h2o-public-test-data/smalldata/pca_test/birds.csv")
# Train using the Standardize transform
birds.pca <- h2o.prcomp(training_frame = birds.hex, transform = "STANDARDIZE",
                        k = 3, pca_method = "Power", use_all_factor_levels = TRUE,
                        impute_missing = TRUE)
# View the importance of components
birds.pca@model$importance
Importance of components:
                            pc1      pc2      pc3
Standard deviation     1.496991 1.351000 1.014182
Proportion of Variance 0.289987 0.236184 0.133098
Cumulative Proportion  0.289987 0.526171 0.659269
# View the eigenvectors
birds.pca@model$eigenvectors
Rotation:
                  pc1      pc2       pc3
patch.Ref1a  0.007207 0.007449  0.001161
patch.Ref1b -0.003090 0.011257 -0.001066
patch.Ref1c  0.002962 0.008850 -0.000264
patch.Ref1d -0.001295 0.011003  0.000501
patch.Ref1e  0.006559 0.006904 -0.001206
---
                pc1       pc2       pc3
S          0.463591 -0.053410  0.184799
year      -0.055934  0.009691 -0.968635
area       0.533375 -0.289381 -0.130338
log.area.  0.583966 -0.262287 -0.089582
ENN       -0.270615 -0.573900  0.038835
log.ENN.  -0.231368 -0.640231  0.026325
# Train again using the Normalize transform
birds2.pca <- h2o.prcomp(training_frame = birds.hex, transform = "NORMALIZE",
                         k = 3, pca_method = "Power", use_all_factor_levels = TRUE,
                         impute_missing = TRUE)
# View the importance of components
birds2.pca@model$importance
Importance of components:
                            pc1      pc2      pc3
Standard deviation     0.632015 0.531616 0.517096
Proportion of Variance 0.166444 0.117764 0.111418
Cumulative Proportion  0.166444 0.284208 0.395626
# View the eigenvectors
birds2.pca@model$eigenvectors
Rotation:
                 pc1       pc2      pc3
patch.Ref1a 0.026631 -0.006839 0.008674
patch.Ref1b 0.025825 -0.010199 0.004386
patch.Ref1c 0.026240 -0.008322 0.006759
patch.Ref1d 0.026106 -0.009375 0.005472
patch.Ref1e 0.026313 -0.007510 0.007769
---
                pc1       pc2       pc3
S          0.055295  0.113531  0.141168
year      -0.003343 -0.013812 -0.019785
area      -0.011008  0.064146  0.087213
log.area.  0.007378  0.080143  0.086986
ENN       -0.151652 -0.026572 -0.013064
log.ENN.  -0.463210 -0.046953  0.086169