``impute_missing``
------------------

- Available in: PCA
- Hyperparameter: no

Description
~~~~~~~~~~~

When the training frame contains NA/missing values, the rows that contain them are removed, so the dataset used to build the model can have fewer rows than the original frame. If this is not the desired behavior, enable the ``impute_missing`` option to replace the missing entries in each column with that column's mean value.

This value defaults to False.
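
The difference between the two behaviors can be sketched in plain Python. This is only an illustration of row removal versus column-mean imputation on a toy table, not H2O's implementation. The helper names ``drop_incomplete_rows`` and ``impute_column_means`` are hypothetical, and the sketch assumes every column has at least one observed value.

```python
import math

NA = math.nan  # stand-in for an NA/missing entry


def drop_incomplete_rows(rows):
    """Keep only the rows that have no missing (NaN) entries."""
    return [row for row in rows if not any(math.isnan(v) for v in row)]


def impute_column_means(rows):
    """Replace each NaN with the mean of the observed values in its column.

    Assumes every column has at least one observed (non-NaN) value.
    """
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [row[j] for row in rows if not math.isnan(row[j])]
        means.append(sum(observed) / len(observed))
    return [
        [means[j] if math.isnan(v) else v for j, v in enumerate(row)]
        for row in rows
    ]


# Toy data: two of the four rows have one missing entry each.
data = [
    [1.0, 10.0],
    [2.0, NA],
    [NA, 30.0],
    [3.0, 20.0],
]

dropped = drop_incomplete_rows(data)   # analogous to impute_missing=False
imputed = impute_column_means(data)    # analogous to impute_missing=True

print(len(dropped))  # 2 rows survive
print(len(imputed))  # all 4 rows are kept
print(imputed[1])    # [2.0, 20.0]
print(imputed[2])    # [2.0, 30.0]
```

In the toy table, removing incomplete rows halves the data, while imputation keeps every row and fills each gap with the mean of the observed values in that column.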

Related Parameters
~~~~~~~~~~~~~~~~~~

- None

Example
~~~~~~~

.. example-code::
   .. code-block:: r

    library(h2o)
    h2o.init()

    # Load the Birds dataset
    birds.hex <- h2o.importFile("https://s3.amazonaws.com/h2o-public-test-data/smalldata/pca_test/birds.csv")

    # Train with impute_missing enabled
    birds.pca <- h2o.prcomp(training_frame = birds.hex, transform = "STANDARDIZE",
                            k = 3, pca_method="Power", use_all_factor_levels=TRUE, 
                            impute_missing=TRUE)

    # View the importance of components
    birds.pca@model$importance
    Importance of components: 
                                pc1      pc2      pc3
    Standard deviation     1.496991 1.351000 1.014182
    Proportion of Variance 0.289987 0.236184 0.133098
    Cumulative Proportion  0.289987 0.526171 0.659269

    # View the eigenvectors
    birds.pca@model$eigenvectors
    Rotation: 
                      pc1      pc2       pc3
    patch.Ref1a  0.007207 0.007449  0.001161
    patch.Ref1b -0.003090 0.011257 -0.001066
    patch.Ref1c  0.002962 0.008850 -0.000264
    patch.Ref1d -0.001295 0.011003  0.000501
    patch.Ref1e  0.006559 0.006904 -0.001206

    ---
                    pc1       pc2       pc3
    S          0.463591 -0.053410  0.184799
    year      -0.055934  0.009691 -0.968635
    area       0.533375 -0.289381 -0.130338
    log.area.  0.583966 -0.262287 -0.089582
    ENN       -0.270615 -0.573900  0.038835
    log.ENN.  -0.231368 -0.640231  0.026325

    # Train again without imputing missing values
    birds2.pca <- h2o.prcomp(training_frame = birds.hex, transform = "STANDARDIZE",
                             k = 3, pca_method="Power", use_all_factor_levels=TRUE, 
                             impute_missing=FALSE)

    Warning message:
    In doTryCatch(return(expr), name, parentenv, handler) :
      _train: Dataset used may contain fewer number of rows due to removal of rows 
      with NA/missing values. If this is not desirable, set impute_missing argument 
      in pca call to TRUE/True/true/... depending on the client language.

    # View the importance of components
    birds2.pca@model$importance
    Importance of components: 
                                pc1      pc2      pc3
    Standard deviation     1.546397 1.348276 1.055239
    Proportion of Variance 0.300269 0.228258 0.139820
    Cumulative Proportion  0.300269 0.528527 0.668347

    # View the eigenvectors
    birds2.pca@model$eigenvectors
    Rotation: 
                      pc1       pc2       pc3
    patch.Ref1a  0.009848 -0.005947 -0.001061
    patch.Ref1b -0.001628 -0.014739 -0.001007
    patch.Ref1c  0.004994 -0.009486 -0.000523
    patch.Ref1d  0.000117 -0.004400 -0.004917
    patch.Ref1e  0.003627 -0.001467 -0.004268

    ---
                    pc1       pc2       pc3
    S          0.515048  0.226915 -0.123136
    year      -0.066269 -0.069526  0.971250
    area       0.414050  0.344332  0.149339
    log.area.  0.497313  0.363609  0.131261
    ENN       -0.390235  0.545631 -0.007944
    log.ENN.  -0.345665  0.562834 -0.002092

   .. code-block:: python

    import h2o
    h2o.init()
    from h2o.estimators.pca import H2OPrincipalComponentAnalysisEstimator

    # Load the Birds dataset
    birds = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/pca_test/birds.csv")

    # Train with impute_missing enabled
    birds.pca = H2OPrincipalComponentAnalysisEstimator(k=3, transform="STANDARDIZE",
                       pca_method="Power", use_all_factor_levels=True,
                       impute_missing=True)
    birds.pca.train(x=list(range(4)), training_frame=birds)

    # View the importance of components
    birds.pca.varimp(use_pandas=False)
    [(u'Standard deviation', 1.0505993078459912, 0.8950182545325247, 0.5587566783073901), 
    (u'Proportion of Variance', 0.28699613488673914, 0.20828865401845226, 0.08117966990084355), 
    (u'Cumulative Proportion', 0.28699613488673914, 0.4952847889051914, 0.5764644588060349)]

    # View the eigenvectors
    birds.pca.rotation()
    Rotation: 
                       pc1                 pc2                pc3
    -----------------  ------------------  -----------------  ----------------
    patch.Ref1a        0.00732398141913    -0.0141576160836   0.0294419461081
    patch.Ref1b        -0.00482860843905   0.00867426840498   0.0330778190153
    patch.Ref1c        0.00124768649004    -0.00274167383932  0.0312598825617
    patch.Ref1d        -0.000370181920761  0.000297923901103  0.0317439245635
    patch.Ref1e        0.00223394447742    -0.00459462277502  0.0309648089406
    ---                ---                 ---                ---
    landscape.Bauxite  -0.0638494513759    0.136728811833     0.118858152002
    landscape.Forest   0.0378085502606     -0.0833578672691   0.969316569884
    landscape.Urban    -0.0545759062856    0.111309410422     0.0354475756223
    S                  0.564501605704      -0.767095710638    -0.0466832766991
    year               -0.814596906726     -0.577331674836    -0.0101626722479

    See the whole table with table.as_data_frame()

    # Train again without imputing missing values
    birds2 = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/pca_test/birds.csv")
    birds2.pca = H2OPrincipalComponentAnalysisEstimator(k=3, transform="STANDARDIZE",
                        pca_method="Power", use_all_factor_levels=True,
                        impute_missing=False)
    birds2.pca.train(x=list(range(4)), training_frame=birds2)

    # View the importance of components
    birds2.pca.varimp(use_pandas=False)
    [(u'Standard deviation', 1.1238486420242524, 0.949554306091356, 0.534896629598228), 
    (u'Proportion of Variance', 0.3080623966646966, 0.21991895069672512, 0.06978510918460899), 
    (u'Cumulative Proportion', 0.3080623966646966, 0.5279813473614217, 0.5977664565460307)]

    # View the eigenvectors
    birds2.pca.rotation()
    Rotation: 
                       pc1                pc2                pc3
    -----------------  -----------------  -----------------  -----------------
    patch.Ref1a        0.00898674970716   0.0133755203176    0.0386887315027
    patch.Ref1b        -0.00583910665399  -0.00850852817775  0.0403921679996
    patch.Ref1c        0.00157382152659   0.00243349606991   0.0395404497512
    patch.Ref1d        0.00205431391489   -0.00464763108225  0.0130225730145
    patch.Ref1e        0.00521157104675   9.98792622547e-07  0.0126676559841
    ---                ---                ---                ---
    landscape.Bauxite  -0.0927064158093   -0.0985077050027   0.312254932996
    landscape.Forest   0.049803344754     0.0606680349608    0.928822693132
    landscape.Urban    -0.0671561320808   -0.108679950396    0.033639706807
    S                  0.661206203315     0.69412159594      -0.0166591571667
    year               -0.727793152951    0.684904477663     -0.00409291536614

    See the whole table with table.as_data_frame()