R - **Which versions of R are compatible with H2O?** Currently, the only version of R that is known to not work well with H2O is R version 3.1.0 (codename "Spring Dance"). If you are using this version, we recommend upgrading the R version before using H2O. -------------- **What R packages are required to use H2O?** The following packages are required: - ``methods`` - ``statmod`` - ``stats`` - ``graphics`` - ``RCurl`` - ``jsonlite`` - ``tools`` - ``utils`` Some of these packages have dependencies; for example, ``bitops`` is required, but it is a dependency of the ``RCurl`` package, so ``bitops`` is automatically included when ``RCurl`` is installed. If you are encountering errors related to missing R packages when using H2O, refer to the following list for a complete list of all R packages, including dependencies: - ``statmod`` - ``bitops`` - ``RCurl`` - ``jsonlite`` - ``methods`` - ``stats`` - ``graphics`` - ``tools`` - ``utils`` - ``stringi`` - ``magrittr`` - ``colorspace`` - ``stringr`` - ``RColorBrewer`` - ``dichromat`` - ``munsell`` - ``labeling`` - ``plyr`` - ``digest`` - ``gtable`` - ``reshape2`` - ``scales`` - ``proto`` - ``ggplot2`` - ``h2oEnsemble`` - ``gtools`` - ``gdata`` - ``caTools`` - ``gplots`` - ``chron`` - ``ROCR`` - ``data.table`` - ``cvAUC`` Finally, if you are running R on Linux, then you must install ``libcurl``, which allows H2O to communicate with R. -------------- **How can I install the H2O R package if I am having permissions problems?** This issue typically occurs for Linux users when the R software was installed by a root user. For more information, refer to the following `link `__. To specify the installation location for the R packages, create a file that contains the ``R_LIBS_USER`` environment variable: ``echo R_LIBS_USER=\"~/.Rlibrary\" > ~/.Renviron`` Confirm the file was created successfully using ``cat``: ``$ cat ~/.Renviron`` You should see the following output: ``R_LIBS_USER="~/.Rlibrary"`` Create a new directory for the environment variable: ``$ mkdir ~/.Rlibrary`` Start R and enter the following: ``.libPaths()`` Look for the following output to confirm the changes: :: [1] "/.Rlibrary" [2] "/Library/Frameworks/R.framework/Versions/3.1/Resources/library" -------------- **I received the following error message after launching H2O in RStudio and using ``h2o.init`` - what should I do to resolve this error?** :: Error in h2o.init() : Version mismatch! H2O is running version 3.2.0.9 but R package is version 3.2.0.3 This error is due to a version mismatch between the H2O R package and the running H2O instance. Make sure you are using the latest version of both files by downloading H2O from the `downloads page `__ and installing the latest version and that you have removed any previous H2O R package versions by running: :: if ("package:h2o" %in% search()) { detach("package:h2o", unload=TRUE) } if ("h2o" %in% rownames(installed.packages())) { remove.packages("h2o") } Make sure to install the dependencies for the H2O R package as well: :: if (! ("methods" %in% rownames(installed.packages()))) { install.packages("methods") } if (! ("statmod" %in% rownames(installed.packages()))) { install.packages("statmod") } if (! ("stats" %in% rownames(installed.packages()))) { install.packages("stats") } if (! ("graphics" %in% rownames(installed.packages()))) { install.packages("graphics") } if (! ("RCurl" %in% rownames(installed.packages()))) { install.packages("RCurl") } if (! ("jsonlite" %in% rownames(installed.packages()))) { install.packages("jsonlite") } if (! ("tools" %in% rownames(installed.packages()))) { install.packages("tools") } if (! ("utils" %in% rownames(installed.packages()))) { install.packages("utils") } Finally, install the latest stable version of the H2O package for R: :: install.packages("h2o", type="source", repos=(c("http://h2o-release.s3.amazonaws.com/h2o/latest_stable_R))) library(h2o) localH2O = h2o.init() If your R version is older than the H2O R package, upgrade your R version using ``update.packages(checkBuilt=TRUE, ask=FALSE)``. -------------- **I received the following error message after launching H2O in RStudio and using ``h2o.init`` - what should I do to resolve this error?** :: Server error - server 127.0.0.1 is unreachable at this moment. Please retry the request or contact your administrator. This error occurs when the proxy is set in your R environment. The resolution is to unset that so that you can access localhost from within R. Run the following to unset the proxy: :: Sys.unsetenv("http_proxy") Sys.unsetenv("https_proxy") Sys.unsetenv("http_proxy_user") Sys.unsetenv("https_proxy_user") -------------- **I received the following error message after trying to run some code - what should I do?** :: > fit <- h2o.deeplearning(x=2:4, y=1, training_frame=train_hex) |=========================================================================================================| 100% Error in model$training_metrics$MSE : $ operator not defined for this S4 class In addition: Warning message: Not all shim outputs are fully supported, please see ?h2o.shim for more information Remove the ``h2o.shim(enable=TRUE)`` line and try running the code again. Note that the ``h2o.shim`` is only a way to notify users of previous versions of H2O about changes to the H2O R package - it will not revise your code, but provides suggested replacements for deprecated commands and parameters. -------------- **How do I extract the model weights from a model I've creating using H2O in R? I've enabled ``extract_model_weights_and_biases``, but the output refers to a file I can't open in R.** For an example of how to extract weights and biases from a model, refer to the following repo location on `GitHub `__. -------------- **How do I extract the run time of my model as output?** For the following example: :: out.h2o.rf = h2o.randomForest( x=c("x1", "x2", "x3", "w"), y="y", training_frame=h2o.df.train, seed=555, model_id= "my.model.1st.try.out.h2o.rf" ) Use ``out.h2o.rf@model$run_time`` to determine the value of the ``run_time`` variable. -------------- **What is the best way to do group summarizations? For example, getting sums of specific columns grouped by a categorical column.** We strongly recommend using ``h2o.group_by`` for this function instead of ``h2o.ddply``, as shown in the following example: :: newframe <- h2o.group_by(h2oframe, by="footwear_category", nrow("email_event_click_ct"), sum("email_event_click_ct"), mean("email_event_click_ct"), sd("email_event_click_ct"), gb.control = list( col.names=c("count", "total_email_event_click_ct", "avg_email_event_click_ct", "std_email_event_click_ct") ) ) Using ``gb.control`` is optional; here it is included so the column names are user-configurable. The ``by`` option can take a list of columns if you want to group by more than one column to compute the summary as shown in the following example: :: newframe <- h2o.group_by(h2oframe, by=c("footwear_category","age_group"), nrow("email_event_click_ct"), sum("email_event_click_ct"), mean("email_event_click_ct"), sd("email_event_click_ct"), gb.control = list( col.names=c("count", "total_email_event_click_ct", "avg_email_event_click_ct", "std_email_event_click_ct") ) ) -------------- **I'm using Linux and I want to run H2O in R - are there any dependencies I need to install?** Yes, make sure to install ``libcurl``, which allows H2O to communicate with R. We also recommend disabling SElinux and any firewalls, at least initially until you have confirmed H2O can initialize. - On Ubuntu, run: ``apt-get install libcurl4-openssl-dev`` - On CentOS, run: ``yum install libcurl-devel`` -------------- **How do I change variable/header names on an H2O frame in R?** There are two ways to change header names. To specify the headers during parsing, import the headers in R and then specify the header as the column name when the actual data frame is imported: :: header <- h2o.importFile(path = pathToHeader) data <- h2o.importFile(path = pathToData, col.names = header) data You can also use the ``names()`` function: :: header <- c("user", "specified", "column", "names") data <- h2o.importFile(path = pathToData) names(data) <- header To replace specific column names, you can also use a ``sub/gsub`` in R: :: header <- c("user", "specified", "column", "names") ## I want to replace "user" column with "computer" data <- h2o.importFile(path = pathToData) names(data) <- sub(pattern = "user", replacement = "computer", x = names(header)) -------------- **My R terminal crashed - how can I re-access my H2O frame?** Launch H2O and use your web browser to access the web UI, Flow, at ``localhost:54321``. Click the **Data** menu, then click **List All Frames**. Copy the frame ID, then run ``h2o.ls()`` in R to list all the frames, or use the frame ID in the following code (replacing ``YOUR_FRAME_ID`` with the frame ID): :: library(h2o) localH2O = h2o.init(ip="sri.h2o.ai", port=54321, startH2O = F, strict_version_check=T) data_frame <- h2o.getFrame(frame_id = "YOUR_FRAME_ID") -------------- **How do I remove rows containing NAs in an H2OFrame?** To remove NAs from rows: :: a b c d e 1 0 NA NA NA NA 2 0 2 2 2 2 3 0 NA NA NA NA 4 0 NA NA 1 2 5 0 NA NA NA NA 6 0 1 2 3 2 Removing rows 1, 3, 4, 5 to get: :: a b c d e 2 0 2 2 2 2 6 0 1 2 3 2 Use ``na.omit(myFrame)``, where ``myFrame`` represents the name of the frame you are editing. -------------- **I installed H2O in R using OS X and updated all the dependencies, but the following error message displayed: ``Error in .h2o.doSafeREST(h2oRestApiVersion = h2oRestApiVersion, Unexpected CURL error: Empty reply from server`` - what should I do?** This error message displays if the ``JAVA_HOME`` environment variable is not set correctly. The ``JAVA_HOME`` variable is likely points to Apple Java version 6 instead of Oracle Java version 8. If you are running OS X 10.7 or earlier, enter the following in Terminal: ``export JAVA_HOME=/Library/Internet\ Plug-Ins/JavaAppletPlugin.plugin/Contents/Home`` If you are running OS X 10.8 or later, modify the launchd.plist by entering the following in Terminal: :: cat << EOF | sudo tee /Library/LaunchDaemons/setenv.JAVA_HOME.plist Label setenv.JAVA_HOME ProgramArguments /bin/launchctl setenv JAVA_HOME /Library/Internet Plug-Ins/JavaAppletPlugin.plugin/Contents/Home RunAtLoad ServiceIPC EOF -------------- .. raw:: html -------------- **How does the ``col.names`` argument work in ``group_by``?** You need to add the ``col.names`` inside the ``gb.control`` list. Refer to the following example: :: newframe <- h2o.group_by(dd, by="footwear_category", nrow("email_event_click_ct"), sum("email_event_click_ct"), mean("email_event_click_ct"), sd("email_event_click_ct"), gb.control = list( col.names=c("count", "total_email_event_click_ct", "avg_email_event_click_ct", "std_email_event_click_ct") ) ) newframe$avg_email_event_click_ct2 = newframe$total_email_event_click_ct / newframe$count -------------- **How are the results of ``h2o.predict`` displayed?** The order of the rows in the results for ``h2o.predict`` is the same as the order in which the data was loaded, even if some rows fail (for example, due to missing values or unseen factor levels). To bind a per-row identifier, use ``cbind``. -------------- **How do I view all the variable importances for a model?** By default, H2O returns the top five and lowest five variable importances. To view all the variable importances, use the following: :: model <- h2o.getModel(model_id = "my_H2O_modelID",conn=localH2O) varimp<-as.data.frame(h2o.varimp(model)) -------------- **How do I add random noise to a column in an H2O frame?** To add random noise to a column in an H2O frame, refer to the following example: :: h2o.init() fr <- as.h2o(iris) |======================================================================| 100% random_column <- h2o.runif(fr) new_fr <- h2o.cbind(fr,random_column) new_fr