Performs a group-by-and-apply operation on an H2OFrame, similar to `ddply` from the plyr package.
```r
h2o.group_by(
  data,
  by,
  ...,
  gb.control = list(na.methods = NULL, col.names = NULL)
)
```
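| Argument | Description |
|---|---|
| data | an H2OFrame object. |
| by | a list of column names to group by. |
| ... | any supported aggregate function. See the notes below. |
| gb.control | a list specifying how to handle NA values (`na.methods`) and how to name the output columns (`col.names`). See the notes below. |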
Returns a new H2OFrame with one row per group (one per unique combination of values in the `by` columns), containing the grouping columns and one column per aggregate.
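For intuition about the shape of the result, here is a minimal base-R analogue of the same group-by-and-apply pattern on a small local data.frame. It does not use H2O at all; `local_df`, its values, and the output column names are made up for illustration. Like `h2o.group_by(data, by = "RACE", nrow(...), mean(...))`, it produces one row per distinct grouping value.

```r
# Base-R analogue (not H2O): group a small local data.frame by RACE and
# apply two summaries, producing one row per distinct RACE value.
local_df <- data.frame(
  RACE = c(1, 1, 2, 2, 2),
  VOL  = c(10, 20, 6, 12, 30)
)

# split() returns a list of VOL vectors, one element per RACE value
groups <- split(local_df$VOL, local_df$RACE)

res <- data.frame(
  RACE     = as.numeric(names(groups)),
  nrow     = unname(vapply(groups, length, integer(1))),
  mean_VOL = unname(vapply(groups, mean, numeric(1)))
)

# two rows: RACE 1 (nrow 2, mean 15) and RACE 2 (nrow 3, mean 16)
print(res)
```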
The `na.methods` element of `gb.control` accepts three settings:

- "all" includes NAs in the computation of the aggregate functions.
- "rm" removes NA entries entirely.
- "ignore" leaves NAs out of the numerator but keeps their rows in the row count.

If the supplied list is shorter than the number of column groups being aggregated, it is padded with "ignore".
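The three NA settings are easiest to compare on a per-group mean. The sketch below is a local illustration in plain R, not H2O code: `mean_with_na_method` is a hypothetical helper, and it encodes one reading of the description above, namely that "ignore" drops NAs from the sum but still counts their rows in the denominator.

```r
# Hypothetical helper (not part of h2o): applies one reading of the three
# na.methods settings to a simple mean over a single group's values.
mean_with_na_method <- function(x, na_method = c("all", "rm", "ignore")) {
  na_method <- match.arg(na_method)
  switch(na_method,
    # "all": NAs take part in the computation, so any NA makes the result NA
    all    = sum(x) / length(x),
    # "rm": NA entries are dropped from both the numerator and the row count
    rm     = sum(x, na.rm = TRUE) / sum(!is.na(x)),
    # "ignore": NAs are left out of the numerator, but their rows still count
    ignore = sum(x, na.rm = TRUE) / length(x)
  )
}

x <- c(2, NA, 4)
mean_with_na_method(x, "all")     # NA
mean_with_na_method(x, "rm")      # 3  (6 / 2)
mean_with_na_method(x, "ignore")  # 2  (6 / 3)
```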
To name the output columns, add a `col.names` element to the `gb.control` list. As with `na.methods`, if `col.names` is shorter than the number of column groups supplied, it is padded with the default column names.
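Both `na.methods` and `col.names` follow the same padding rule. The sketch below shows that rule in isolation; `pad_to_length` is a hypothetical helper and the default names passed to it are placeholders, since the actual default column names H2O generates are not given here.

```r
# Hypothetical helper (not part of h2o): pads a user-supplied vector to n
# entries by filling the missing tail positions from a vector of defaults.
pad_to_length <- function(values, n, defaults) {
  stopifnot(length(defaults) == n, length(values) <= n)
  if (length(values) < n) {
    values <- c(values, defaults[(length(values) + 1):n])
  }
  values
}

n_aggs <- 3

# na.methods: only the first aggregate is set explicitly; the rest get "ignore"
pad_to_length("rm", n_aggs, rep("ignore", n_aggs))
# "rm" "ignore" "ignore"

# col.names: placeholder defaults stand in for H2O's generated names
pad_to_length("n_rows", n_aggs, c("default_1", "default_2", "default_3"))
# "n_rows" "default_2" "default_3"
```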
Supported functions include `nrow`, which is required and accepts a string naming the generated column. The other aggregate functions accept `col` and `na` arguments, which specify the columns to aggregate and how NAs are handled ("all", "ignore", or "rm"). Each one computes, for every group of the GroupBy object, one statistic per column listed in `col`. Aggregate functions include …; `max` (maximum); `mean` (mean); `min` (minimum); `mode` (mode); `sd` (standard deviation); `ss` (sum of squares); `sum` (sum); and `var` (variance).

If an aggregate is given without a column (for example, `max` in `sum(col="X1", na="all").mean(col="X5", na="all").max()`), it is applied to every column except the group-by columns. String columns are skipped. Note again that `nrow` is required and cannot be empty.
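A fuller call combines several aggregates with a `gb.control` list. This is a sketch under assumptions: it assumes the aggregates can be written as `mean("VOL")` and `max("PSA")` in the same way the example passes `nrow("VOL")`, and that `na.methods` and `col.names` entries line up with the aggregates in the order given, `nrow` included. The output names `n_rows`, `mean_vol`, and `max_psa` are hypothetical. The cluster call is wrapped in `if (FALSE)`, following the page's own example convention, because it needs a running H2O instance.

```r
# Control list built outside the cluster call so it can be inspected.
# Assumption: entries are matched to the aggregates in order (nrow, mean, max).
ctrl <- list(
  na.methods = c("rm", "rm", "ignore"),
  col.names  = c("n_rows", "mean_vol", "max_psa")
)

if (FALSE) {  # requires a running H2O cluster
  library(h2o)
  h2o.init()
  df <- h2o.importFile(paste("https://s3.amazonaws.com/h2o-public-test-data",
                             "/smalldata/prostate/prostate.csv", sep = ""))
  # nrow is required; mean and max aggregate the named columns per RACE value
  res <- h2o.group_by(data = df, by = "RACE",
                      nrow("VOL"), mean("VOL"), max("PSA"),
                      gb.control = ctrl)
  print(res)
}
```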
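```r
if (FALSE) {
library(h2o)
# Start (or connect to) a local H2O cluster
h2o.init()
# Import the prostate dataset from the public H2O test-data bucket
df <- h2o.importFile(paste("https://s3.amazonaws.com/h2o-public-test-data",
                           "/smalldata/prostate/prostate.csv", sep = ""))
# Count rows per RACE value; "VOL" is the name given to the nrow column
h2o.group_by(data = df, by = "RACE", nrow("VOL"))
}
```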