Merges two H2OFrame objects with the same arguments and meanings as merge() in base R. However, we do not support all=TRUE, all.x=TRUE and all.y=TRUE. The default method is auto, which resolves to the radix method. The radix method returns the correct merge result regardless of duplicated rows in the right frame, and it can perform the merge even if your frames contain string columns. The hash method does not include duplicated rows from the right frame, and it cannot perform the merge if your left frame contains string columns. Hence, we consider the radix method superior to the hash method, and it is the default method to use.
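To make the point about duplicated rows concrete, here is a minimal sketch in base R only (no H2O cluster involved). It assumes that the "correct merge result" mentioned above is the one base R merge() gives, where each left row is paired with every matching right row. The frames and the `origin` column are made up for illustration.

```r
# Base R only; this does not call h2o.merge().
left <- data.frame(fruit = c("apple", "banana"),
                   color = c("red", "yellow"))

# The right frame repeats the key "apple" (hypothetical data).
right <- data.frame(fruit = c("apple", "apple", "banana"),
                    origin = c("US", "NZ", "EC"))

# Inner join on the shared column name "fruit".
merged <- merge(left, right, by = "fruit")
print(merged)
```

In this sketch the left "apple" row is paired with both matching right rows, so the result has more rows than the left frame. A method that leaves out duplicated right rows would return fewer rows here.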
h2o.merge(x, y, by = intersect(names(x), names(y)), by.x = by, by.y = by, all = FALSE, all.x = all, all.y = all, method = "auto")
Argument | Description |
---|---|
x, y | H2OFrame objects |
by | Columns used for merging; by default, the common names |
by.x | x columns used for merging, by name or number |
by.y | y columns used for merging, by name or number |
all | TRUE includes all rows in x and all rows in y, even if there is no match in the other |
all.x | If all.x is TRUE, all rows in x are included, even if there is no matching row in y, and vice versa for all.y |
all.y | See all.x |
method | auto (default), radix, hash |
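Since the arguments are described as having the same meanings as in base R merge(), the following sketch uses base R alone to show how by.x, by.y and all.x interact. It is an illustration of the argument semantics, not a statement about H2O's output: the data and the `name` key column are hypothetical, and row order or missing-value handling in H2O may differ.

```r
# Base R only; frames and column names are made up for illustration.
left <- data.frame(fruit = c("apple", "orange", "blueberry"),
                   color = c("red", "orange", "blue"))
right <- data.frame(name = c("apple", "orange", "watermelon"),
                    citrus = c(FALSE, TRUE, FALSE))

# The key columns have different names, so by.x and by.y are given separately.
inner <- merge(left, right, by.x = "fruit", by.y = "name")

# all.x = TRUE keeps "blueberry" from the left frame; its citrus value is NA.
left_join <- merge(left, right, by.x = "fruit", by.y = "name", all.x = TRUE)
print(left_join)
```

The key column in the result takes its name from the left frame (`fruit`), and rows that exist only in the right frame ("watermelon") are dropped in both calls.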
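# NOT RUN {
library(h2o)
h2o.init()

# Column names must be set with `=`; `<-` inside data.frame() would not name the column as intended.
left <- data.frame(fruit = c('apple', 'orange', 'banana', 'lemon', 'strawberry', 'blueberry'),
                   color = c('red', 'orange', 'yellow', 'yellow', 'red', 'blue'))
right <- data.frame(fruit = c('apple', 'orange', 'banana', 'lemon', 'strawberry', 'watermelon'),
                    citrus = c(FALSE, TRUE, FALSE, TRUE, FALSE, FALSE))

# Upload both data frames to the H2O cluster.
left_hf <- as.h2o(left)
right_hf <- as.h2o(right)

# Merge on the common column (fruit), keeping all rows of the left frame.
merged <- h2o.merge(left_hf, right_hf, all.x = TRUE)
# }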