Combining Rows from Two Datasets
You can use the rbind function (h2o.rbind in R, the rbind method of an H2OFrame in Python) to combine two similar datasets into a single, larger dataset. For example, you can combine a validation dataset with its training or testing dataset.
Note that when using rbind, the two datasets must have the same set of columns.
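Because rbind requires both datasets to share the same set of columns, it can help to compare column names before binding. The following is a minimal sketch in plain Python, not part of the H2O API: `column_mismatch` and `can_rbind` are hypothetical helpers, and the column lists are made up for illustration. It compares names only and ignores column order and types.

```python
def column_mismatch(cols_a, cols_b):
    """Return the column names found in only one of the two datasets."""
    set_a, set_b = set(cols_a), set(cols_b)
    only_in_a = sorted(set_a - set_b)
    only_in_b = sorted(set_b - set_a)
    return only_in_a, only_in_b


def can_rbind(cols_a, cols_b):
    """True when both datasets have the same set of column names."""
    only_in_a, only_in_b = column_mismatch(cols_a, cols_b)
    return not only_in_a and not only_in_b


# Hypothetical column lists (assumption for illustration only).
train_cols = ["A", "B", "C", "D"]
test_cols = ["A", "B", "C", "E"]

print(can_rbind(train_cols, train_cols))        # True
print(column_mismatch(train_cols, test_cols))   # (['D'], ['E'])
```

With H2O frames in Python, the name lists could be taken from the `columns` property of each frame, for example `can_rbind(df1.columns, df2.columns)`.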
> library(h2o)
> h2o.init()
# Import an existing training dataset
> ecg1Path <- "http://h2o-public-test-data.s3.amazonaws.com/smalldata/anomaly/ecg_discord_train.csv"
> ecg1.hex <- h2o.importFile(path=ecg1Path, destination_frame="ecg1.hex")
> print(dim(ecg1.hex))
[1] 20 210
# Import an existing testing dataset
> ecg2Path <- "http://h2o-public-test-data.s3.amazonaws.com/smalldata/anomaly/ecg_discord_test.csv"
> ecg2.hex <- h2o.importFile(path=ecg2Path, destination_frame="ecg2.hex")
> print(dim(ecg2.hex))
[1] 23 210
# Combine the two datasets into a single, larger dataset
> ecgCombine.hex <- h2o.rbind(ecg1.hex, ecg2.hex)
> print(dim(ecgCombine.hex))
[1] 43 210
>>> import h2o
>>> import numpy as np
>>> h2o.init()
# Generate a random dataset with 100 rows and 4 columns. Label the columns A, B, C, and D.
>>> df1 = h2o.H2OFrame.from_python(np.random.randn(100,4).tolist(), column_names=list('ABCD'))
>>> df1
A B C D
--------- ---------- --------- ----------
0.412228 -0.991376 -1.44374 -0.276455
0.348039 -0.193704 -0.370882 0.162211
0.125303 -1.24546 -0.916738 1.08088
0.293062 0.516151 0.739798 -0.430679
-0.363344 0.0558051 -1.43888 1.13882
-1.17492 -0.332647 -1.18689 0.533313
0.154774 1.46559 0.373058 -0.915895
0.555835 -0.0891554 -1.19151 0.623667
-1.13092 0.843549 -0.532341 -0.0739869
0.752855 -0.168504 -0.750161 -2.46084
[100 rows x 4 columns]
# Generate a second random dataset with 100 rows and 4 columns. Again, label the columns A, B, C, and D.
>>> df2 = h2o.H2OFrame.from_python(np.random.randn(100,4).tolist(), column_names=list('ABCD'))
>>> df2
A B C D
----------- --------- --------- ---------
0.00118227 -0.835817 1.06634 1.81794
-0.542678 -0.494483 0.109813 0.714271
-0.365611 -0.679095 0.891982 -1.93362
-0.0533568 0.86035 -2.28902 -1.287
-0.572775 1.30954 0.27412 -0.287373
0.310976 -0.594283 -0.566955 0.221888
1.34778 -1.02348 0.243686 0.319585
0.383136 -0.113979 -0.901779 -0.383478
-0.968212 -0.606603 -0.828677 0.699539
0.491119 -0.629774 -0.632143 0.2898
[100 rows x 4 columns]
# Append the rows of the second dataset to the first. rbind returns a new frame; df1 itself is unchanged.
>>> df1.rbind(df2)
A B C D
--------- ---------- --------- ----------
0.412228 -0.991376 -1.44374 -0.276455
0.348039 -0.193704 -0.370882 0.162211
0.125303 -1.24546 -0.916738 1.08088
0.293062 0.516151 0.739798 -0.430679
-0.363344 0.0558051 -1.43888 1.13882
-1.17492 -0.332647 -1.18689 0.533313
0.154774 1.46559 0.373058 -0.915895
0.555835 -0.0891554 -1.19151 0.623667
-1.13092 0.843549 -0.532341 -0.0739869
0.752855 -0.168504 -0.750161 -2.46084
[200 rows x 4 columns]
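In both examples, the combined dataset keeps the original columns, and its row count is the sum of the two inputs' row counts. As a quick sanity check, this arithmetic can be sketched in plain Python. `rbind_shape` is a hypothetical helper, not an H2O function; it takes (rows, columns) tuples such as those reported by dim in R or the shape property of an H2OFrame in Python.

```python
def rbind_shape(shape_a, shape_b):
    """Expected (rows, cols) after row-binding two datasets with equal column counts."""
    rows_a, cols_a = shape_a
    rows_b, cols_b = shape_b
    if cols_a != cols_b:
        raise ValueError(f"column counts differ: {cols_a} vs {cols_b}")
    return rows_a + rows_b, cols_a


# Dimensions taken from the R example (ECG train and test data).
print(rbind_shape((20, 210), (23, 210)))  # (43, 210)

# Dimensions taken from the Python example (two random 100 x 4 frames).
print(rbind_shape((100, 4), (100, 4)))    # (200, 4)
```

After a real bind, comparing the actual dimensions of the result with this expected value is a cheap way to confirm that no rows were lost.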