Combining Rows from Two Datasets

You can use the rbind function to combine two similar datasets into a single, larger dataset by appending the rows of one to the rows of the other. This is useful, for example, for merging a validation dataset with its training or testing dataset.

Note that when using rbind, the two datasets must have the same set of columns.
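
A quick pre-check can catch a column mismatch before the bind is attempted. The sketch below is plain Python and assumes only that each dataset's column names are available as a list (for an H2OFrame, the names property). columns_match is a hypothetical helper, not part of the h2o API, and it is deliberately strict: it also requires the names to appear in the same order, which may be more than rbind itself enforces.

```python
def columns_match(names_a, names_b):
    """Return (ok, message) describing whether two column-name lists line up."""
    # Different column counts can never be row-bound.
    if len(names_a) != len(names_b):
        return False, "column counts differ: %d vs %d" % (len(names_a), len(names_b))
    # Compare names position by position (assumption: order matters).
    mismatched = [(a, b) for a, b in zip(names_a, names_b) if a != b]
    if mismatched:
        return False, "column names differ: %s" % mismatched
    return True, "columns match"


# With H2OFrames this might be called as columns_match(df1.names, df2.names).
ok, message = columns_match(list("ABCD"), list("ABCD"))
print(ok, message)
```

If the check fails, renaming or reordering the columns of one dataset before calling rbind is usually simpler than debugging the bind error afterwards.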

> library(h2o)
> h2o.init(nthreads=-1)

# Import existing training and testing datasets
> ecg1Path = "../../../smalldata/anomaly/ecg_discord_train.csv"
> ecg1.hex = h2o.importFile(path=ecg1Path, destination_frame="ecg1.hex")
> ecg2Path = "../../../smalldata/anomaly/ecg_discord_test.csv"
> ecg2.hex = h2o.importFile(path=ecg2Path, destination_frame="ecg2.hex")

# Combine the two datasets into a single, larger dataset
> ecgCombine.hex <- h2o.rbind(ecg1.hex, ecg2.hex)

>>> import h2o
>>> import numpy as np
>>> h2o.init()

# Generate a random dataset with 100 rows and 4 columns. Label the columns A, B, C, and D.
>>> df1 = h2o.H2OFrame.from_python(np.random.randn(100,4).tolist(), column_names=list('ABCD'))
>>> df1.describe
          A           B            C            D
-----------  ----------  -----------  -----------
nan          nan         nan          nan
 -0.148045     0.516651   -0.218871    -2.11336
  0.818191    -1.07749    -0.303827     0.0234708
 -0.894042    -1.83727     1.69621     -0.306524
 -1.90056      0.528147   -0.745829     0.325673
 -1.14653      0.146565   -1.12463     -1.39162
  0.81608      0.21313    -0.122169     1.47247
  0.419028     1.14975     0.913349     0.975779
  0.419134    -1.63199     0.633799     0.482761
  0.0366856   -1.09199    -0.0831492    2.17306

[101 rows x 4 columns]

# Generate a second random dataset with 100 rows and 4 columns. Again, label the columns A, B, C, and D.
>>> df2 = h2o.H2OFrame.from_python(np.random.randn(100,4).tolist(), column_names=list('ABCD'))
>>> df2.describe
          A            B           C           D
-----------  -----------  ----------  ----------
nan          nan          nan         nan
  0.626459    -1.80634     -1.08245     1.29828
  1.31526     -0.223264     0.172243   -0.76666
  1.70095     -0.666482    -0.486086   -1.16518
 -0.241271    -1.08439      1.75451     1.37618
 -0.151067    -0.830386     0.7113     -0.979204
 -2.18042     -1.85949     -0.466211    0.707786
 -0.0657297   -0.0092001    1.3721     -0.570298
  1.59816     -0.149408    -0.874023   -0.883033
 -0.367047    -0.586965    -0.98553    -1.33043

[101 rows x 4 columns]

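# Bind the rows from the second dataset to the first dataset.
# Current h2o releases return the combined frame rather than modifying df1 in place, so assign the result.
>>> df1 = df1.rbind(df2)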
>>> df1.describe
          A           B            C            D
-----------  ----------  -----------  -----------
nan          nan         nan          nan
 -0.148045     0.516651   -0.218871    -2.11336
  0.818191    -1.07749    -0.303827     0.0234708
 -0.894042    -1.83727     1.69621     -0.306524
 -1.90056      0.528147   -0.745829     0.325673
 -1.14653      0.146565   -1.12463     -1.39162
  0.81608      0.21313    -0.122169     1.47247
  0.419028     1.14975     0.913349     0.975779
  0.419134    -1.63199     0.633799     0.482761
  0.0366856   -1.09199    -0.0831492    2.17306

[202 rows x 4 columns]
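
After a bind, the combined dataset should contain the sum of both inputs' rows and the same number of columns, as in the 101 + 101 = 202 rows reported above. The following is a minimal sketch of that sanity check in plain Python, working on (rows, columns) tuples. expected_rbind_shape is a hypothetical helper, not part of the h2o API; with H2OFrames, the shapes would presumably be read from the frame's nrow and ncol properties.

```python
def expected_rbind_shape(shape_a, shape_b):
    """Return the (rows, columns) shape a row bind of two datasets should produce."""
    rows_a, cols_a = shape_a
    rows_b, cols_b = shape_b
    # Row binding only makes sense when both inputs have the same column count.
    if cols_a != cols_b:
        raise ValueError("cannot rbind: %d columns vs %d columns" % (cols_a, cols_b))
    return (rows_a + rows_b, cols_a)


# Shapes taken from the console output above.
print(expected_rbind_shape((101, 4), (101, 4)))  # (202, 4)
```

Comparing this expected shape with the shape of the frame actually returned by rbind is a cheap way to confirm that no rows were dropped or duplicated.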