Combining Rows from Two Datasets
--------------------------------

You can use the ``rbind`` function to combine two similar datasets into a single, larger dataset by appending the rows of one to the rows of the other. For example, you can combine a validation dataset with its training or testing dataset.

Note that when using ``rbind``, the two datasets must have the same set of columns.
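
Checking that two frames line up before binding them can make mismatches easier to diagnose. The sketch below is one possible pre-check, not part of H2O itself: ``rbind_mismatches`` and ``FakeFrame`` are hypothetical names introduced for illustration. It assumes each frame exposes ``names`` (a list of column names) and ``types`` (a dict mapping column name to type), as ``H2OFrame`` does in Python, and it is deliberately conservative because the exact rules that ``rbind`` enforces are not spelled out here.

```python
from collections import namedtuple


def rbind_mismatches(frame_a, frame_b):
    """Return a list of reasons why two frames may not be safe to row-bind.

    Hypothetical helper: it only reads the ``names`` and ``types``
    attributes, so it works on any object that provides them.
    """
    problems = []

    # The frames should have the same number of columns.
    if len(frame_a.names) != len(frame_b.names):
        problems.append(
            "column count differs: %d vs %d"
            % (len(frame_a.names), len(frame_b.names))
        )

    # Report columns that appear in only one of the frames.
    only_a = [n for n in frame_a.names if n not in frame_b.names]
    only_b = [n for n in frame_b.names if n not in frame_a.names]
    if only_a:
        problems.append("only in first frame: %s" % ", ".join(only_a))
    if only_b:
        problems.append("only in second frame: %s" % ", ".join(only_b))

    # For shared columns, compare the declared column types.
    for name in frame_a.names:
        if name in frame_b.types and frame_a.types[name] != frame_b.types[name]:
            problems.append(
                "type of %s differs: %s vs %s"
                % (name, frame_a.types[name], frame_b.types[name])
            )

    return problems


# Stand-in objects (assumption) so the sketch runs without an H2O cluster.
FakeFrame = namedtuple("FakeFrame", ["names", "types"])
train = FakeFrame(["A", "B"], {"A": "real", "B": "enum"})
test = FakeFrame(["A", "B"], {"A": "real", "B": "real"})
extra = FakeFrame(["A", "B", "C"], {"A": "real", "B": "enum", "C": "int"})

print(rbind_mismatches(train, train))
print(rbind_mismatches(train, test))
print(rbind_mismatches(train, extra))
```

With real data, the same function could be called on two ``H2OFrame`` objects, and ``rbind`` invoked only when the returned list is empty.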

.. example-code::
   .. code-block:: r
   
	> library(h2o)
	> h2o.init()
	
	# Import an existing training dataset
	> ecg1Path <- "http://h2o-public-test-data.s3.amazonaws.com/smalldata/anomaly/ecg_discord_train.csv"
	> ecg1.hex <- h2o.importFile(path=ecg1Path, destination_frame="ecg1.hex")
	> print(dim(ecg1.hex))
	[1] 20 210 

	# Import an existing testing dataset
	> ecg2Path <- "http://h2o-public-test-data.s3.amazonaws.com/smalldata/anomaly/ecg_discord_test.csv"
	> ecg2.hex <- h2o.importFile(path=ecg2Path, destination_frame="ecg2.hex")
	> print(dim(ecg2.hex))
	[1] 23 210

	# Combine the two datasets into a single, larger dataset
	> ecgCombine.hex <- h2o.rbind(ecg1.hex, ecg2.hex)
	> print(dim(ecgCombine.hex))
	[1] 43 210


   .. code-block:: python

	>>> import h2o
	>>> import numpy as np
	>>> h2o.init()
	
	# Generate a random dataset with 100 rows and 4 columns. Label the columns A, B, C, and D.
	>>> df1 = h2o.H2OFrame.from_python(np.random.randn(100,4).tolist(), column_names=list('ABCD'))
	>>> df1
	        A           B          C           D
	---------  ----------  ---------  ----------
	 0.412228  -0.991376   -1.44374   -0.276455
	 0.348039  -0.193704   -0.370882   0.162211
	 0.125303  -1.24546    -0.916738   1.08088
	 0.293062   0.516151    0.739798  -0.430679
	-0.363344   0.0558051  -1.43888    1.13882
	-1.17492   -0.332647   -1.18689    0.533313
	 0.154774   1.46559     0.373058  -0.915895
	 0.555835  -0.0891554  -1.19151    0.623667
	-1.13092    0.843549   -0.532341  -0.0739869
	 0.752855  -0.168504   -0.750161  -2.46084

	[100 rows x 4 columns]
	
	# Generate a second random dataset with 100 rows and 4 columns. Again, label the columns A, B, C, and D.
	>>> df2 = h2o.H2OFrame.from_python(np.random.randn(100,4).tolist(), column_names=list('ABCD'))
	>>> df2
	          A          B          C          D
	-----------  ---------  ---------  ---------
	 0.00118227  -0.835817   1.06634    1.81794
	-0.542678    -0.494483   0.109813   0.714271
	-0.365611    -0.679095   0.891982  -1.93362
	-0.0533568    0.86035   -2.28902   -1.287
	-0.572775     1.30954    0.27412   -0.287373
	 0.310976    -0.594283  -0.566955   0.221888
	 1.34778     -1.02348    0.243686   0.319585
	 0.383136    -0.113979  -0.901779  -0.383478
	-0.968212    -0.606603  -0.828677   0.699539
	 0.491119    -0.629774  -0.632143   0.2898

	[100 rows x 4 columns]
	
	# Append the rows of the second dataset to the rows of the first dataset.
	>>> df1.rbind(df2)
	        A           B          C           D
	---------  ----------  ---------  ----------
	 0.412228  -0.991376   -1.44374   -0.276455
	 0.348039  -0.193704   -0.370882   0.162211
	 0.125303  -1.24546    -0.916738   1.08088
	 0.293062   0.516151    0.739798  -0.430679
	-0.363344   0.0558051  -1.43888    1.13882
	-1.17492   -0.332647   -1.18689    0.533313
	 0.154774   1.46559     0.373058  -0.915895
	 0.555835  -0.0891554  -1.19151    0.623667
	-1.13092    0.843549   -0.532341  -0.0739869
	 0.752855  -0.168504   -0.750161  -2.46084

	[200 rows x 4 columns]