Slicing Rows

H2O slices rows lazily: a sliced frame is materialized as a shared copy only when an I/O operation requires it. This example, in R, shows how to slice rows from a frame of data.

> library(h2o)
> h2o.init()
> path <- "http://h2o-public-test-data.s3.amazonaws.com/smalldata/iris/iris_wheader.csv"
> df <- h2o.importFile(path)

# Slice 1 row by index.
> c1 <- df[15,]
> print(c1)
  sepal_len sepal_wid petal_len petal_wid       class
1       5.8         4       1.2       0.2 Iris-setosa

[1 row x 5 columns]

# Slice a range of rows (25 through 49, inclusive). Only the first six rows are printed.
> c1_1 <- df[25:49,]
> print(c1_1)
  sepal_len sepal_wid petal_len petal_wid       class
1       4.8       3.4       1.9       0.2 Iris-setosa
2       5.0       3.0       1.6       0.2 Iris-setosa
3       5.0       3.4       1.6       0.4 Iris-setosa
4       5.2       3.5       1.5       0.2 Iris-setosa
5       5.2       3.4       1.4       0.2 Iris-setosa
6       4.7       3.2       1.6       0.2 Iris-setosa

[25 rows x 5 columns]
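
The row indices above are 1-based, and the range `25:49` includes both endpoints, which is why the slice has 25 rows. As a quick local sanity check, the sketch below repeats the two slices on base R's built-in `iris` data frame rather than on an H2OFrame. It assumes the built-in data matches the CSV for these rows; note that its column names differ (for example `Sepal.Length` instead of `sepal_len`).

```r
# Base R analogue of the two H2O slices above, using the built-in iris
# data frame (a stand-in for illustration; no H2O cluster is involved).
local_df <- iris

# Single row by 1-based index.
row15 <- local_df[15, ]

# Inclusive range of rows: 25, 26, ..., 49.
rows25_49 <- local_df[25:49, ]

print(row15$Sepal.Length)
print(nrow(rows25_49))
```
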

# Slice using a boolean mask. The output dataset will include rows with a sepal length less than 4.6.
> mask <- df[,"sepal_len"] < 4.6
> rows <- df[mask,]
> print(rows)
  sepal_len sepal_wid petal_len petal_wid       class
1       4.4       2.9       1.4       0.2 Iris-setosa
2       4.3       3.0       1.1       0.1 Iris-setosa
3       4.4       3.0       1.3       0.2 Iris-setosa
4       4.5       2.3       1.3       0.3 Iris-setosa
5       4.4       3.2       1.3       0.2 Iris-setosa

[5 rows x 5 columns]
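
A mask holds one logical value per row, and only the rows where it is true are kept. The following sketch illustrates this on base R's built-in `iris` (a stand-in, not an H2OFrame) and also combines two conditions with `&`. Whether every base R operator carries over to H2OFrame columns is not confirmed here, so treat the compound mask as an illustration of the idea only.

```r
# Base R illustration of boolean-mask row selection on the built-in iris
# data frame (not an H2OFrame; column names differ from the CSV).
local_df <- iris

# One TRUE/FALSE entry per row.
mask <- local_df$Sepal.Length < 4.6

# Keep only the rows where the mask is TRUE.
short_sepals <- local_df[mask, ]

# The number of TRUE entries equals the number of rows kept.
print(sum(mask) == nrow(short_sepals))

# Combine two conditions element-wise with '&'.
both <- local_df[mask & local_df$Petal.Length > 1.2, ]
print(nrow(both))
```
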

# Filter out rows that have a missing value in the sepal_len column. Note the use of '!' for logical negation.
# This dataset has no missing values in that column, so all 150 rows are kept.
> mask <- is.na(df[,"sepal_len"])
> rows <- df[!mask,]
> print(rows)
  sepal_len sepal_wid petal_len petal_wid       class
1       5.1       3.5       1.4       0.2 Iris-setosa
2       4.9       3.0       1.4       0.2 Iris-setosa
3       4.7       3.2       1.3       0.2 Iris-setosa
4       4.6       3.1       1.5       0.2 Iris-setosa
5       5.0       3.6       1.4       0.2 Iris-setosa
6       5.4       3.9       1.7       0.4 Iris-setosa

[150 rows x 5 columns]
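
Because the filter above keeps every row, it does not show rows actually being dropped. Here is a minimal base R sketch on a small hypothetical data frame (`toy`, invented for illustration and not part of the H2O example) that does contain `NA` entries.

```r
# Hypothetical toy data with two missing sepal_len values
# (plain base R data frame, not an H2OFrame).
toy <- data.frame(
  sepal_len = c(5.1, NA, 4.7, NA, 5.0),
  class     = c("a", "b", "c", "d", "e")
)

# TRUE where sepal_len is missing.
mask <- is.na(toy[, "sepal_len"])

# '!' negates the mask, so only rows with a value present are kept.
complete_rows <- toy[!mask, ]
print(nrow(complete_rows))
```
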