Fill NAs
Use this function to fill NA values sequentially, up to a specified limit. You specify the fill direction (forward, the default, or backward), whether the NAs are filled along rows (the default) or along columns, and the maximum number of consecutive NAs to fill (defaults to 1). The axis argument is numbered differently in R and Python; the comments in each example give the values.
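The fill rule described above can be sketched in plain Python, independent of H2O. This is an illustration of the semantics only, not H2O's implementation: the function name `fill_na` and its list-based interface are hypothetical, and the sketch assumes that once `maxlen` NAs in a run have been filled, the rest of that run stays NA.

```python
import math


def fill_na(values, method="forward", maxlen=1):
    """Fill NaN entries in a 1-D sequence, at most `maxlen` per consecutive run.

    Illustrative helper only; not part of the h2o package.
    """
    if method not in ("forward", "backward"):
        raise ValueError("method must be 'forward' or 'backward'")

    # A backward fill is a forward fill over the reversed sequence.
    seq = list(values) if method == "forward" else list(reversed(values))

    out = []
    last = math.nan  # most recent non-NaN value seen so far
    filled = 0       # NaNs already filled in the current run
    for v in seq:
        if math.isnan(v):
            if not math.isnan(last) and filled < maxlen:
                out.append(last)
                filled += 1
            else:
                out.append(v)  # nothing to copy, or the limit is reached
        else:
            out.append(v)
            last = v
            filled = 0

    return out if method == "forward" else out[::-1]


series = [1.0, math.nan, math.nan, math.nan, 5.0]
print(fill_na(series, "forward", maxlen=1))   # [1.0, 1.0, nan, nan, 5.0]
print(fill_na(series, "backward", maxlen=2))  # [1.0, nan, 5.0, 5.0, 5.0]
```

A leading NA in a forward fill (or a trailing NA in a backward fill) has no neighbor to copy from, so it is left unchanged.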
library(h2o)
h2o.init()
# Create a random data frame with 6 rows and 2 columns.
# Specify that no more than 70% of the values are NAs.
fr_with_nas <- h2o.createFrame(categorical_fraction = 0.0,
                               missing_fraction = 0.7,
                               rows = 6,
                               cols = 2,
                               seed = 123)
fr_with_nas
         C1        C2
1       NaN       NaN
2 -77.10471 -93.64087
3 -13.65926  57.44389
4       NaN       NaN
5  39.10130       NaN
6       NaN  55.43136
[6 rows x 2 columns]
# Forward fill along each row. In R, the values for axis are 1 (row-wise) and 2 (column-wise).
# With maxlen = 1L, at most one consecutive NA is filled per run.
fr <- h2o.fillna(fr_with_nas, "forward", axis = 1, maxlen = 1L)
fr
         C1        C2
1       NaN       NaN
2 -77.10471 -93.64087
3 -13.65926  57.44389
4       NaN       NaN
5  39.10130  39.10130
6       NaN  55.43136
[6 rows x 2 columns]
import h2o
h2o.init()
# Create a random data frame with 10 rows and 3 columns.
# Specify that no more than 20% of the values are NAs.
df = h2o.create_frame(rows=10,
                      cols=3,
                      real_fraction=1.0,
                      real_range=100,
                      missing_fraction=0.2,
                      seed=123)
df
C1 C2 C3
-------- -------- --------
nan nan -77.1047
-93.6409 -13.6593 57.4439
-93.71 25.4342 39.1013
-95.8291 -92.4271 55.4314
84.6372 -43.4759 53.1715
-57.9583 27.4148 -26.9013
83.0921 -62.7819 -91.9426
-77.9814 64.3228 -93.954
nan -80.6142 nan
27.1672 60.5492 -13.2275
[10 rows x 3 columns]
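# Forward fill along axis 0, filling at most one consecutive NA (maxlen=1).
# In Python, axis is 0 or 1. With axis=0, as the output below shows, each NA takes
# the value from the row above in the same column; a leading NA has nothing to copy and stays NA.
filled = df.fillna(method="forward", axis=0, maxlen=1)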
filled
C1 C2 C3
-------- -------- --------
nan nan -77.1047
-93.6409 -13.6593 57.4439
-93.71 25.4342 39.1013
-95.8291 -92.4271 55.4314
84.6372 -43.4759 53.1715
-57.9583 27.4148 -26.9013
83.0921 -62.7819 -91.9426
-77.9814 64.3228 -93.954
-77.9814 -80.6142 -93.954
27.1672 60.5492 -13.2275
[10 rows x 3 columns]
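The two outputs above fill in different directions: in the R output the NA in row 5 takes the value to its left, while in the Python output the NAs in row 9 take the values from the row above. The following plain-Python sketch reproduces both results on the affected rows. It does not call H2O; the helpers `forward_fill`, `fill_across_rows`, and `fill_down_columns` are hypothetical names chosen for illustration.

```python
import math

NA = math.nan


def forward_fill(values, maxlen=1):
    """Forward fill a 1-D sequence, at most `maxlen` NaNs per consecutive run."""
    out, last, filled = [], NA, 0
    for v in values:
        if math.isnan(v):
            if not math.isnan(last) and filled < maxlen:
                out.append(last)
                filled += 1
            else:
                out.append(v)
        else:
            out.append(v)
            last, filled = v, 0
    return out


def fill_across_rows(table, maxlen=1):
    """Fill each row from left to right."""
    return [forward_fill(row, maxlen) for row in table]


def fill_down_columns(table, maxlen=1):
    """Fill each column from top to bottom."""
    columns = [forward_fill(col, maxlen) for col in zip(*table)]
    return [list(row) for row in zip(*columns)]


# Rows 4-6 of the R frame: only the NA to the right of 39.10130 is filled.
r_rows = [[NA, NA], [39.10130, NA], [NA, 55.43136]]
across = fill_across_rows(r_rows)
print(across[1])  # [39.1013, 39.1013]

# Rows 8-10 of the Python frame: the NAs in the middle row copy the row above.
py_rows = [[-77.9814, 64.3228, -93.954],
           [NA, -80.6142, NA],
           [27.1672, 60.5492, -13.2275]]
down = fill_down_columns(py_rows)
print(down[1])  # [-77.9814, -80.6142, -93.954]
```

Filling down columns is written here as a row-wise fill on the transposed table, which keeps a single fill routine for both directions.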