Data manipulation
This section provides examples of common tasks performed when preparing data for machine learning. These examples are run on a local cluster.
Note
The examples in this section include datasets that are pulled from GitHub and S3.
- Upload a file
- Import a file
- Import multiple files
- Download data
- Change the column type
- Combine columns from two datasets
- Combine rows from two datasets
- Fill NA values
- Group by
- Impute data
- Merge two datasets
- Pivot tables
- Replace values in a frame
- Slice columns
- Slice rows
- Sort columns
- Split datasets into training/testing/validating
- Tokenize strings
Feature engineering
H2O-3 also has methods for feature engineering. Target encoding is a categorical encoding technique that replaces each categorical value with the mean of the target variable for that category, which is especially useful for high-cardinality features. Word2vec is a text processing method that converts a corpus of text into word vectors.
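To make the idea behind target encoding concrete, here is a minimal sketch in plain Python. It is not H2O-3's implementation and does not use the H2O-3 API; the function name `target_encode` and the toy `cities`/`clicked` data are hypothetical and exist only for illustration. It computes the per-category mean of the target and substitutes that mean for each categorical value.

```python
from collections import defaultdict


def target_encode(categories, targets):
    """Replace each category with the mean target observed for that category.

    Returns the encoded column and the category -> mean lookup table.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for cat, y in zip(categories, targets):
        sums[cat] += y
        counts[cat] += 1

    # Mean of the target within each category.
    means = {cat: sums[cat] / counts[cat] for cat in sums}

    # Substitute the per-category mean for every row.
    encoded = [means[cat] for cat in categories]
    return encoded, means


# Hypothetical toy data: a categorical column and a binary target.
cities = ["NYC", "SF", "NYC", "LA", "SF", "NYC"]
clicked = [1, 0, 1, 0, 1, 0]

encoded, means = target_encode(cities, clicked)
print(means)
print(encoded)
```

This naive version computes the means on the same rows it encodes, so the encoded feature leaks target information into training. Practical implementations usually add safeguards such as computing the means out-of-fold or blending them with the global mean; those are omitted here to keep the sketch short.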