Data Manipulation
This section provides examples of common tasks performed when preparing data for machine learning. All examples are run on a local cluster.
Note: Some examples in this section use datasets that are pulled from GitHub and S3.
- Uploading a File
- Importing a File
- Importing Multiple Files
- Combining Columns from Two Datasets
- Combining Rows from Two Datasets
- Fill NAs
- Group By
- Imputing Data
- Merging Two Datasets
- Pivoting Tables
- Replacing Values in a Frame
- Slicing Columns
- Slicing Rows
- Sorting Columns
- Splitting Datasets into Training/Testing/Validating
- Target Encoding
- Tokenize Strings
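
To give a flavor of the tasks listed above, here is a minimal sketch of one of them: splitting a dataset into training, validation, and testing sets. It uses only the Python standard library and is not the API used in the linked pages; the function name `split_rows`, the default ratios, and the seed are all assumptions made for illustration.

```python
import random


def split_rows(rows, ratios=(0.7, 0.15), seed=42):
    """Shuffle rows and split them into (train, valid, test) lists.

    `ratios` holds the training and validation fractions; whatever
    remains after those two slices becomes the testing set.
    All names and defaults here are hypothetical.
    """
    if any(r < 0 for r in ratios) or sum(ratios) > 1:
        raise ValueError("ratios must be non-negative and sum to at most 1")

    # Copy first so the caller's data is left untouched.
    shuffled = list(rows)
    # A seeded generator makes the split reproducible across runs.
    random.Random(seed).shuffle(shuffled)

    n = len(shuffled)
    n_train = int(n * ratios[0])
    n_valid = int(n * ratios[1])

    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]
    return train, valid, test


train, valid, test = split_rows(range(100))
```

Fixing the seed is the main design choice here: it keeps the three subsets identical from run to run, which makes model comparisons repeatable.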