public class MRUtils
extends java.lang.Object
| Modifier and Type | Class and Description |
|---|---|
| static class | MRUtils.ClassDist: Compute the class distribution from a class label vector (not counting missing values). |
Usage 1: Label vector is categorical
------------------------------------
Vec label = ...;
assert(label.isEnum());
long[] dist = new ClassDist(label).doAll(label).dist();
Usage 2: Label vector is numerical
----------------------------------
Vec label = ...;
int num_classes = ...;
assert(label.isInt());
long[] dist = new ClassDist(num_classes).doAll(label).dist();
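To make the notion of a class distribution concrete, here is a minimal plain-Java sketch that does not use the H2O API. The class `ClassDistSketch` and its `dist` method are hypothetical names, labels are assumed to be stored as doubles with NaN marking a missing value, and the loop runs on a single array, whereas the real `ClassDist` is presumably executed over the chunks of a `Vec` via `doAll`.

```java
import java.util.Arrays;

class ClassDistSketch {
    // Count how many labels fall into each class 0..numClasses-1.
    // NaN is treated as a missing value and skipped (an assumption of this sketch).
    static long[] dist(double[] labels, int numClasses) {
        long[] counts = new long[numClasses];
        for (double v : labels) {
            if (Double.isNaN(v)) continue; // missing values are not counted
            int cls = (int) v;
            if (cls < 0 || cls >= numClasses) {
                throw new IllegalArgumentException("label out of range: " + v);
            }
            counts[cls]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        double[] labels = {0, 1, 1, Double.NaN, 2, 1};
        System.out.println(Arrays.toString(dist(labels, 3))); // prints [1, 3, 1]
    }
}
```

The returned array is indexed by class, so `counts[c]` is the number of non-missing rows with label `c`.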
| Constructor and Description |
|---|
| MRUtils() |
| Modifier and Type | Method and Description |
|---|---|
| static Frame | sampleFrame(Frame fr, long rows, long seed): Sample rows from a frame. |
| static Frame | sampleFrameStratified(Frame fr, Vec label, float[] sampling_ratios, long seed, boolean debug): Stratified sampling. |
| static Frame | sampleFrameStratified(Frame fr, Vec label, float[] sampling_ratios, long maxrows, long seed, boolean allowOversampling, boolean debug): Stratified sampling for classifiers. |
| static Frame | shuffleAndBalance(Frame fr, int splits, long seed, boolean local, boolean shuffle): Global redistribution of a Frame (balancing of chunks), done by calling process (all-to-one + one-to-all). |
| static Frame | shuffleFramePerChunk(Frame fr, long seed): Row-wise shuffle of a frame (only shuffles rows inside of each chunk). |
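The difference between a per-chunk shuffle and a global one can be illustrated with a small plain-Java sketch. It is not the H2O implementation: `PerChunkShuffleSketch`, `shufflePerChunk`, the fixed `chunkSize`, and the way a per-chunk RNG is derived from the seed are all assumptions made for illustration. Rows are modeled as an `int[]` of row ids split into consecutive chunks.

```java
import java.util.Arrays;
import java.util.Random;

class PerChunkShuffleSketch {
    // Shuffle rows only within consecutive chunks of chunkSize rows;
    // no row ever crosses a chunk boundary.
    static int[] shufflePerChunk(int[] rows, int chunkSize, long seed) {
        if (chunkSize <= 0) throw new IllegalArgumentException("chunkSize must be positive");
        int[] out = rows.clone();
        for (int start = 0, chunk = 0; start < out.length; start += chunkSize, chunk++) {
            int end = Math.min(start + chunkSize, out.length);
            // Per-chunk RNG derived from the seed so results are reproducible
            // (an assumption of this sketch, not taken from the library).
            Random rng = new Random(seed + chunk);
            // Fisher-Yates shuffle restricted to the half-open range [start, end).
            for (int i = end - 1; i > start; i--) {
                int j = start + rng.nextInt(i - start + 1);
                int tmp = out[i];
                out[i] = out[j];
                out[j] = tmp;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        int[] rows = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
        // With chunkSize 4, ids 0-3, 4-7 and 8-9 are each permuted among themselves.
        System.out.println(Arrays.toString(shufflePerChunk(rows, 4, 42L)));
    }
}
```

Because rows stay inside their chunk, this kind of shuffle needs no data movement between chunks, but it does not mix rows globally the way `shuffleAndBalance` with `shuffle` enabled is described as doing.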
public static Frame sampleFrame(Frame fr, long rows, long seed)
fr - Input frame
rows - Approximate number of rows to sample (across all chunks)
seed - Seed for RNG
public static Frame shuffleFramePerChunk(Frame fr, long seed)
fr - Input frame
public static Frame shuffleAndBalance(Frame fr, int splits, long seed, boolean local, boolean shuffle)
fr - Input frame
seed - RNG seed
shuffle - whether to shuffle the data globally
public static Frame sampleFrameStratified(Frame fr, Vec label, float[] sampling_ratios, long maxrows, long seed, boolean allowOversampling, boolean debug)
fr - Input frame
label - Label vector (must be enum)
maxrows - Maximum number of rows in the returned frame, must be > minrows
seed - RNG seed for sampling
sampling_ratios - Optional: array containing the requested sampling ratios per class (in order of domains), will be overwritten if it contains all 0s
public static Frame sampleFrameStratified(Frame fr, Vec label, float[] sampling_ratios, long seed, boolean debug)
fr - Input frame
label - Label vector (from the input frame)
sampling_ratios - Given sampling ratios for each class, in order of domains
seed - RNG seed
debug - Whether to print debug info
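As a rough illustration of how per-class sampling ratios can drive stratified sampling, the following plain-Java sketch works on an array of integer class labels rather than a `Frame`. All names here (`StratifiedSampleSketch`, `sample`) are hypothetical. The sketch emits each row `floor(ratio)` times plus one extra copy with probability equal to the fractional part, which is a simple stand-in and not necessarily how the library draws repetitions. Ratios above 1 therefore correspond to oversampling a class.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

class StratifiedSampleSketch {
    // Return the indices of the sampled rows. A row with class c is emitted
    // ratios[c] times in expectation: floor(ratio) copies, plus one more
    // with probability equal to the fractional part of the ratio.
    static List<Integer> sample(int[] labels, float[] ratios, long seed) {
        Random rng = new Random(seed);
        List<Integer> picked = new ArrayList<>();
        for (int i = 0; i < labels.length; i++) {
            float ratio = ratios[labels[i]]; // ratios are in order of class index
            int reps = (int) ratio;
            if (rng.nextFloat() < ratio - reps) reps++;
            for (int r = 0; r < reps; r++) picked.add(i);
        }
        return picked;
    }

    public static void main(String[] args) {
        int[] labels = {0, 1, 2, 1, 0, 2};
        // Drop class 0, keep class 1 as is, duplicate every row of class 2.
        float[] ratios = {0f, 1f, 2f};
        System.out.println(sample(labels, ratios, 1234L)); // prints [1, 2, 2, 3, 5, 5]
    }
}
```

With whole-number ratios the result is deterministic, as in the example; with fractional ratios the number of returned rows varies with the seed, so a cap such as `maxrows` would have to be enforced separately.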