public class MRUtils
extends java.lang.Object
Modifier and Type | Class and Description |
---|---|
static class |
MRUtils.ClassDist
Compute the class distribution from a class label vector
(not counting missing values)
Usage 1: Label vector is categorical
------------------------------------
Vec label = ...;
assert(label.isCategorical());
double[] dist = new ClassDist(label).doAll(label).dist();
Usage 2: Label vector is numerical
----------------------------------
Vec label = ...;
int num_classes = ...;
assert(label.isInt());
double[] dist = new ClassDist(num_classes).doAll(label).dist();
|
static class |
MRUtils.Dist |
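The class-distribution computation described for MRUtils.ClassDist can be illustrated with a minimal plain-Java sketch. This is not the H2O implementation: it works on an int array instead of a Vec, and the class name ClassDistSketch, the MISSING marker, and both method names are hypothetical. The page does not say whether dist() returns raw counts or relative frequencies, so the sketch shows both.

```java
import java.util.Arrays;

// Plain-Java illustration only; not the H2O ClassDist implementation.
class ClassDistSketch {
    // Hypothetical marker for a missing label (assumption of this sketch).
    static final int MISSING = -1;

    // Counts how often each class index in [0, numClasses) occurs, skipping missing labels.
    static double[] counts(int[] labels, int numClasses) {
        double[] dist = new double[numClasses];
        for (int label : labels) {
            if (label == MISSING) continue; // missing values are not counted
            if (label < 0 || label >= numClasses)
                throw new IllegalArgumentException("label out of range: " + label);
            dist[label]++;
        }
        return dist;
    }

    // Divides each count by the total to obtain relative class frequencies.
    static double[] relative(double[] counts) {
        double sum = 0;
        for (double c : counts) sum += c;
        double[] rel = new double[counts.length];
        if (sum == 0) return rel; // no non-missing labels: all zeros
        for (int i = 0; i < counts.length; i++) rel[i] = counts[i] / sum;
        return rel;
    }

    public static void main(String[] args) {
        int[] labels = {0, 1, 1, MISSING, 2, 1};
        System.out.println(Arrays.toString(counts(labels, 3))); // prints [1.0, 3.0, 1.0]
    }
}
```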
Constructor and Description |
---|
MRUtils() |
Modifier and Type | Method and Description |
---|---|
static Frame |
sampleFrame(Frame fr,
long rows,
long seed)
Sample rows from a frame.
|
static Frame |
sampleFrameStratified(Frame fr,
Vec label,
Vec weights,
float[] sampling_ratios,
long seed,
boolean debug)
Stratified sampling
|
static Frame |
sampleFrameStratified(Frame fr,
Vec label,
Vec weights,
float[] sampling_ratios,
long maxrows,
long seed,
boolean allowOversampling,
boolean verbose)
Stratified sampling for classifiers. FIXME: this is not accurate when weights are given, because the sampling is done with uniform weights.
|
static Frame |
shuffleFramePerChunk(Frame fr,
long seed)
Row-wise shuffle of a frame (only shuffles rows inside of each chunk)
|
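The idea behind a per-chunk shuffle (rows change position only within their own chunk and never cross a chunk boundary) can be sketched in plain Java as follows. This is an illustration under assumptions, not the H2O code: chunks are modeled as fixed-size blocks of an array, and the class name, the chunkSize parameter, and the per-chunk seed derivation are all hypothetical.

```java
import java.util.Random;

// Plain-Java illustration only; not the H2O shuffleFramePerChunk implementation.
class PerChunkShuffleSketch {
    // Shuffles rows in place, but only within consecutive blocks of chunkSize rows.
    static void shufflePerChunk(long[] rows, int chunkSize, long seed) {
        if (chunkSize <= 0) throw new IllegalArgumentException("chunkSize must be positive");
        int chunkIdx = 0;
        for (int start = 0; start < rows.length; start += chunkSize, chunkIdx++) {
            int end = Math.min(start + chunkSize, rows.length);
            // Assumption: each chunk gets its own RNG derived from the seed and chunk index.
            Random rng = new Random(seed + chunkIdx);
            // Fisher-Yates shuffle restricted to the index range [start, end).
            for (int i = end - 1; i > start; i--) {
                int j = start + rng.nextInt(i - start + 1);
                long tmp = rows[i];
                rows[i] = rows[j];
                rows[j] = tmp;
            }
        }
    }
}
```

Because no row leaves its chunk, the result is only locally randomized: data that is sorted across chunks stays roughly sorted at the chunk level.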
public static Frame sampleFrame(Frame fr, long rows, long seed)
fr - Input frame
rows - Approximate number of rows to sample (across all chunks)
seed - Seed for RNG
public static Frame shuffleFramePerChunk(Frame fr, long seed)
fr - Input frame
public static Frame sampleFrameStratified(Frame fr, Vec label, Vec weights, float[] sampling_ratios, long maxrows, long seed, boolean allowOversampling, boolean verbose)
fr - Input frame
label - Label vector (must be categorical)
weights - Weights vector, can be null
sampling_ratios - Optional: array containing the requested sampling ratios per class (in order of domains), will be overwritten if it contains all 0s
maxrows - Maximum number of rows in the returned frame
seed - RNG seed for sampling
allowOversampling - Allow oversampling of minority classes
verbose - Whether to print verbose info
public static Frame sampleFrameStratified(Frame fr, Vec label, Vec weights, float[] sampling_ratios, long seed, boolean debug)
fr - Input frame
label - Label vector (from the input frame)
weights - Weight vector (from the input frame), can be null
sampling_ratios - Given sampling ratios for each class, in order of domains
seed - RNG seed
debug - Whether to print debug info
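To make the role of per-class sampling ratios concrete, here is a minimal plain-Java sketch of stratified row sampling. It is not the H2O implementation and makes several assumptions: labels are class indices in an int array, weights are ignored, and a ratio above 1 is interpreted as oversampling, where the whole part gives guaranteed copies of a row and the fractional part gives the probability of one extra copy. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Plain-Java illustration only; not the H2O sampleFrameStratified implementation.
class StratifiedSampleSketch {
    // Returns the indices of the sampled rows; an index may repeat when the
    // ratio of its class exceeds 1 (oversampling, an assumption of this sketch).
    static List<Integer> sample(int[] labels, float[] samplingRatios, long seed) {
        Random rng = new Random(seed);
        List<Integer> picked = new ArrayList<>();
        for (int row = 0; row < labels.length; row++) {
            float ratio = samplingRatios[labels[row]]; // ratios are indexed by class
            int copies = (int) ratio;                  // whole part: guaranteed copies
            float frac = ratio - copies;               // fractional part: chance of one more
            if (rng.nextFloat() < frac) copies++;
            for (int k = 0; k < copies; k++) picked.add(row);
        }
        return picked;
    }

    public static void main(String[] args) {
        int[] labels = {0, 0, 1, 1, 2};
        float[] ratios = {0f, 1f, 2f}; // drop class 0, keep class 1, duplicate class 2
        System.out.println(sample(labels, ratios, 1234L)); // prints [2, 3, 4, 4]
    }
}
```

With fractional ratios such as 0.5f the output depends on the seed, so the size of the sample is only approximately the sum of ratio times class count.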