public class MRUtils
extends java.lang.Object
| Modifier and Type | Class and Description |
|---|---|
| static class | MRUtils.ClassDist: Compute the class distribution from a class label vector (not counting missing values). |
| static class | MRUtils.Dist |

ClassDist usage 1: Label vector is categorical
----------------------------------------------
    Vec label = ...;
    assert(label.isCategorical());
    double[] dist = new ClassDist(label).doAll(label).dist();

ClassDist usage 2: Label vector is numerical
--------------------------------------------
    Vec label = ...;
    int num_classes = ...;
    assert(label.isInt());
    double[] dist = new ClassDist(num_classes).doAll(label).dist();
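
The idea behind a class distribution that skips missing values can be sketched in plain Java, without any H2O types. This is only an illustration: `ClassDistSketch`, its `MISSING` marker, and the `dist`/`relDist` helpers are hypothetical names, and labels are assumed to be class indices in `0..numClasses-1`. Whether the real `dist()` returns raw counts or relative frequencies is not stated on this page, so the sketch shows both.

```java
import java.util.Arrays;

// Hypothetical stand-in for the idea behind MRUtils.ClassDist; not the H2O implementation.
final class ClassDistSketch {
    // Marker for a missing label in this sketch (an assumption; H2O has its own NA encoding).
    static final int MISSING = -1;

    /** Counts how often each class index 0..numClasses-1 occurs, skipping missing labels. */
    static double[] dist(int[] labels, int numClasses) {
        double[] counts = new double[numClasses];
        for (int label : labels) {
            if (label == MISSING) continue; // missing values are not counted
            counts[label]++;
        }
        return counts;
    }

    /** Converts counts to relative frequencies; returns all zeros if nothing was counted. */
    static double[] relDist(double[] counts) {
        double total = 0;
        for (double c : counts) total += c;
        double[] rel = new double[counts.length];
        if (total == 0) return rel;
        for (int i = 0; i < counts.length; i++) rel[i] = counts[i] / total;
        return rel;
    }

    public static void main(String[] args) {
        int[] labels = {0, 2, 1, MISSING, 2, 2};
        double[] counts = dist(labels, 3);
        System.out.println(Arrays.toString(counts));          // [1.0, 1.0, 3.0]
        System.out.println(Arrays.toString(relDist(counts))); // [0.2, 0.2, 0.6]
    }
}
```
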
| Constructor and Description |
|---|
| MRUtils() |
| Modifier and Type | Method and Description |
|---|---|
| static Frame | sampleFrame(Frame fr, long rows, long seed): Sample rows from a frame. |
| static Frame | sampleFrameStratified(Frame fr, Vec label, Vec weights, float[] sampling_ratios, long seed, boolean debug): Stratified sampling. |
| static Frame | sampleFrameStratified(Frame fr, Vec label, Vec weights, float[] sampling_ratios, long maxrows, long seed, boolean allowOversampling, boolean verbose): Stratified sampling for classifiers. FIXME: For weights, this is not accurate, as the sampling is done with uniform weights. |
| static Frame | shuffleFramePerChunk(Frame fr, long seed): Row-wise shuffle of a frame (only shuffles rows inside of each chunk). |
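
To make "only shuffles rows inside of each chunk" concrete, here is a minimal plain-Java sketch. It models a frame as an `int[]` of row ids and a chunk as a fixed-size block of consecutive rows; `ChunkShuffleSketch`, the `chunkSize` parameter, and the per-chunk seeding scheme are assumptions made for illustration and are not taken from the H2O source.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical illustration of a per-chunk shuffle; not the H2O implementation.
final class ChunkShuffleSketch {
    /** Shuffles rows in place, but only within consecutive blocks of chunkSize rows. */
    static void shufflePerChunk(int[] rows, int chunkSize, long seed) {
        if (chunkSize <= 0) throw new IllegalArgumentException("chunkSize must be positive");
        for (int start = 0, chunk = 0; start < rows.length; start += chunkSize, chunk++) {
            int end = Math.min(start + chunkSize, rows.length);
            // Deriving one RNG per chunk from the seed is an assumption of this sketch.
            Random rng = new Random(seed + chunk);
            // Fisher-Yates shuffle restricted to the index range [start, end).
            for (int i = end - 1; i > start; i--) {
                int j = start + rng.nextInt(i - start + 1);
                int tmp = rows[i];
                rows[i] = rows[j];
                rows[j] = tmp;
            }
        }
    }

    public static void main(String[] args) {
        int[] rows = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
        shufflePerChunk(rows, 4, 42L);
        // Rows 0-3, 4-7 and 8-9 are permuted among themselves; no row leaves its chunk.
        System.out.println(Arrays.toString(rows));
    }
}
```

Because no row crosses a chunk boundary, the result is not a uniform shuffle of the whole frame; the trade-off is that each chunk can be processed independently.
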
public static Frame sampleFrame(Frame fr, long rows, long seed)
fr - Input frame
rows - Approximate number of rows to sample (across all chunks)
seed - Seed for RNG

public static Frame shuffleFramePerChunk(Frame fr, long seed)
fr - Input frame

public static Frame sampleFrameStratified(Frame fr, Vec label, Vec weights, float[] sampling_ratios, long maxrows, long seed, boolean allowOversampling, boolean verbose)
fr - Input frame
label - Label vector (must be categorical)
weights - Weights vector, can be null
sampling_ratios - Optional: array containing the requested sampling ratios per class (in order of domains), will be overwritten if it contains all 0s
maxrows - Maximum number of rows in the returned frame
seed - RNG seed for sampling
allowOversampling - Allow oversampling of minority classes
verbose - Whether to print verbose info

public static Frame sampleFrameStratified(Frame fr, Vec label, Vec weights, float[] sampling_ratios, long seed, boolean debug)
fr - Input frame
label - Label vector (from the input frame)
weights - Weight vector (from the input frame), can be null
sampling_ratios - Given sampling ratios for each class, in order of domains
seed - RNG seed
debug - Whether to print debug info
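
As a rough illustration of sampling with per-class ratios "in order of domains", the following plain-Java sketch keeps each row with a probability given by the ratio of its class. Everything here is an assumption for illustration: `StratifiedSampleSketch` and `sample` are hypothetical names, labels are class indices with no missing values, ratios are non-negative, weights are ignored, and ratios above 1 are treated as oversampling by emitting a row more than once. The actual H2O behaviour may differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical illustration of ratio-based stratified sampling; not the H2O implementation.
final class StratifiedSampleSketch {
    /**
     * Returns the indices of the sampled rows. For a row of class k, the row is emitted
     * floor(ratios[k]) times, plus once more with probability equal to the fractional part.
     * A ratio of 0.25 therefore keeps roughly a quarter of that class; 2.0 duplicates it.
     */
    static List<Integer> sample(int[] labels, float[] ratios, long seed) {
        Random rng = new Random(seed);
        List<Integer> picked = new ArrayList<>();
        for (int i = 0; i < labels.length; i++) {
            float ratio = ratios[labels[i]];   // ratios are indexed by class, in domain order
            int copies = (int) ratio;          // whole copies (only > 0 when oversampling)
            if (rng.nextFloat() < ratio - copies) copies++; // fractional part as a probability
            for (int c = 0; c < copies; c++) picked.add(i);
        }
        return picked;
    }

    public static void main(String[] args) {
        int[] labels = {0, 1, 2, 1, 0, 2};
        // Drop class 0, keep class 1 as is, duplicate class 2.
        System.out.println(sample(labels, new float[]{0f, 1f, 2f}, 1L)); // [1, 2, 2, 3, 5, 5]
        // Fractional ratios give a seed-dependent subset of each class.
        System.out.println(sample(labels, new float[]{0.5f, 0.5f, 0.5f}, 1L));
    }
}
```

With a single class and a ratio of `rows / totalRows`, the same loop reduces to approximate uniform row sampling, which is one plausible reading of the "approximate number of rows" wording for `sampleFrame`.
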