public class DHistogram<T extends DHistogram> extends Iced
A DHistogram bins every value added to it (into a default number of bins)
and computes the min, max, and either the
class distribution or the mean and variance for each bin. DHistograms are
initialized with a min, max, and number of elements
to be added (all of which are generally available from a Vec).
Bins normally run from min to max in uniform sizes, but if the
DHistogram can determine that fewer bins are needed
(e.g. boolean columns run from 0 to 1, but only ever take on 2
values, so only 2 bins are needed), then fewer bins are used.
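
The binning described above can be sketched as follows. This is a minimal illustration, not the DHistogram implementation: the class `UniformBinSketch`, its fields, and the `add(x, y)` method are hypothetical names chosen for the example, and the choice to accumulate a separate response value `y` per bin (and to report population variance) is an assumption.

```java
import java.util.Arrays;

/** Hypothetical illustration only; not the H2O DHistogram implementation. */
final class UniformBinSketch {
    final double min;      // lower edge of bin 0
    final double step;     // uniform bin width
    final long[] counts;   // number of rows per bin
    final double[] mins;   // smallest column value seen per bin
    final double[] maxs;   // largest column value seen per bin
    final double[] sums;   // sum of responses per bin
    final double[] ssqs;   // sum of squared responses per bin

    UniformBinSketch(double min, double max, int requestedBins, boolean isInt) {
        if (max < min || requestedBins < 1) throw new IllegalArgumentException("bad range or bin count");
        // Integer columns hold at most max - min + 1 distinct values, so cap the bin count there.
        double span = isInt ? (max - min + 1) : (max - min);
        int nbins = (isInt && span < requestedBins) ? (int) span : requestedBins;
        this.min = min;
        this.step = span / nbins;
        counts = new long[nbins];
        mins = new double[nbins];
        maxs = new double[nbins];
        sums = new double[nbins];
        ssqs = new double[nbins];
        Arrays.fill(mins, Double.POSITIVE_INFINITY);
        Arrays.fill(maxs, Double.NEGATIVE_INFINITY);
    }

    int nbins() { return counts.length; }

    /** Maps a column value to its bin, clamping values on or past the edges. */
    int bin(double x) {
        int idx = step == 0 ? 0 : (int) ((x - min) / step);
        return Math.max(0, Math.min(counts.length - 1, idx));
    }

    /** Adds one row: column value x selects the bin, response y feeds mean and variance. */
    void add(double x, double y) {
        int b = bin(x);
        counts[b]++;
        mins[b] = Math.min(mins[b], x);
        maxs[b] = Math.max(maxs[b], x);
        sums[b] += y;
        ssqs[b] += y * y;
    }

    double mean(int b) { return sums[b] / counts[b]; }

    /** Population variance of the responses in bin b. */
    double var(int b) {
        double m = mean(b);
        return ssqs[b] / counts[b] - m * m;
    }
}
```

In this sketch, `new UniformBinSketch(0, 1, 20, true)` ends up with 2 bins rather than the 20 requested, which mirrors the boolean-column case mentioned above.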
If we are successively splitting rows (e.g. in a decision tree), then a
fresh DHistogram for each split will dynamically re-bin the data.
Each successive split will then divide the data logarithmically. At the
first split, outliers will end up in their own bins, but some
central bins may be very full. At the next split(s), the full bins will get
split, and so on until (after a logarithmic number of splits) each bin holds
roughly the same amount of data.
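
The effect of re-binning after a split can be seen in a small sketch. It is not DHistogram code: `RebinSketch`, `histogram`, and `leftOf` are hypothetical helpers, and the data (100 values between 0 and 0.99 plus a single outlier at 1000) is made up for illustration.

```java
import java.util.Arrays;

/** Hypothetical illustration of re-binning after a split; not H2O code. */
final class RebinSketch {
    /** Counts per uniform bin over the observed [min, max]; the max lands in the last bin. */
    static long[] histogram(double[] xs, int nbins) {
        double min = Double.POSITIVE_INFINITY, max = Double.NEGATIVE_INFINITY;
        for (double x : xs) {
            min = Math.min(min, x);
            max = Math.max(max, x);
        }
        double step = (max - min) / nbins;
        long[] counts = new long[nbins];
        for (double x : xs) {
            int idx = step == 0 ? 0 : (int) ((x - min) / step);
            counts[Math.min(nbins - 1, idx)]++;
        }
        return counts;
    }

    /** Keeps only the rows at or below a split point (the "left" child of a split). */
    static double[] leftOf(double[] xs, double splitPoint) {
        return Arrays.stream(xs).filter(x -> x <= splitPoint).toArray();
    }

    public static void main(String[] args) {
        double[] xs = new double[101];
        for (int i = 0; i < 100; i++) xs[i] = i / 100.0; // central values 0.00 .. 0.99
        xs[100] = 1000.0;                                // a single outlier

        // First pass: bins span 0..1000, so the central values crowd into bin 0.
        System.out.println(Arrays.toString(histogram(xs, 10)));

        // After splitting off the outlier, a fresh histogram spans only 0..0.99.
        System.out.println(Arrays.toString(histogram(leftOf(xs, 100.0), 10)));
    }
}
```

In the first histogram the outlier sits alone in the last bin while all 100 central values share the first bin; the fresh histogram built for the left child spreads those same values across all ten bins.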
| Constructor and Description |
|---|
| DHistogram(java.lang.String name, byte isInt) |
| DHistogram(java.lang.String name, byte isInt, float min, float max) |
| Modifier and Type | Method and Description |
|---|---|
| DHistogram | bigCopy() |
| long | bins(int i) |
| protected static int | byteSize(byte[] bs) |
| protected static int | byteSize(double[] fs) |
| protected static int | byteSize(float[] fs) |
| protected static int | byteSize(int[] is) |
| protected static int | byteSize(long[] ls) |
| protected static int | byteSize(java.lang.Object[] ls) |
| protected static int | byteSize(short[] ss) |
| void | fini() |
| double | max() |
| float | maxs(int i) |
| double | mean(int bin) |
| double | min() |
| float | mins(int i) |
| java.lang.String | name() |
| int | nbins() |
| DTree.Split | scoreMSE(int col) |
| DHistogram | smallCopy() |
| void | tightenMinMax() |
| double | var(int bin) |
Methods inherited from class Iced: clone, frozenType, init, newInstance, read, toDocField, write, writeJSON, writeJSONFields
public DHistogram(java.lang.String name, byte isInt, float min, float max)
public DHistogram(java.lang.String name, byte isInt)
public DHistogram smallCopy()
public DHistogram bigCopy()
public int nbins()
public long bins(int i)
public float mins(int i)
public float maxs(int i)
public DTree.Split scoreMSE(int col)
public double mean(int bin)
public double var(int bin)
public void tightenMinMax()
public void fini()
public final double min()
public final double max()
public final java.lang.String name()
protected static int byteSize(byte[] bs)
protected static int byteSize(short[] ss)
protected static int byteSize(float[] fs)
protected static int byteSize(int[] is)
protected static int byteSize(long[] ls)
protected static int byteSize(double[] fs)
protected static int byteSize(java.lang.Object[] ls)