public class DBinHistogram extends DHistogram<DBinHistogram>
A DBinHistogram bins every value added to it, and computes the
min & max (for use in the next split) and the response mean & variance for each
bin. DBinHistograms are initialized with a min, max and number-of-elements
to be added (all of which are generally available from a Vec).
Bins run from min to max in uniform sizes. If the DBinHistogram can
determine that fewer bins are needed (e.g. boolean columns run from 0 to 1,
but only ever take on 2 values, so only 2 bins are needed), then fewer bins
are used.
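
The uniform binning described above can be sketched as follows. This is a minimal, self-contained illustration and not the DBinHistogram source: the class name `UniformBinSketch`, the helper `chooseBinCount`, and the running-sum fields are all hypothetical, and the rule for shrinking the bin count on integer columns is an assumption based on the boolean-column example.

```java
public class UniformBinSketch {
    public final float min;           // lower edge of bin 0
    public final float step;          // uniform bin width
    public final long[] counts;       // rows per bin
    public final double[] sums;       // sum of responses per bin
    public final double[] sumSquares; // sum of squared responses per bin

    public UniformBinSketch(int requestedBins, boolean isInt, float min, float max) {
        int nbins = chooseBinCount(requestedBins, isInt, min, max);
        this.min = min;
        this.step = (max - min) / nbins;
        this.counts = new long[nbins];
        this.sums = new double[nbins];
        this.sumSquares = new double[nbins];
    }

    // Assumption: an integer column spanning [min, max] has at most
    // (max - min + 1) distinct values, so it never needs more bins than that.
    public static int chooseBinCount(int requested, boolean isInt, float min, float max) {
        if (isInt) {
            long distinct = (long) max - (long) min + 1;
            if (distinct < requested) return (int) Math.max(1, distinct);
        }
        return requested;
    }

    // Map a value to its bin; values at or beyond max are clamped into the last bin.
    public int binIndex(float x) {
        if (step <= 0) return 0; // constant column: everything lands in bin 0
        int idx = (int) ((x - min) / step);
        return Math.max(0, Math.min(idx, counts.length - 1));
    }

    // Record one row: column value x and response y.
    public void add(float x, double y) {
        int b = binIndex(x);
        counts[b]++;
        sums[b] += y;
        sumSquares[b] += y * y;
    }

    // Response mean for bin b (NaN for an empty bin).
    public double mean(int b) {
        return sums[b] / counts[b];
    }

    // Population variance of the response for bin b (NaN for an empty bin).
    public double var(int b) {
        double m = mean(b);
        return sumSquares[b] / counts[b] - m * m;
    }

    public static void main(String[] args) {
        // A boolean column asks for 1024 bins but only needs 2.
        UniformBinSketch h = new UniformBinSketch(1024, true, 0f, 1f);
        h.add(0f, 1.0);
        h.add(0f, 3.0);
        h.add(1f, 5.0);
        for (int b = 0; b < h.counts.length; b++) {
            System.out.println("bin " + b + ": n=" + h.counts[b]
                + " mean=" + h.mean(b) + " var=" + h.var(b));
        }
    }
}
```

The sum and sum-of-squares form keeps each update to a few additions per row; a production implementation might prefer a numerically more stable update.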
If we are successively splitting rows (e.g. in a decision tree), then a
fresh DBinHistogram for each split will dynamically re-bin the data.
Each successive split will logarithmically divide the data. At the first
split, outliers will end up in their own bins - but perhaps some central
bins may be very full. At the next split(s), the full bins will get split,
and again until (with a log number of splits) each bin holds roughly the
same amount of data. This dynamic binning resolves a lot of problems with
picking the proper bin count or limits - generally a few more tree levels
will equal any fancy but fixed-size binning strategy.
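
To illustrate the re-binning argument, the sketch below builds a coarse histogram over skewed data, then builds a fresh histogram over only the rows that fell into the fullest bin, using their own tight min and max. Everything here (`RebinSketch`, `histogram`, `rowsInBin`, the sample data) is hypothetical and chosen for illustration; it does not reproduce how DTree selects splits.

```java
import java.util.Arrays;

public class RebinSketch {
    // Uniform bin index, clamped so that x == max lands in the last bin.
    public static int binIndex(float x, float min, float max, int nbins) {
        float step = (max - min) / nbins;
        if (step <= 0) return 0;
        int idx = (int) ((x - min) / step);
        return Math.max(0, Math.min(idx, nbins - 1));
    }

    // Count how many values fall into each of nbins uniform bins over [min, max].
    public static long[] histogram(float[] xs, float min, float max, int nbins) {
        long[] bins = new long[nbins];
        for (float x : xs) bins[binIndex(x, min, max, nbins)]++;
        return bins;
    }

    // Index of the bin holding the most rows (first one on ties).
    public static int fullestBin(long[] bins) {
        int best = 0;
        for (int b = 1; b < bins.length; b++) if (bins[b] > bins[best]) best = b;
        return best;
    }

    // The subset of values that landed in the given bin.
    public static float[] rowsInBin(float[] xs, float min, float max, int nbins, int bin) {
        float[] out = new float[xs.length];
        int n = 0;
        for (float x : xs) if (binIndex(x, min, max, nbins) == bin) out[n++] = x;
        return Arrays.copyOf(out, n);
    }

    public static float min(float[] xs) {
        float m = xs[0];
        for (float x : xs) m = Math.min(m, x);
        return m;
    }

    public static float max(float[] xs) {
        float m = xs[0];
        for (float x : xs) m = Math.max(m, x);
        return m;
    }

    public static void main(String[] args) {
        // Eight clustered values plus one outlier.
        float[] xs = {0f, 1f, 2f, 3f, 4f, 5f, 6f, 7f, 1000f};
        int nbins = 4;

        // First pass: the outlier gets its own bin, the cluster crowds into one.
        long[] first = histogram(xs, min(xs), max(xs), nbins);
        System.out.println("first pass:  " + Arrays.toString(first));

        // Second pass: a fresh histogram over just the crowded bin's rows.
        float[] crowded = rowsInBin(xs, min(xs), max(xs), nbins, fullestBin(first));
        long[] second = histogram(crowded, min(crowded), max(crowded), nbins);
        System.out.println("second pass: " + Arrays.toString(second));
    }
}
```

With this sample data the first pass puts eight of nine rows into one bin, while the second pass spreads those eight rows evenly across four bins, which is the effect the paragraph above attributes to a few extra tree levels.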
| Modifier and Type | Field and Description |
|---|---|
| long[] | _bins |
| float | _bmin |
| float[] | _maxs |
| float[] | _mins |
| char | _nbins |
| float | _step |
| Constructor and Description |
|---|
| DBinHistogram(java.lang.String name, char nbins, byte isInt, float min, float max, long nelems) |
| Modifier and Type | Method and Description |
|---|---|
| DBinHistogram | bigCopy() |
| long | bins(int b) |
| void | fini() |
| static DBinHistogram[] | initialHist(Frame fr, int ncols, char nbins) |
| boolean | isConstantResponse() |
| float | maxs(int b) |
| double | mean(int b) |
| float | mins(int b) |
| int | nbins() |
| DTree.Split | scoreMSE(int col) |
| DHistogram | smallCopy() |
| void | tightenMinMax() |
| java.lang.String | toString() |
| double | var(int b) |
Inherited methods: byteSize, max, min, name, clone, frozenType, init, newInstance, read, toDocField, write, writeJSON, writeJSONFields
public final float _step
public final float _bmin
public final char _nbins
public long[] _bins
public float[] _mins
public float[] _maxs
public DBinHistogram(java.lang.String name,
char nbins,
byte isInt,
float min,
float max,
long nelems)
public DHistogram smallCopy()
smallCopy in class DHistogram<DBinHistogram>
public DBinHistogram bigCopy()
bigCopy in class DHistogram<DBinHistogram>
public int nbins()
nbins in class DHistogram<DBinHistogram>
public long bins(int b)
bins in class DHistogram<DBinHistogram>
public float mins(int b)
mins in class DHistogram<DBinHistogram>
public float maxs(int b)
maxs in class DHistogram<DBinHistogram>
public double mean(int b)
mean in class DHistogram<DBinHistogram>
public double var(int b)
var in class DHistogram<DBinHistogram>
public DTree.Split scoreMSE(int col)
scoreMSE in class DHistogram<DBinHistogram>
public void fini()
fini in class DHistogram<DBinHistogram>
public void tightenMinMax()
tightenMinMax in class DHistogram<DBinHistogram>
public static DBinHistogram[] initialHist(Frame fr, int ncols, char nbins)
public boolean isConstantResponse()
public java.lang.String toString()
toString in class java.lang.Object