public abstract class DHistogram<TDH extends DHistogram> extends Iced
A DHistogram bins every value added to it and computes the Vec min and max
(for use in the next split), along with the response mean and variance for
each bin. DHistograms are initialized with a min, a max, and the number of
elements to be added (all of which are generally available from a Vec).
Bins run from min to max in uniform sizes. If the DHistogram can
determine that fewer bins are needed (e.g. boolean columns run from 0 to 1,
but only ever take on 2 values, so only 2 bins are needed), then fewer bins
are used.
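The uniform binning described above can be sketched as follows. This is a minimal illustration, not the class's actual implementation: the class name UniformBinSketch, its bin method, and the exact rule for shrinking the bin count on integer columns are assumptions made for the example. It treats maxEx as an exclusive upper bound, as the field name suggests.

```java
// Hypothetical sketch of uniform binning; not the real DHistogram code.
final class UniformBinSketch {
    final float min;    // inclusive lower bound
    final float maxEx;  // exclusive upper bound
    final float step;   // width of each bin
    final int nbins;    // number of bins actually used

    UniformBinSketch(int requestedBins, boolean isInt, float min, float maxEx) {
        int n = requestedBins;
        float s = (maxEx - min) / n;
        // Assumed rule: an integer column with no more distinct values than
        // requested bins gets exactly one bin per value.
        if (isInt && (maxEx - min) <= requestedBins) {
            n = Math.max(1, (int) ((long) maxEx - (long) min));
            s = 1.0f;
        }
        this.min = min;
        this.maxEx = maxEx;
        this.nbins = n;
        this.step = s;
    }

    // Map a value to its bin index, clamping to the valid range.
    int bin(float value) {
        int idx = (int) ((value - min) / step);
        return Math.max(0, Math.min(nbins - 1, idx));
    }

    public static void main(String[] args) {
        // Boolean column: values 0 and 1, exclusive max 2, 20 bins requested.
        UniformBinSketch bool = new UniformBinSketch(20, true, 0f, 2f);
        System.out.println(bool.nbins + " bins; 0 -> " + bool.bin(0f) + ", 1 -> " + bool.bin(1f));

        // Real-valued column over [0, 8) with 4 bins of width 2.
        UniformBinSketch real = new UniformBinSketch(4, false, 0f, 8f);
        System.out.println("5.5 -> bin " + real.bin(5.5f));
    }
}
```

In this sketch the boolean column ends up with 2 bins even though 20 were requested, and 5.5 lands in bin 2 of the real-valued column.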
DHistograms are shared per-node and atomically updated. There is an add
call to help with cross-node reductions. The data is stored in
primitive arrays, so it can be sent over the wire.
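One way to picture per-node atomic updates on a primitive array, followed by a reduction step, is sketched below. The class SharedBinsSketch and its incr and add methods are hypothetical names, and the use of a VarHandle (Java 9 or later) is an assumption for the example; the source does not say which mechanism DHistogram uses for its atomic updates.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.util.Arrays;

// Hypothetical sketch: atomic increments on a plain int[] plus a merge step.
final class SharedBinsSketch {
    // Gives atomic access to individual elements of an int[].
    private static final VarHandle BINS =
        MethodHandles.arrayElementVarHandle(int[].class);

    final int[] bins; // primitive array, so it is cheap to serialize

    SharedBinsSketch(int nbins) { bins = new int[nbins]; }

    // Atomically count one row in bin b; safe to call from many threads.
    void incr(int b) { BINS.getAndAdd(bins, b, 1); }

    // Fold another histogram's counts into this one. Assumes no concurrent
    // writers, as would be the case once each node has finished its pass.
    void add(SharedBinsSketch other) {
        for (int i = 0; i < bins.length; i++) bins[i] += other.bins[i];
    }

    public static void main(String[] args) throws InterruptedException {
        SharedBinsSketch nodeA = new SharedBinsSketch(4);
        Thread[] workers = new Thread[4];
        for (int t = 0; t < workers.length; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < 1000; i++) nodeA.incr(i % 4);
            });
            workers[t].start();
        }
        for (Thread w : workers) w.join();

        // A second "node" with its own counts, merged in at the end.
        SharedBinsSketch nodeB = new SharedBinsSketch(4);
        nodeB.incr(0);
        nodeA.add(nodeB);
        System.out.println(Arrays.toString(nodeA.bins)); // [1001, 1000, 1000, 1000]
    }
}
```

Keeping the counts in a plain int[] rather than a wrapper object mirrors the point made above: the array itself is the whole state that needs to travel between nodes.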
If we are successively splitting rows (e.g. in a decision tree), then a
fresh DHistogram for each split will dynamically re-bin the data.
Each successive split will logarithmically divide the data. At the first
split, outliers will end up in their own bins, but some central
bins may be very full. At the next split(s), the full bins will get split,
and so on until (after a logarithmic number of splits) each bin holds roughly
the same amount of data. This dynamic binning resolves many of the problems
with picking the proper bin count or limits: generally, a few more tree levels
will equal any fancy but fixed-size binning strategy.
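The effect of re-binning at each level can be seen in a small sketch. It is a simplification under stated assumptions: RebinSketch, histogram, and fullestBinRows are hypothetical names, and "following the fullest bin" stands in for descending into one child of a tree split, which is not how a real split is chosen.

```java
import java.util.Arrays;

// Hypothetical sketch of dynamic re-binning across successive splits.
final class RebinSketch {
    // Bin index of v for nbins uniform bins starting at min, clamped at the top.
    static int binOf(double v, double min, double step, int nbins) {
        int b = step == 0 ? 0 : (int) ((v - min) / step);
        return Math.min(nbins - 1, b);
    }

    // Build a fresh histogram over the observed range of the given rows.
    static int[] histogram(double[] rows, int nbins) {
        double min = Arrays.stream(rows).min().getAsDouble();
        double max = Arrays.stream(rows).max().getAsDouble();
        double step = (max - min) / nbins;
        int[] bins = new int[nbins];
        for (double v : rows) bins[binOf(v, min, step, nbins)]++;
        return bins;
    }

    // Keep only the rows that landed in the fullest bin.
    static double[] fullestBinRows(double[] rows, int nbins) {
        double min = Arrays.stream(rows).min().getAsDouble();
        double max = Arrays.stream(rows).max().getAsDouble();
        double step = (max - min) / nbins;
        int[] bins = histogram(rows, nbins);
        int best = 0;
        for (int i = 1; i < nbins; i++) if (bins[i] > bins[best]) best = i;
        final int fullest = best;
        return Arrays.stream(rows)
                     .filter(v -> binOf(v, min, step, nbins) == fullest)
                     .toArray();
    }

    public static void main(String[] args) {
        // Mostly central values with one low and one high outlier.
        double[] rows = {0, 50, 51, 52, 53, 54, 55, 56, 57, 1000};
        for (int level = 1; level <= 3; level++) {
            System.out.println("level " + level + ": "
                + Arrays.toString(histogram(rows, 4)));
            rows = fullestBinRows(rows, 4);
        }
    }
}
```

With this data the counts go from [9, 0, 0, 1] to [1, 0, 0, 8] to [2, 2, 2, 2]: each level peels off an outlier, and the third histogram is evenly filled using only 4 bins per level.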
Modifier and Type | Field and Description |
---|---|
int[] | _bins |
byte | _isInt |
float | _maxEx |
protected float | _maxIn |
float | _min |
protected float | _min2 |
java.lang.String | _name |
char | _nbin |
float | _step |
Constructor and Description |
---|
DHistogram(java.lang.String name, int nbins, byte isInt, float min, float maxEx, long nelems) |
Modifier and Type | Method and Description |
---|---|
int | bins(int b) |
long | byteSize() |
abstract long | byteSize0() |
float | find_maxEx() |
static float | find_maxEx(float maxIn, int isInt) |
float | find_maxIn() |
float | find_min() |
static DHistogram[] | initialHist(Frame fr, int ncols, int nbins, DHistogram[] hs, boolean isBinom) |
boolean | isConstantResponse() |
static DHistogram | make(java.lang.String name, int nbins, byte isInt, float min, float maxEx, long nelems, boolean isBinom) |
float | maxsIn(int b) |
abstract double | mean(int b) |
float | mins(int b) |
int | nbins() |
abstract DTree.Split | scoreMSE(int col) |
void | setMax(float max) |
void | setMin(float min) |
java.lang.String | toString() |
abstract double | var(int b) |
Methods inherited from class Iced: clone, frozenType, init, newInstance, read, toDocField, write, writeJSON, writeJSONFields
public final transient java.lang.String _name
public final byte _isInt
public final char _nbin
public final float _step
public final float _min
public final float _maxEx
public int[] _bins
protected float _min2
protected float _maxIn
public DHistogram(java.lang.String name, int nbins, byte isInt, float min, float maxEx, long nelems)
public void setMin(float min)
public void setMax(float max)
public int nbins()
public int bins(int b)
public float mins(int b)
public float maxsIn(int b)
public abstract double mean(int b)
public abstract double var(int b)
public float find_min()
public float find_maxIn()
public float find_maxEx()
public static float find_maxEx(float maxIn, int isInt)
public abstract DTree.Split scoreMSE(int col)
public static DHistogram[] initialHist(Frame fr, int ncols, int nbins, DHistogram[] hs, boolean isBinom)
public static DHistogram make(java.lang.String name, int nbins, byte isInt, float min, float maxEx, long nelems, boolean isBinom)
public boolean isConstantResponse()
public java.lang.String toString()
Overrides: toString in class java.lang.Object
public abstract long byteSize0()
public long byteSize()