hex.gbm
Class DBinHistogram
java.lang.Object
water.Iced
hex.gbm.DHistogram<DBinHistogram>
hex.gbm.DBinHistogram
- All Implemented Interfaces:
- java.lang.Cloneable, Freezable
public class DBinHistogram
- extends DHistogram<DBinHistogram>
A Histogram, computed in parallel over a Vec.
A DBinHistogram bins every value added to it, and computes the min and max
of the Vec values it sees (for use in the next split), along with the
response mean and variance for each bin. DBinHistograms are initialized
with a min, a max, and the number of elements to be added (all of which are
generally available from a Vec). Bins run from min to max in uniform sizes.
If the DBinHistogram can determine that fewer bins are needed (e.g. boolean
columns run from 0 to 1 but only ever take on 2 values, so only 2 bins are
needed), then fewer bins are used.
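The uniform binning described above can be illustrated with a small standalone sketch. It is not the DBinHistogram source: the class name UniformBinSketch, its fields, and the rule used to shrink the bin count for small-range integer columns are assumptions made for illustration only.

```java
// Hypothetical sketch of uniform binning; not the actual DBinHistogram code.
class UniformBinSketch {
    final float min;   // lower edge of bin 0
    final float step;  // width of each bin
    final int nbins;   // number of bins actually used

    UniformBinSketch(int requestedBins, boolean isInt, float min, float max) {
        float range = max - min;
        if (isInt && range + 1 <= requestedBins) {
            // Assumption: an integer column with few distinct values gets
            // one bin per value, e.g. a 0/1 column gets exactly 2 bins.
            this.nbins = (int) range + 1;
            this.step = 1f;
        } else {
            this.nbins = requestedBins;
            // Guard against a zero-width range (constant column).
            this.step = range == 0f ? 1f : range / requestedBins;
        }
        this.min = min;
    }

    // Map a value to its bin; the max value is clamped into the last bin.
    int binOf(float v) {
        int b = (int) ((v - min) / step);
        return Math.max(0, Math.min(b, nbins - 1));
    }
}
```

With these assumptions, a boolean column asked for 20 bins ends up with 2, and a real-valued column over [0, 8] with 4 bins places 3.0 in bin 1.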
If we are successively splitting rows (e.g. in a decision tree), then a
fresh DBinHistogram for each split will dynamically re-bin the data.
Each successive split will logarithmically divide the data. At the first
split, outliers will end up in their own bins, while some central bins may
be very full. At the next split(s), the full bins will get split, and so on
until (after a logarithmic number of splits) each bin holds roughly the
same amount of data. This dynamic binning resolves many of the problems
with picking the proper bin count or limits: generally, a few more tree
levels will match any fancy but fixed-size binning strategy.
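A toy example may make the re-binning effect concrete. The sketch below is a standalone illustration, not DBinHistogram itself: the class RebinSketch and its histogram helper are hypothetical. It bins a skewed column once over its full range, then re-bins only the rows of the crowded bin over their own tightened min and max, as a fresh histogram for the next split would.

```java
import java.util.Arrays;

// Hypothetical illustration of dynamic re-binning; not the H2O implementation.
class RebinSketch {
    // Count values into nbins uniform bins over [min, max];
    // values outside the range are skipped (they belong to another split).
    static long[] histogram(float[] vals, float min, float max, int nbins) {
        long[] counts = new long[nbins];
        float step = (max - min) / nbins;
        for (float v : vals) {
            if (v < min || v > max) continue;
            int b = Math.min((int) ((v - min) / step), nbins - 1);
            counts[b]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        // Eight clustered values and one outlier.
        float[] vals = {0f, 1f, 2f, 3f, 4f, 5f, 6f, 7f, 100f};

        // First level: the outlier gets its own bin, the rest crowd into bin 0.
        System.out.println(Arrays.toString(histogram(vals, 0f, 100f, 4)));
        // prints [8, 0, 0, 1]

        // Next level: re-bin the crowded rows over their observed range [0, 7].
        System.out.println(Arrays.toString(histogram(vals, 0f, 7f, 4)));
        // prints [2, 2, 2, 2]
    }
}
```

After a single extra level, the crowded bin is spread evenly, which is the behavior the description attributes to successive splits.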
Constructor Summary
DBinHistogram(java.lang.String name,
char nbins,
byte isInt,
float min,
float max,
long nelems)
Methods inherited from class java.lang.Object
equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
_step
public final float _step
_bmin
public final float _bmin
_nbins
public final char _nbins
_bins
public long[] _bins
_mins
public float[] _mins
_maxs
public float[] _maxs
DBinHistogram
public DBinHistogram(java.lang.String name,
char nbins,
byte isInt,
float min,
float max,
long nelems)
smallCopy
public DHistogram smallCopy()
- Overrides:
smallCopy in class DHistogram<DBinHistogram>
bigCopy
public DBinHistogram bigCopy()
- Overrides:
bigCopy in class DHistogram<DBinHistogram>
nbins
public int nbins()
- Overrides:
nbins in class DHistogram<DBinHistogram>
bins
public long bins(int b)
- Overrides:
bins in class DHistogram<DBinHistogram>
mins
public float mins(int b)
- Overrides:
mins in class DHistogram<DBinHistogram>
maxs
public float maxs(int b)
- Overrides:
maxs in class DHistogram<DBinHistogram>
mean
public double mean(int b)
- Overrides:
mean in class DHistogram<DBinHistogram>
var
public double var(int b)
- Overrides:
var in class DHistogram<DBinHistogram>
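This page does not say how the per-bin response mean and variance are accumulated. As one possible approach, the sketch below uses Welford's online algorithm, which is a swapped-in technique and not necessarily what DBinHistogram does. The class BinStatsSketch and its members are hypothetical, and the variance returned is the population variance.

```java
// Hypothetical per-bin running statistics using Welford's algorithm;
// an assumption for illustration, not the DBinHistogram implementation.
class BinStatsSketch {
    final long[] counts;   // rows seen per bin
    final double[] means;  // running response mean per bin
    final double[] m2s;    // running sum of squared deviations per bin

    BinStatsSketch(int nbins) {
        counts = new long[nbins];
        means = new double[nbins];
        m2s = new double[nbins];
    }

    // Fold one response value into the statistics of the given bin.
    void add(int bin, double response) {
        long n = ++counts[bin];
        double delta = response - means[bin];
        means[bin] += delta / n;
        m2s[bin] += delta * (response - means[bin]);
    }

    double mean(int bin) { return means[bin]; }

    // Population variance; an empty bin reports 0.
    double var(int bin) {
        return counts[bin] == 0 ? 0.0 : m2s[bin] / counts[bin];
    }
}
```

Adding the responses 1, 2 and 3 to one bin yields a mean of 2 and a variance of 2/3 under this definition.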
scoreMSE
public DTree.Split scoreMSE(int col)
- Overrides:
scoreMSE in class DHistogram<DBinHistogram>
fini
public void fini()
- Overrides:
fini in class DHistogram<DBinHistogram>
tightenMinMax
public void tightenMinMax()
- Overrides:
tightenMinMax in class DHistogram<DBinHistogram>
initialHist
public static DBinHistogram[] initialHist(Frame fr,
int ncols,
char nbins)
isConstantResponse
public boolean isConstantResponse()
toString
public java.lang.String toString()
- Overrides:
toString in class java.lang.Object