hex.gbm
Class DHistogram<T extends DHistogram>

java.lang.Object
  extended by water.Iced
      extended by hex.gbm.DHistogram<T>
All Implemented Interfaces:
java.lang.Cloneable, Freezable
Direct Known Subclasses:
DBinHistogram

public class DHistogram<T extends DHistogram>
extends Iced

A DHistogram, computed in parallel over a Vec.

A DHistogram bins every value added to it (into a default number of bins) and computes the min, max, and either the class distribution or the mean and variance for each bin. DHistograms are initialized with a min, a max, and the number of elements to be added (all of which are generally available from a Vec). Bins normally run from min to max in uniform sizes, but if the DHistogram can determine that fewer bins are needed, then fewer bins are used. For example, a boolean column runs from 0 to 1 but only ever takes on 2 values, so only 2 bins are needed.
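The binning described above can be illustrated with a minimal sketch. It is not the DHistogram source: the class name UniformBinSketch, the chooseBins helper, the boolean isInt flag, and the defaultBins parameter are all assumptions made for this example. The accessor names only mirror those in the method summary, and their bodies are guesses at plausible behaviour.

```java
import java.util.Arrays;

/** Hypothetical illustration only; this is not the hex.gbm.DHistogram source. */
final class UniformBinSketch {
    private final float min;       // lower edge of bin 0
    private final float step;      // uniform bin width
    private final long[] counts;   // number of values per bin
    private final float[] binMin;  // smallest value seen per bin
    private final float[] binMax;  // largest value seen per bin
    private final double[] sum;    // running sum per bin
    private final double[] sumSq;  // running sum of squares per bin

    UniformBinSketch(float min, float max, int nbins) {
        if (nbins < 1 || !(max > min)) {
            throw new IllegalArgumentException("need nbins >= 1 and max > min");
        }
        this.min = min;
        this.step = (max - min) / nbins;
        counts = new long[nbins];
        binMin = new float[nbins];
        binMax = new float[nbins];
        sum = new double[nbins];
        sumSq = new double[nbins];
        Arrays.fill(binMin, Float.POSITIVE_INFINITY);
        Arrays.fill(binMax, Float.NEGATIVE_INFINITY);
    }

    /** Assumed rule: an integer column never needs more bins than distinct values. */
    static int chooseBins(float min, float max, boolean isInt, int defaultBins) {
        if (!isInt) return defaultBins;
        long distinct = (long) max - (long) min + 1;
        return (int) Math.min(defaultBins, distinct);
    }

    /** Maps a value to a bin index, clamping so that v == max lands in the last bin. */
    int binOf(float v) {
        int b = (int) ((v - min) / step);
        return Math.max(0, Math.min(counts.length - 1, b));
    }

    /** Adds one value and updates that bin's count, min, max and running sums. */
    void add(float v) {
        int b = binOf(v);
        counts[b]++;
        binMin[b] = Math.min(binMin[b], v);
        binMax[b] = Math.max(binMax[b], v);
        sum[b] += v;
        sumSq[b] += (double) v * v;
    }

    int nbins()       { return counts.length; }
    long bins(int i)  { return counts[i]; }
    float mins(int i) { return binMin[i]; }
    float maxs(int i) { return binMax[i]; }

    /** Mean of the values in a bin; NaN for an empty bin. */
    double mean(int bin) {
        return counts[bin] == 0 ? Double.NaN : sum[bin] / counts[bin];
    }

    /** Population variance of the values in a bin; NaN for an empty bin. */
    double var(int bin) {
        if (counts[bin] == 0) return Double.NaN;
        double m = mean(bin);
        return Math.max(0.0, sumSq[bin] / counts[bin] - m * m);
    }
}
```

With this sketch, chooseBins(0f, 1f, true, defaultBins) yields 2 for any defaultBins of at least 2, matching the boolean-column case in the description. Whether the real class stores sums, sums of squares, or some other statistic per bin is not stated on this page.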

If rows are split successively (e.g. in a decision tree), a fresh DHistogram for each split dynamically re-bins the data, so each successive split divides the data logarithmically. At the first split, outliers end up in their own bins, while some central bins may be very full. At the following splits the full bins are split again, until (after a logarithmic number of splits) each bin holds roughly the same amount of data.
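A small sketch can make the re-binning effect concrete. It is a self-contained illustration under assumed names (RebinSketch, counts, rowsInBin) and does not use the DHistogram or DTree API. It bins a column that contains one outlier, then builds a fresh uniform histogram over only the rows that fell into the overfull bin.

```java
import java.util.Arrays;

/** Hypothetical illustration of re-binning after a split; not H2O code. */
final class RebinSketch {

    /** Returns {min, max} of the values. */
    static float[] range(float[] vals) {
        float min = Float.POSITIVE_INFINITY, max = Float.NEGATIVE_INFINITY;
        for (float v : vals) {
            min = Math.min(min, v);
            max = Math.max(max, v);
        }
        return new float[] {min, max};
    }

    /** Uniform bin index of v over [min, max]; v == max goes in the last bin. */
    static int binOf(float v, float min, float max, int nbins) {
        float step = (max - min) / nbins;
        int b = step > 0 ? (int) ((v - min) / step) : 0;
        return Math.max(0, Math.min(nbins - 1, b));
    }

    /** Fresh histogram whose min and max are taken from the given rows. */
    static long[] counts(float[] vals, int nbins) {
        float[] r = range(vals);
        long[] c = new long[nbins];
        for (float v : vals) c[binOf(v, r[0], r[1], nbins)]++;
        return c;
    }

    /** Rows that fall into one bin of the histogram built over vals. */
    static float[] rowsInBin(float[] vals, int nbins, int bin) {
        float[] r = range(vals);
        float[] out = new float[vals.length];
        int n = 0;
        for (float v : vals) {
            if (binOf(v, r[0], r[1], nbins) == bin) out[n++] = v;
        }
        return Arrays.copyOf(out, n);
    }

    public static void main(String[] args) {
        float[] col = {0, 1, 2, 3, 4, 5, 6, 7, 1000};   // one outlier
        // First histogram: the outlier sits alone and bin 0 is very full.
        System.out.println(Arrays.toString(counts(col, 4)));
        // After a split, re-bin only the rows from the full bin.
        float[] left = rowsInBin(col, 4, 0);
        System.out.println(Arrays.toString(counts(left, 4)));
    }
}
```

In this toy column the first histogram puts eight of the nine rows into a single bin, and the fresh histogram over those eight rows spreads them evenly. Real data will generally need several rounds of splitting before the bins even out.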


Constructor Summary
DHistogram(java.lang.String name, byte isInt)
           
DHistogram(java.lang.String name, byte isInt, float min, float max)
           
 
Method Summary
 DHistogram bigCopy()
           
 long bins(int i)
           
protected static int byteSize(byte[] bs)
           
protected static int byteSize(double[] fs)
           
protected static int byteSize(float[] fs)
           
protected static int byteSize(int[] is)
           
protected static int byteSize(long[] ls)
           
protected static int byteSize(java.lang.Object[] ls)
           
protected static int byteSize(short[] ss)
           
 void fini()
           
 double max()
           
 float maxs(int i)
           
 double mean(int bin)
           
 double min()
           
 float mins(int i)
           
 java.lang.String name()
           
 int nbins()
           
 DTree.Split scoreMSE(int col)
           
 DHistogram smallCopy()
           
 void tightenMinMax()
           
 double var(int bin)
           
 
Methods inherited from class water.Iced
clone, frozenType, init, newInstance, read, toDocField, write, writeJSON, writeJSONFields
 
Methods inherited from class java.lang.Object
equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

DHistogram

public DHistogram(java.lang.String name,
                  byte isInt,
                  float min,
                  float max)

DHistogram

public DHistogram(java.lang.String name,
                  byte isInt)
Method Detail

smallCopy

public DHistogram smallCopy()

bigCopy

public DHistogram bigCopy()

nbins

public int nbins()

bins

public long bins(int i)

mins

public float mins(int i)

maxs

public float maxs(int i)

scoreMSE

public DTree.Split scoreMSE(int col)

mean

public double mean(int bin)

var

public double var(int bin)

tightenMinMax

public void tightenMinMax()

fini

public void fini()

min

public final double min()

max

public final double max()

name

public final java.lang.String name()

byteSize

protected static int byteSize(byte[] bs)

byteSize

protected static int byteSize(short[] ss)

byteSize

protected static int byteSize(float[] fs)

byteSize

protected static int byteSize(int[] is)

byteSize

protected static int byteSize(long[] ls)

byteSize

protected static int byteSize(double[] fs)

byteSize

protected static int byteSize(java.lang.Object[] ls)