DHistogram (h2o-algos version 3.12.0.1 API)

java.lang.Object
- water.Iced
- - hex.tree.DHistogram

All Implemented Interfaces:

java.io.Externalizable, java.io.Serializable, java.lang.Cloneable, water.Freezable
```
public final class DHistogram
extends water.Iced
```
A Histogram, computed in parallel over a Vec.
A DHistogram bins every value added to it, and computes the Vec min and max (for use in the next split), and the response mean and variance for each bin. DHistograms are initialized with a min, max and number of elements to be added (all of which are generally available from a Vec). Bins run from min to max in uniform sizes. If the DHistogram can determine that fewer bins are needed (e.g. boolean columns run from 0 to 1 but only ever take on 2 values, so only 2 bins are needed), then fewer bins are used.
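The uniform binning described above can be illustrated with a minimal sketch. The class `UniformBinSketch` and its members are hypothetical names chosen for this example; it is not the DHistogram source, it assumes an exclusive upper bound, and it leaves out missing-value handling.

```java
// Hypothetical sketch of uniform binning; not the DHistogram implementation.
final class UniformBinSketch {
    private final double min;    // inclusive lower bound of the first bin
    private final double width;  // width shared by every bin
    private final int nbins;

    UniformBinSketch(double min, double maxEx, int nbins) {
        if (!(maxEx > min) || nbins < 1)
            throw new IllegalArgumentException("need maxEx > min and nbins >= 1");
        this.min = min;
        this.nbins = nbins;
        this.width = (maxEx - min) / nbins;
    }

    /** Index of the bin holding x; out-of-range values are clamped to the end bins. */
    int bin(double x) {
        int b = (int) Math.floor((x - min) / width);
        return Math.max(0, Math.min(nbins - 1, b));
    }

    /** Lower edge of bin b. */
    double binAt(int b) {
        return min + b * width;
    }

    public static void main(String[] args) {
        UniformBinSketch h = new UniformBinSketch(0.0, 10.0, 5);
        System.out.println(h.bin(3.9));    // 1
        System.out.println(h.binAt(2));    // 4.0
        // A 0/1 column needs only two bins over [0, 2).
        UniformBinSketch bool = new UniformBinSketch(0.0, 2.0, 2);
        System.out.println(bool.bin(1.0)); // 1
    }
}
```

Clamping is one possible policy for values outside the range; the real class may treat them differently.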
DHistograms are shared per node and atomically updated. There is an add call to help cross-node reductions. The data is stored in primitive arrays, so it can be sent over the wire.
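As a rough illustration of atomic per-bin updates and an element-wise merge, the sketch below keeps each bin's sum as raw `long` bits in an `AtomicLongArray` and updates it with a compare-and-set loop. `SharedHistoSketch` and its methods are hypothetical; how DHistogram itself performs its atomic updates is not stated on this page, so treat this as one possible technique only.

```java
import java.util.concurrent.atomic.AtomicLongArray;

// Hypothetical sketch: atomic double accumulation per bin plus an element-wise merge.
final class SharedHistoSketch {
    // Each slot holds the bit pattern of a double; all-zero bits encode 0.0.
    private final AtomicLongArray bits;

    SharedHistoSketch(int nbins) {
        bits = new AtomicLongArray(nbins);
    }

    /** Atomically adds delta to bin i by retrying a compare-and-set. */
    void addAtomic(int i, double delta) {
        long old, upd;
        do {
            old = bits.get(i);
            upd = Double.doubleToRawLongBits(Double.longBitsToDouble(old) + delta);
        } while (!bits.compareAndSet(i, old, upd));
    }

    double get(int i) {
        return Double.longBitsToDouble(bits.get(i));
    }

    /** Folds another histogram into this one, bin by bin (the reduction step). */
    void add(SharedHistoSketch other) {
        if (other.bits.length() != bits.length())
            throw new IllegalArgumentException("bin counts differ");
        for (int i = 0; i < bits.length(); i++) addAtomic(i, other.get(i));
    }

    public static void main(String[] args) {
        SharedHistoSketch a = new SharedHistoSketch(2);
        SharedHistoSketch b = new SharedHistoSketch(2);
        a.addAtomic(0, 1.5);
        b.addAtomic(0, 2.5);
        b.addAtomic(1, 1.0);
        a.add(b);
        System.out.println(a.get(0) + " " + a.get(1)); // 4.0 1.0
    }
}
```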
If we are successively splitting rows (e.g. in a decision tree), then a fresh DHistogram for each split will dynamically re-bin the data. Each successive split will logarithmically divide the data. At the first split, outliers will end up in their own bins, but some central bins may be very full. At the next split(s), if they happen at all, the full bins will get split, and so on until (after a logarithmic number of splits) each bin holds roughly the same amount of data. This 'UniformAdaptive' binning resolves a lot of problems with picking the proper bin count or limits: generally a few more tree levels will match any fancy but fixed-size binning strategy.
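The re-binning effect can be seen in a small sketch. `AdaptiveRebinSketch` is a hypothetical helper that assumes a non-empty, integer-valued column (so the exclusive maximum is taken as the observed maximum plus one); it rebuilds a histogram from the observed range of the rows that fell into one overfull bin.

```java
import java.util.Arrays;

// Hypothetical sketch: a fresh histogram over a child's observed range re-bins its rows.
final class AdaptiveRebinSketch {
    static int binOf(double x, double min, double maxEx, int nbins) {
        double width = (maxEx - min) / nbins;
        return Math.min(nbins - 1, (int) ((x - min) / width));
    }

    /** Bin counts over the observed range of xs (xs must be non-empty). */
    static int[] counts(double[] xs, int nbins) {
        double min = Arrays.stream(xs).min().getAsDouble();
        double maxEx = Arrays.stream(xs).max().getAsDouble() + 1; // integer data assumed
        int[] c = new int[nbins];
        for (double x : xs) c[binOf(x, min, maxEx, nbins)]++;
        return c;
    }

    /** Rows of xs that land in bin `target` of the histogram built over xs. */
    static double[] rowsInBin(double[] xs, int nbins, int target) {
        double min = Arrays.stream(xs).min().getAsDouble();
        double maxEx = Arrays.stream(xs).max().getAsDouble() + 1;
        return Arrays.stream(xs)
                     .filter(x -> binOf(x, min, maxEx, nbins) == target)
                     .toArray();
    }

    public static void main(String[] args) {
        double[] data = {0, 1, 2, 3, 4, 5, 6, 7, 1000};
        // First level: the outlier gets its own bin, everything else piles into bin 0.
        System.out.println(Arrays.toString(counts(data, 4)));  // [8, 0, 0, 1]
        // Next level: the rows from the full bin are re-binned over their own range.
        double[] child = rowsInBin(data, 4, 0);
        System.out.println(Arrays.toString(counts(child, 4))); // [2, 2, 2, 2]
    }
}
```

With the same bin count, the second-level histogram spreads the previously crowded rows evenly, which is the behaviour the paragraph above attributes to successive splits.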
Support for histogram split points based on quantiles (or random points) is available as well, via _histoType.

See Also:
Serialized Form

Nested Class Summary

Nested Classes
Modifier and Type	Class and Description
`static class`	`DHistogram.NASplitDir` Split direction for missing values.

Field Summary

Fields
Modifier and Type	Field and Description
`water.Key`	`_globalQuantilesKey`
`boolean`	`_hasQuantiles`
`SharedTreeModel.SharedTreeParameters.HistogramType`	`_histoType`
`byte`	`_isInt`
`double`	`_maxEx`
`protected double`	`_maxIn`
`double`	`_min`
`protected double`	`_min2`
`double`	`_minSplitImprovement`
`java.lang.String`	`_name`
`char`	`_nbin`
`long`	`_seed`
`double[]`	`_splitPts`
`double`	`_step`
`protected double[]`	`_vals`

Constructor Summary

Constructors
Constructor and Description
`DHistogram(java.lang.String name, int nbins, int nbins_cats, byte isInt, double min, double maxEx, double minSplitImprovement, SharedTreeModel.SharedTreeParameters.HistogramType histogramType, long seed, water.Key globalQuantilesKey)`

Method Summary

Methods
Modifier and Type	Method and Description
`static int[]`	`activeColumns(DHistogram[] hist)`
`void`	`add(DHistogram dsh)`
`void`	`addNasAtomic(double y, double wy, double wyy)`
`void`	`addNasPlain(double... ds)`
`void`	`addWAtomic(int i, double wDelta)`
`int`	`bin(double col_data)`
`double`	`binAt(int b)`
`double`	`bins(int b)`
`double`	`find_maxEx()`
`static double`	`find_maxEx(double maxIn, int isInt)`
`double`	`find_maxIn()`
`double`	`find_min()`
`void`	`incr0(int b, double y, double w)`
`void`	`incr1(int b, double y, double yy)`
`void`	`init()`
`void`	`init(double[] vals)`
`static DHistogram[]`	`initialHist(water.fvec.Frame fr, int ncols, int nbins, DHistogram[] hs, long seed, SharedTreeModel.SharedTreeParameters parms, water.Key[] globalQuantilesKey)`
`static DHistogram`	`make(java.lang.String name, int nbins, byte isInt, double min, double maxEx, long seed, SharedTreeModel.SharedTreeParameters parms, water.Key globalQuantilesKey)`
`int`	`nbins()`
`void`	`reducePrecision()` Cast bin values (except for sums of weights and NA-bucket counters) to floats to drop the least significant bits.
`void`	`setMaxIn(double max)`
`void`	`setMin(double min)`
`java.lang.String`	`toString()`
`void`	`updateHisto(double[] ws, double[] cs, double[] ys, int[] rows, int hi, int lo)` Update counts in appropriate bins.
`void`	`updateSharedHistosAndReset(hex.tree.ScoreBuildHistogram.LocalHisto lh, double[] ws, double[] cs, double[] ys, int[] rows, int hi, int lo)`
`double`	`var(int b)` Compute the sample variance within a given bin.
`double`	`w(int i)`
`double`	`wNA()`
`double`	`wY(int i)`
`double`	`wYNA()`
`double`	`wYY(int i)`
`double`	`wYYNA()`

Methods inherited from class water.Iced
asBytes, clone, copyOver, frozenType, read, readExternal, readJSON, reloadFromBytes, toJsonString, write, writeExternal, writeJSON

Methods inherited from class java.lang.Object
equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait

Field Detail

_name

public final transient java.lang.String _name

_minSplitImprovement

public final double _minSplitImprovement

_isInt
```
public final byte _isInt
```

_nbin
```
public char _nbin
```

_step
```
public double _step
```

_min
```
public final double _min
```

_maxEx
```
public final double _maxEx
```

_vals
```
protected double[] _vals
```

_min2
```
protected double _min2
```

_maxIn
```
protected double _maxIn
```

_histoType

public SharedTreeModel.SharedTreeParameters.HistogramType _histoType

_splitPts
```
public transient double[] _splitPts
```

_seed
```
public final long _seed
```

_hasQuantiles
```
public transient boolean _hasQuantiles
```

_globalQuantilesKey
```
public water.Key _globalQuantilesKey
```

Constructor Detail

DHistogram

public DHistogram(java.lang.String name,
          int nbins,
          int nbins_cats,
          byte isInt,
          double min,
          double maxEx,
          double minSplitImprovement,
          SharedTreeModel.SharedTreeParameters.HistogramType histogramType,
          long seed,
          water.Key globalQuantilesKey)

Method Detail

w
```
public double w(int i)
```

wY
```
public double wY(int i)
```

wYY
```
public double wYY(int i)
```

addWAtomic

public void addWAtomic(int i,
              double wDelta)

addNasAtomic

public void addNasAtomic(double y,
                double wy,
                double wyy)

addNasPlain

public void addNasPlain(double... ds)

wNA
```
public double wNA()
```

wYNA
```
public double wYNA()
```

wYYNA
```
public double wYYNA()
```

activeColumns

public static int[] activeColumns(DHistogram[] hist)

setMin
```
public void setMin(double min)
```

setMaxIn
```
public void setMaxIn(double max)
```

bin
```
public int bin(double col_data)
```

binAt
```
public double binAt(int b)
```

nbins
```
public int nbins()
```

bins
```
public double bins(int b)
```

init
```
public void init()
```

init
```
public void init(double[] vals)
```

add
```
public void add(DHistogram dsh)
```

find_min
```
public double find_min()
```

find_maxIn
```
public double find_maxIn()
```

find_maxEx
```
public double find_maxEx()
```

find_maxEx

public static double find_maxEx(double maxIn,
                int isInt)

initialHist

public static DHistogram[] initialHist(water.fvec.Frame fr,
                       int ncols,
                       int nbins,
                       DHistogram[] hs,
                       long seed,
                       SharedTreeModel.SharedTreeParameters parms,
                       water.Key[] globalQuantilesKey)

make

public static DHistogram make(java.lang.String name,
              int nbins,
              byte isInt,
              double min,
              double maxEx,
              long seed,
              SharedTreeModel.SharedTreeParameters parms,
              water.Key globalQuantilesKey)

toString
```
public java.lang.String toString()
```
Overrides:

toString in class java.lang.Object

var
```
public double var(int b)
```
Compute the sample variance within a given bin.

Parameters:
b - bin id

Returns:
sample variance (>= 0)
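For orientation, a sample variance can be derived from the three per-bin sums that the accessors `w`, `wY` and `wYY` suggest (sum of weights, weighted sum of responses, weighted sum of squared responses). The sketch below is an assumption about how such a computation might look, not the method's actual body; `BinVarianceSketch` is a hypothetical name.

```java
// Hypothetical sketch: sample variance of a bin from its running sums.
final class BinVarianceSketch {
    /**
     * @param w   sum of weights in the bin
     * @param wY  weighted sum of responses
     * @param wYY weighted sum of squared responses
     * @return sample variance, clamped at 0 to absorb rounding error
     */
    static double sampleVariance(double w, double wY, double wYY) {
        if (w <= 1) return 0.0;              // too few observations for a sample variance
        double v = (wYY - wY * wY / w) / (w - 1);
        return Math.max(0.0, v);
    }

    public static void main(String[] args) {
        // Responses 1, 2, 3, 4 with unit weights: w = 4, wY = 10, wYY = 30.
        System.out.println(sampleVariance(4, 10, 30));
        // A constant response has zero variance.
        System.out.println(sampleVariance(3, 6, 12)); // 0.0
    }
}
```

The clamp matches the documented guarantee that the returned value is non-negative, since the subtraction can dip slightly below zero in floating point.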

incr0

public void incr0(int b,
         double y,
         double w)

incr1

public void incr1(int b,
         double y,
         double yy)

updateHisto
```
public void updateHisto(double[] ws,
               double[] cs,
               double[] ys,
               int[] rows,
               int hi,
               int lo)
```
Update counts in appropriate bins. Not thread-safe; the caller is assumed to hold a private copy.

Parameters:
ws - observation weights
cs - column data
ys - response
rows - rows sorted by leaf assignment
hi - upper bound on index into rows array to be processed by this call (exclusive)
lo - lower bound on index into rows array to be processed by this call (inclusive)
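A minimal sketch of such an update loop follows. It assumes that `ws`, `cs` and `ys` are indexed by the row ids stored in `rows`, that bins are uniform, and that zero-weight and NaN rows are simply skipped; none of this is confirmed by the page, and `UpdateHistoSketch` is a hypothetical class.

```java
// Hypothetical sketch of a single-threaded histogram update over rows[lo..hi).
final class UpdateHistoSketch {
    final double min, width;
    final int nbins;
    final double[] w, wY, wYY;          // per-bin sums: weight, weight*y, weight*y*y
    double minSeen = Double.MAX_VALUE;  // observed column min, for the next split
    double maxSeen = -Double.MAX_VALUE; // observed column max, for the next split

    UpdateHistoSketch(double min, double maxEx, int nbins) {
        this.min = min;
        this.nbins = nbins;
        this.width = (maxEx - min) / nbins;
        this.w = new double[nbins];
        this.wY = new double[nbins];
        this.wYY = new double[nbins];
    }

    void update(double[] ws, double[] cs, double[] ys, int[] rows, int hi, int lo) {
        for (int r = lo; r < hi; r++) {
            int k = rows[r];                 // assumed: rows holds indices into ws/cs/ys
            double weight = ws[k];
            if (weight == 0) continue;       // zero-weight rows contribute nothing
            double x = cs[k];
            if (Double.isNaN(x)) continue;   // NA bucket omitted in this sketch
            if (x < minSeen) minSeen = x;
            if (x > maxSeen) maxSeen = x;
            int b = Math.max(0, Math.min(nbins - 1, (int) ((x - min) / width)));
            double wy = weight * ys[k];
            w[b] += weight;
            wY[b] += wy;
            wYY[b] += wy * ys[k];
        }
    }

    public static void main(String[] args) {
        UpdateHistoSketch h = new UpdateHistoSketch(0.0, 2.0, 2);
        double[] ws = {1, 1, 2, 0}, cs = {0.5, 1.5, 1.7, 0.2}, ys = {1, 2, 3, 9};
        h.update(ws, cs, ys, new int[]{3, 0, 2, 1}, 4, 0);
        System.out.println(h.w[1] + " " + h.wY[1] + " " + h.wYY[1]); // 3.0 8.0 22.0
    }
}
```

Because the loop uses plain `+=` on shared arrays, it is only safe when each caller works on its own copy, which is consistent with the thread-safety note above.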

reducePrecision
```
public void reducePrecision()
```
Cast bin values (except for sums of weights and NA-bucket counters) to floats to drop the least significant bits. Improves reproducibility by dropping the bits most affected by floating-point error.
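The effect of a round trip through `float` can be shown with a small sketch. `ReducePrecisionSketch` and its array arguments are hypothetical; the example only demonstrates the general idea that two sums differing in their last bits, because of summation order, can collapse to the same value once the low-order bits are dropped.

```java
// Hypothetical sketch: dropping low-order bits by casting double -> float -> double.
final class ReducePrecisionSketch {
    static double reduce(double v) {
        return (double) (float) v;
    }

    /** Reduces the response sums in place; the weight sums are deliberately left alone. */
    static void reduceAll(double[] wY, double[] wYY) {
        for (int i = 0; i < wY.length; i++) wY[i] = reduce(wY[i]);
        for (int i = 0; i < wYY.length; i++) wYY[i] = reduce(wYY[i]);
    }

    public static void main(String[] args) {
        double a = (0.1 + 0.2) + 0.3; // same terms ...
        double b = (0.3 + 0.2) + 0.1; // ... added in a different order
        System.out.println(a == b);                 // false
        System.out.println(reduce(a) == reduce(b)); // true
    }
}
```

Two nearby doubles can still round to different floats when they straddle a rounding boundary, so this improves reproducibility rather than guaranteeing it.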

updateSharedHistosAndReset

public void updateSharedHistosAndReset(hex.tree.ScoreBuildHistogram.LocalHisto lh,
                              double[] ws,
                              double[] cs,
                              double[] ys,
                              int[] rows,
                              int hi,
                              int lo)
