public class ScoreBuildHistogram2 extends ScoreBuildHistogram
Fuse 2 conceptual passes into one (MRTask):
The result is a set of DHistogram arrays; one DHistogram array for each unique 'leaf' in the tree being histogramed in parallel. These have node ID's (nids) from 'leaf' to 'tree._len'. Each DHistogram array is for all the columns in that 'leaf'.
The other result is a prediction "score" for the whole dataset, based on the previous passes' DHistograms. No CAS update: Sharing the histograms proved to be a performance problem on larger multi-cpu machines with many running threads, CAS was the bottleneck. To remove the CAS while minimizing the memory overhead (private copies of histograms), phase 2 is paralellized both over columns (primary) and rows (secondary). Parallelization over different columns precedes paralellization within each column to reduce number of extra histogram copies made. Expected number of per-column tasks running in parallel (and hence histogram copies) is given by exp(nthreads-pre-column) = max(1,H2O.NUMCPUS - num_cols)
DECIDED_ROW, FRESH, MISSING_RESPONSE, OUT_OF_BAG, UNDECIDED_CHILD_NODE_ID
Constructor and Description |
---|
ScoreBuildHistogram2(water.H2O.H2OCountedCompleter cc,
int k,
int ncols,
int nbins,
int nbins_cats,
DTree tree,
int leaf,
DHistogram[][] hcs,
hex.genmodel.utils.DistributionFamily family,
int weightIdx,
int workIdx,
int nidIdxs) |
Modifier and Type | Method and Description |
---|---|
ScoreBuildHistogram |
dfork2(byte[] types,
water.fvec.Frame fr,
boolean run_local) |
void |
map(water.fvec.Chunk[] chks) |
void |
postGlobal() |
protected int[] |
score_decide(water.fvec.Chunk[] chks,
int[] nnids) |
void |
setupLocal() |
isDecidedRow, isOOBRow, nid2Oob, oob2Nid, reduce, score_decide
appendables, asyncExecOnAllNodes, block, closeLocal, compute2, dfork, dfork, dfork, dfork, dfork, dinvoke, doAll, doAll, doAll, doAll, doAll, doAll, doAll, doAll, doAll, doAll, doAll, doAll, doAllNodes, getResult, getResult, isReleasable, map, map, map, map, map, map, map, map, map, map, modifiesVolatileVecs, onCompletion, onExceptionalCompletion, outputFrame, outputFrame, outputFrame, profile, profString, self
copyOver, getDException, hasException, logVerbose, onAck, onAckAck, setException
asBytes, clone, compute, compute1, currThrPriority, frozenType, icer, priority, read, readJSON, reloadFromBytes, write, writeJSON
__tryComplete, addToPendingCount, compareAndSetPendingCount, complete, exec, getCompleter, getPendingCount, getRawResult, setCompleter, setPendingCount, setRawResult, tryComplete
adapt, adapt, adapt, cancel, compareAndSetForkJoinTaskTag, completeExceptionally, fork, get, get, getException, getForkJoinTaskTag, getPool, getQueuedTaskCount, getSurplusQueuedTaskCount, helpQuiesce, inForkJoinPool, invoke, invokeAll, invokeAll, invokeAll, isCancelled, isCompletedAbnormally, isCompletedNormally, isDone, join, peekNextLocalTask, pollNextLocalTask, pollTask, quietlyComplete, quietlyInvoke, quietlyJoin, reinitialize, setForkJoinTaskTag, tryUnfork
public ScoreBuildHistogram2(water.H2O.H2OCountedCompleter cc, int k, int ncols, int nbins, int nbins_cats, DTree tree, int leaf, DHistogram[][] hcs, hex.genmodel.utils.DistributionFamily family, int weightIdx, int workIdx, int nidIdxs)
public ScoreBuildHistogram dfork2(byte[] types, water.fvec.Frame fr, boolean run_local)
dfork2
in class ScoreBuildHistogram
public void map(water.fvec.Chunk[] chks)
map
in class ScoreBuildHistogram
protected int[] score_decide(water.fvec.Chunk[] chks, int[] nnids)
public void setupLocal()
setupLocal
in class ScoreBuildHistogram
public void postGlobal()
postGlobal
in class water.MRTask<ScoreBuildHistogram>