public abstract class MRTask<T extends MRTask<T>> extends DTask<T> implements ForkJoinPool.ManagedBlocker
MRTask provides several map
and reduce
methods that can be
overriden to specify a computation. Several instances of this class will be
created to distribute the computation over F/J threads and machines. Non-transient
fields are copied and serialized to instances created for map invocations. Reduce
methods can store their results in fields. Results are serialized and reduced all the
way back to the invoking node. When the last reduce method has been called, fields
of the initial MRTask instance contains the computation results.
Apart from small reduced POJO returned to the calling node, MRtask2 can
produce output vector(s) as a result. These will have chunks co-located
with the input dataset, however, their number of lines will generally
differ, (so they won't be strictly compatible with the original). To produce
output vectors, call doAll.dfork version with required number of outputs and
override appropriate map
call taking required number of
NewChunks. MRTask will automatically close the new Appendable vecs and
produce an output frame with newly created Vecs.
DTask.DKeyTask<T extends DTask.DKeyTask,V extends Keyed>, DTask.RemoveCall
Modifier and Type | Field and Description |
---|---|
protected AppendableVec[] |
_appendables |
Frame |
_fr
The Vectors (or Keys) to work on.
|
protected Futures |
_fs
We can add more things to block on - in case we want a bunch of lazy
tasks produced by children to all end before this top-level task ends.
|
protected int |
_hi
Internal field to track a range of local Chunks to work on
|
Key[] |
_keys |
protected T |
_left
Internal field to track the left & right sub-range of chunks to work on
|
protected int |
_lo
Internal field to track a range of local Chunks to work on
|
protected short |
_nhi
Internal field to track a range of remote nodes/JVMs to work on
|
protected RPC<T> |
_nleft
Internal field to track the left & right remote nodes/JVMs to work on
|
protected RPC<T> |
_nrite
Internal field to track the left & right remote nodes/JVMs to work on
|
protected short |
_nxx
Internal field to track a range of remote nodes/JVMs to work on
|
protected T |
_rite
Internal field to track the left & right sub-range of chunks to work on
|
protected boolean |
_topLocal
Internal field to track if this is a top-level local call
|
_ex, _modifiesInputs
Modifier | Constructor and Description |
---|---|
|
MRTask() |
protected |
MRTask(H2O.H2OCountedCompleter cmp) |
Modifier and Type | Method and Description |
---|---|
AppendableVec[] |
appendables() |
T |
asyncExec(Frame fr) |
void |
asyncExec(int outputs,
Frame fr,
boolean run_local)
Fork the task in strictly non-blocking fashion.
|
T |
asyncExec(Vec... vecs) |
boolean |
block()
Possibly blocks the current thread, for example waiting for
a lock or condition.
|
protected void |
closeLocal()
Override to do any remote cleaning on the last remote instance of
this object, for disposing of node-local shared data structures.
|
void |
compute2()
Called from FJ threads to do local work.
|
T |
dfork(Frame fr) |
T |
dfork(int outputs,
Frame fr,
boolean run_local) |
T |
dfork(int outputs,
Vec... vecs) |
T |
dfork(Vec... vecs)
Invokes the map/reduce computation over the given Frame.
|
void |
dinvoke(H2ONode sender)
Called once on remote at top level, probably with a subset of the cloud.
|
T |
doAll(Frame fr) |
T |
doAll(Frame fr,
boolean run_local)
Invokes the map/reduce computation over the given Frame.
|
T |
doAll(int outputs,
Frame fr) |
T |
doAll(int outputs,
Frame fr,
boolean run_local) |
T |
doAll(int outputs,
Vec... vecs) |
T |
doAll(Key... keys) |
T |
doAll(Vec... vecs)
Invokes the map/reduce computation over the given Vecs.
|
T |
doAllNodes() |
T |
getResult()
Block for & get any final results from a dfork'd MRTask.
|
boolean |
isReleasable()
Returns
true if blocking is unnecessary. |
void |
map(Chunk c)
Override with your map implementation.
|
void |
map(Chunk[] cs)
Override with your map implementation.
|
void |
map(Chunk[] cs,
NewChunk nc) |
void |
map(Chunk[] cs,
NewChunk[] ncs) |
void |
map(Chunk[] cs,
NewChunk nc1,
NewChunk nc2) |
void |
map(Chunk c0,
Chunk c1)
Override with your map implementation.
|
void |
map(Chunk c0,
Chunk c1,
Chunk c2)
Override with your map implementation.
|
void |
map(Chunk c0,
Chunk c1,
Chunk c2,
NewChunk nc) |
void |
map(Chunk c0,
Chunk c1,
Chunk c2,
NewChunk nc1,
NewChunk nc2) |
void |
map(Chunk c0,
Chunk c1,
NewChunk nc) |
void |
map(Chunk c0,
Chunk c1,
NewChunk nc1,
NewChunk nc2) |
void |
map(Chunk c,
NewChunk nc) |
void |
map(Key key)
Override with your map implementation.
|
void |
onCompletion(CountedCompleter caller)
OnCompletion - reduce the left & right into self.
|
boolean |
onExceptionalCompletion(java.lang.Throwable ex,
CountedCompleter caller)
Cancel/kill all work as we can, then rethrow...
|
Frame |
outputFrame() |
Frame |
outputFrame(Key key,
java.lang.String[] names,
java.lang.String[][] domains) |
Frame |
outputFrame(Key key,
java.lang.String[] names,
java.lang.String[][] domains,
Futures fs) |
Frame |
outputFrame(java.lang.String[] names,
java.lang.String[][] domains) |
protected void |
postGlobal() |
byte |
priority() |
java.lang.String |
profString() |
void |
reduce(T mrt)
Override to combine results from 'mrt' into 'this' MRTask.
|
void |
setProfile(boolean b) |
protected void |
setupLocal()
Override to do any remote initialization on the 1st remote instance of
this object, for initializing node-local shared data structures.
|
copyOver, getDException, hasException, logVerbose, onAck, onAckAck, setException
clone, compute, frozenType, icer, nextThrPriority, read_impl, read, readJSON_impl, readJSON, write_impl, write, writeHTML_impl, writeHTML, writeJSON_impl, writeJSON
addToPendingCount, compareAndSetPendingCount, complete, exec, getCompleter, getPendingCount, getRawResult, setCompleter, setPendingCount, setRawResult, tryComplete
adapt, adapt, adapt, cancel, compareAndSetForkJoinTaskTag, completeExceptionally, fork, get, get, getException, getForkJoinTaskTag, getPool, getQueuedTaskCount, getSurplusQueuedTaskCount, helpQuiesce, inForkJoinPool, invoke, invokeAll, invokeAll, invokeAll, isCancelled, isCompletedAbnormally, isCompletedNormally, isDone, join, peekNextLocalTask, pollNextLocalTask, pollTask, quietlyComplete, quietlyInvoke, quietlyJoin, reinitialize, setForkJoinTaskTag, tryUnfork
public Frame _fr
public Key[] _keys
protected AppendableVec[] _appendables
protected short _nxx
protected short _nhi
protected transient RPC<T extends MRTask<T>> _nleft
protected transient RPC<T extends MRTask<T>> _nrite
protected transient boolean _topLocal
protected transient int _lo
protected transient int _hi
protected transient T extends MRTask<T> _left
protected transient T extends MRTask<T> _rite
protected transient Futures _fs
public MRTask()
protected MRTask(H2O.H2OCountedCompleter cmp)
public byte priority()
priority
in class H2O.H2OCountedCompleter<T extends MRTask<T>>
public Frame outputFrame()
public Frame outputFrame(java.lang.String[] names, java.lang.String[][] domains)
public Frame outputFrame(Key key, java.lang.String[] names, java.lang.String[][] domains)
public Frame outputFrame(Key key, java.lang.String[] names, java.lang.String[][] domains, Futures fs)
public AppendableVec[] appendables()
public void map(Chunk c)
public void map(Chunk c0, Chunk c1)
public void map(Chunk c0, Chunk c1, Chunk c2)
public void map(Chunk[] cs)
public void map(Key key)
public void reduce(T mrt)
protected void setupLocal()
protected void closeLocal()
public java.lang.String profString()
public final T doAll(Vec... vecs)
public final T doAll(Frame fr, boolean run_local)
public final void asyncExec(int outputs, Frame fr, boolean run_local)
public final T dfork(Vec... vecs)
public final T getResult()
public boolean isReleasable()
ForkJoinPool.ManagedBlocker
true
if blocking is unnecessary.isReleasable
in interface ForkJoinPool.ManagedBlocker
public boolean block() throws java.lang.InterruptedException
ForkJoinPool.ManagedBlocker
block
in interface ForkJoinPool.ManagedBlocker
true
if no additional blocking is necessary
(i.e., if isReleasable would return true)java.lang.InterruptedException
- if interrupted while waiting
(the method is not required to do so, but is allowed to)public final void dinvoke(H2ONode sender)
public T doAllNodes()
public void setProfile(boolean b)
public final void compute2()
compute2
in class H2O.H2OCountedCompleter<T extends MRTask<T>>
public final void onCompletion(CountedCompleter caller)
onCompletion
in class CountedCompleter
caller
- the task invoking this method (which may
be this task itself).protected void postGlobal()
public final boolean onExceptionalCompletion(java.lang.Throwable ex, CountedCompleter caller)
onExceptionalCompletion
in class H2O.H2OCountedCompleter<T extends MRTask<T>>
ex
- the exceptioncaller
- the task invoking this method (which may
be this task itself).