public class WordCountTask extends water.MRTask<WordCountTask>
This task operates on all string columns in any frame handed to it via doAll(). It creates its results in its own frame. If initialized with a minimum frequency, the returned array will only contain words with counts greater than or equal to the minimum.
Currently the array is consolidated on the calling node. Because most languages have a limited vocabulary, the resulting array is presumed to fit easily in memory. If this presumption proves incorrect, the task will need to be rewritten to be fully distributed.
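The minimum-frequency rule described above can be illustrated with a small plain-Java sketch. It uses no H2O classes: the class name `MinFreqFilterSketch`, the method `frequentWords`, and the alphabetical tie-break are assumptions made for illustration, not part of WordCountTask.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustration only: a hypothetical stand-in for the filtering and ordering
// that the class description attributes to WordCountTask.
final class MinFreqFilterSketch {

    // Keeps words whose count is greater than or equal to minFreq,
    // ordered from most to least frequent.
    static List<Map.Entry<String, Long>> frequentWords(Map<String, Long> counts, long minFreq) {
        List<Map.Entry<String, Long>> kept = new ArrayList<>();
        for (Map.Entry<String, Long> e : counts.entrySet()) {
            if (e.getValue() >= minFreq) {
                kept.add(e);
            }
        }
        kept.sort((a, b) -> {
            int byCount = Long.compare(b.getValue(), a.getValue()); // descending by count
            // Assumption of this sketch: ties are broken alphabetically.
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        return kept;
    }

    public static void main(String[] args) {
        Map<String, Long> counts = new LinkedHashMap<>();
        counts.put("the", 5L);
        counts.put("cat", 2L);
        counts.put("mat", 1L);
        for (Map.Entry<String, Long> e : frequentWords(counts, 2)) {
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```

Note that the comparison is inclusive: a word that occurs exactly `minFreq` times is kept.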
Modifier and Type | Class and Description |
---|---|
protected static class | WordCountTask.ValueStringCount: Small extension to the ValueString class to add an atomic counter for each word. |
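The idea of pairing each word with an atomic counter can be sketched in plain Java. This is not the ValueStringCount implementation: it substitutes `String` keys and `java.util.concurrent.atomic.AtomicLong` for the H2O types, and the names `AtomicWordCounterSketch`, `add`, `count`, and `countInParallel` are hypothetical.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Illustration only: one atomic counter per distinct word, safe to update
// from several threads working on the same node.
final class AtomicWordCounterSketch {
    private final ConcurrentHashMap<String, AtomicLong> counts = new ConcurrentHashMap<>();

    // Creates the counter on first sight of a word, then increments it atomically.
    void add(String word) {
        counts.computeIfAbsent(word, w -> new AtomicLong()).incrementAndGet();
    }

    long count(String word) {
        AtomicLong c = counts.get(word);
        return c == null ? 0L : c.get();
    }

    // Splits the input across threads by stride; every word is counted exactly once.
    static AtomicWordCounterSketch countInParallel(String[] words, int threads) {
        AtomicWordCounterSketch counter = new AtomicWordCounterSketch();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int offset = t;
            workers[t] = new Thread(() -> {
                for (int i = offset; i < words.length; i += threads) {
                    counter.add(words[i]);
                }
            });
            workers[t].start();
        }
        try {
            for (Thread w : workers) {
                w.join();
            }
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while counting", ex);
        }
        return counter;
    }

    public static void main(String[] args) {
        String[] words = {"a", "b", "a", "c", "a", "b"};
        AtomicWordCounterSketch counter = countInParallel(words, 3);
        System.out.println("a=" + counter.count("a") + " b=" + counter.count("b"));
    }
}
```

Because each increment is atomic, concurrent workers can share one map without a global lock.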
Constructor and Description |
---|
WordCountTask() |
WordCountTask(int minFreq) |
Modifier and Type | Method and Description |
---|---|
protected void | copyOver(WordCountTask lv) |
void | map(water.fvec.Chunk[] cs): Iterates over all chunks containing strings and adds unique instances of those strings to a node-local hash map. |
void | postGlobal(): Once the hash map has been consolidated onto a single node, filters out infrequent words, sorts the array by frequency (descending), and finally puts the results into a frame. |
WordCountTask | read_impl(water.AutoBuffer ab): Called automatically when a node receives results from another node to be reduced. |
void | reduce(WordCountTask that): Local reduces should all see the same hash map. |
protected void | setupLocal() |
water.AutoBuffer | write_impl(water.AutoBuffer ab): Called automatically when a node sends its results to be reduced by another node. |
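The flow described in the method summary (count per chunk into a node-local map, then consolidate the per-node maps on one node) can be sketched without H2O. Everything here is an assumption for illustration: the class `MapReduceWordCountSketch`, the helpers `mapChunk` and `reduce`, the use of `String[]` in place of chunks, and the choice to skip `null` entries as missing values.

```java
import java.util.HashMap;
import java.util.Map;

// Illustration only: plain maps stand in for the node-local hash map.
final class MapReduceWordCountSketch {

    // "map" step: fold one chunk of strings into the node-local counts.
    static void mapChunk(String[] chunk, Map<String, Long> local) {
        for (String word : chunk) {
            if (word != null) { // assumption: null marks a missing value
                local.merge(word, 1L, Long::sum);
            }
        }
    }

    // "reduce" step across nodes: add the other node's counts to ours.
    static void reduce(Map<String, Long> mine, Map<String, Long> theirs) {
        theirs.forEach((word, n) -> mine.merge(word, n, Long::sum));
    }

    public static void main(String[] args) {
        Map<String, Long> nodeA = new HashMap<>();
        Map<String, Long> nodeB = new HashMap<>();
        mapChunk(new String[] {"red", "blue", "red"}, nodeA);
        mapChunk(new String[] {"blue", null, "green"}, nodeB);
        reduce(nodeA, nodeB); // nodeA plays the role of the calling node
        System.out.println(nodeA);
    }
}
```

After the last reduce, the calling node holds the full vocabulary, which is the point at which filtering and sorting can run.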
appendables, asyncExec, asyncExec, asyncExec, block, closeLocal, compute2, dfork, dfork, dfork, dfork, dinvoke, doAll, doAll, doAll, doAll, doAll, doAll, doAll, doAll, doAll, doAllNodes, getResult, isReleasable, map, map, map, map, map, map, map, map, map, map, map, map, onCompletion, onExceptionalCompletion, outputFrame, outputFrame, outputFrame, postLocal, priority, profString, self, setProfile
getDException, hasException, logVerbose, onAck, onAckAck, setException
clone, compute, frozenType, icer, nextThrPriority, read, readJSON_impl, readJSON, write, writeHTML_impl, writeHTML, writeJSON_impl, writeJSON
addToPendingCount, compareAndSetPendingCount, complete, exec, getCompleter, getPendingCount, getRawResult, setCompleter, setPendingCount, setRawResult, tryComplete
adapt, adapt, adapt, cancel, compareAndSetForkJoinTaskTag, completeExceptionally, fork, get, get, getException, getForkJoinTaskTag, getPool, getQueuedTaskCount, getSurplusQueuedTaskCount, helpQuiesce, inForkJoinPool, invoke, invokeAll, invokeAll, invokeAll, isCancelled, isCompletedAbnormally, isCompletedNormally, isDone, join, peekNextLocalTask, pollNextLocalTask, pollTask, quietlyComplete, quietlyInvoke, quietlyJoin, reinitialize, setForkJoinTaskTag, tryUnfork
public WordCountTask()
public WordCountTask(int minFreq)
protected void setupLocal()
Overrides: setupLocal in class water.MRTask<WordCountTask>
public void map(water.fvec.Chunk[] cs)
Overrides: map in class water.MRTask<WordCountTask>
public void reduce(WordCountTask that)
See also: read_impl(water.AutoBuffer)
Overrides: reduce in class water.MRTask<WordCountTask>
public water.AutoBuffer write_impl(water.AutoBuffer ab)
Specified by: write_impl in interface water.Freezable
Overrides: write_impl in class water.H2O.H2OCountedCompleter<WordCountTask>
public WordCountTask read_impl(water.AutoBuffer ab)
Specified by: read_impl in interface water.Freezable
Overrides: read_impl in class water.H2O.H2OCountedCompleter<WordCountTask>
protected void copyOver(WordCountTask lv)
Overrides: copyOver in class water.DTask<WordCountTask>
public void postGlobal()
Overrides: postGlobal in class water.MRTask<WordCountTask>
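For readers unfamiliar with the write_impl / read_impl pairing, the following plain-Java sketch shows one way word counts could be written to a byte buffer on the sending side and folded into the receiver's counts on arrival. It is a guess at the general shape only: it uses `java.io` streams rather than water.AutoBuffer, the wire layout is invented, and the names `CountWireFormatSketch`, `write`, and `readInto` are hypothetical. Whether the real read_impl merges on receipt is an assumption here.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

// Illustration only: an invented wire format of (entry count, then word/count pairs).
final class CountWireFormatSketch {

    // Sending side: serialize the local counts.
    static byte[] write(Map<String, Long> counts) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(counts.size());
            for (Map.Entry<String, Long> e : counts.entrySet()) {
                out.writeUTF(e.getKey()); // writeUTF caps each string at 65535 encoded bytes
                out.writeLong(e.getValue());
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return bytes.toByteArray();
    }

    // Receiving side: read each pair and add it to the receiver's own counts.
    static void readInto(byte[] payload, Map<String, Long> local) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(payload))) {
            int entries = in.readInt();
            for (int i = 0; i < entries; i++) {
                String word = in.readUTF();
                long count = in.readLong();
                local.merge(word, count, Long::sum);
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    public static void main(String[] args) {
        Map<String, Long> sender = new HashMap<>();
        sender.put("red", 2L);
        Map<String, Long> receiver = new HashMap<>();
        receiver.put("red", 1L);
        readInto(write(sender), receiver);
        System.out.println(receiver);
    }
}
```

Merging during deserialization means the receiver never holds a second full copy of the sender's map, which is one reason a design might handle the distributed reduce at this point.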