public class RegexTokenizer extends MRTask<RegexTokenizer>
Example usage:
final RegexTokenizer tokenizer = new RegexTokenizer.Builder()
    .setRegex("[,;]")
    .setMinLength(2)
    .setToLowercase(true)
    .create();
final Frame tokens = tokenizer.transform(inputFrame);
Modifier and Type | Class and Description |
---|---|
static class | RegexTokenizer.Builder |
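Nested classes/interfaces inherited from class MRTask:
MRTask.PostMapAction<T extends MRTask.PostMapAction<T>>
Nested classes/interfaces inherited from class DTask:
DTask.DKeyTask<T extends DTask.DKeyTask,V extends Keyed>, DTask.RemoveCall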
Constructor and Description |
---|
RegexTokenizer(java.lang.String regex) |
Modifier and Type | Method and Description |
---|---|
void | map(Chunk[] cs, NewChunk nc) The handy method to generate a new vector based on existing vectors. |
Frame | transform(Frame input) Tokenizes a given Frame. |
appendables, asyncExecOnAllNodes, block, closeLocal, compute2, dfork, dfork, dfork, dfork, dfork, dinvoke, doAll, doAll, doAll, doAll, doAll, doAll, doAll, doAll, doAll, doAll, doAll, doAll, doAllNodes, getResult, getResult, isReleasable, map, map, map, map, map, map, map, map, map, map, map, modifiesVolatileVecs, onCompletion, onExceptionalCompletion, outputFrame, outputFrame, outputFrame, postGlobal, profile, profString, reduce, self, setupLocal, withPostMapAction
copyOver, getDException, hasException, logVerbose, onAck, onAckAck, setException
asBytes, clone, compute, compute1, currThrPriority, frozenType, icer, priority, read, readJSON, reloadFromBytes, write, writeJSON
__tryComplete, addToPendingCount, compareAndSetPendingCount, complete, exec, getCompleter, getPendingCount, getRawResult, setCompleter, setPendingCount, setRawResult, tryComplete
adapt, adapt, adapt, cancel, compareAndSetForkJoinTaskTag, completeExceptionally, fork, get, get, get, getException, getForkJoinTaskTag, getPool, getQueuedTaskCount, getSurplusQueuedTaskCount, helpQuiesce, inForkJoinPool, invoke, invokeAll, invokeAll, invokeAll, isCancelled, isCompletedAbnormally, isCompletedNormally, isDone, join, peekNextLocalTask, pollNextLocalTask, pollTask, quietlyComplete, quietlyInvoke, quietlyJoin, reinitialize, setForkJoinTaskTag, tryUnfork
public void map(Chunk[] cs, NewChunk nc)
Overrides: map in class MRTask<RegexTokenizer>
Parameters:
cs - input vectors
nc - output vector

public Frame transform(Frame input)
Parameters:
input - Input Frame is expected to only contain String columns. Each row of the Frame represents a logical sentence. The sentence can span one or more cells of the row.
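
The row-as-sentence contract above can be illustrated without any H2O classes. The following is a minimal sketch in plain Java, not the library's implementation: the class `RowTokenizerSketch` and the method `tokenizeRow` are hypothetical names, a row is modeled as a `String[]` of cells, and null cells stand in for missing values. It also assumes that the minimum length setting drops tokens shorter than the given length and that lowercasing is applied per token; this page does not confirm either behavior, nor the layout of the Frame that `transform` returns.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;

// Hypothetical illustration only; this is not part of the H2O API.
final class RowTokenizerSketch {

    // Tokenizes one logical sentence that may span several String cells of a row.
    static List<String> tokenizeRow(String[] cells, String regex,
                                    int minLength, boolean toLowercase) {
        Pattern delimiter = Pattern.compile(regex);
        List<String> tokens = new ArrayList<>();
        for (String cell : cells) {
            if (cell == null) {
                continue; // assumption: missing cells contribute no tokens
            }
            for (String token : delimiter.split(cell)) {
                if (token.length() < minLength) {
                    continue; // assumption: tokens shorter than minLength are dropped
                }
                tokens.add(toLowercase ? token.toLowerCase(Locale.ROOT) : token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Same settings as the example usage: regex "[,;]", min length 2, lowercase on.
        String[] row = {"Red,Green;B", null, "Blue;a,Yellow"};
        System.out.println(tokenizeRow(row, "[,;]", 2, true));
        // prints [red, green, blue, yellow]
    }
}
```

The sketch flattens all cells of a row into one token list, which mirrors the statement that a sentence can span one or more cells. How the real class marks sentence boundaries in its output is not described here.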