public abstract class ModelBuilder<M extends Model<M,P,O>,P extends Model.Parameters,O extends Model.Output> extends Iced
Modifier and Type | Class and Description |
---|---|
static class |
ModelBuilder.BuilderVisibility
Visibility for this algo: is it always visible, is it beta (always
visible but with a note in the UI) or is it experimental (hidden by
default, visible in the UI if the user gives an "experimental" flag at
startup); test-only builders are "experimental"
|
protected class |
ModelBuilder.Driver |
class |
ModelBuilder.FilterCols |
static class |
ModelBuilder.ValidationMessage
Can be an ERROR, meaning the parameters can't be used as-is,
a TRACE, which means the specified field should be hidden given
the values of other fields, or a WARN or INFO for informative
messages to the user.
|
Modifier and Type | Field and Description |
---|---|
protected Vec |
_fold |
Job<M> |
_job |
ModelBuilder.ValidationMessage[] |
_messages
A list of field validation issues.
|
protected int |
_nclass |
protected Vec |
_offset |
protected java.lang.String[][] |
_origDomains |
protected java.lang.String[] |
_origNames |
P |
_parms
All the parameters required to build the model.
|
protected double[] |
_priorClassDist |
java.util.HashSet<java.lang.String> |
_removedCols |
protected Vec |
_response |
protected Key<M> |
_result |
protected Frame |
_train |
protected Frame |
_valid |
protected Vec |
_vresponse |
protected Vec |
_weights |
Modifier | Constructor and Description |
---|---|
protected |
ModelBuilder(P parms)
Default easy constructor: Unique new job and unique new result key
|
protected |
ModelBuilder(P parms,
boolean startup_once)
One-time start-up only ModelBuilder, endlessly cloned by the GUI for the
default settings.
|
protected |
ModelBuilder(P parms,
boolean startup_once,
java.lang.String externalSchemaDirectory) |
protected |
ModelBuilder(P parms,
Job<M> job)
Shared pre-existing Job and unique new result key
|
protected |
ModelBuilder(P parms,
Key<M> key)
Unique new job and named result key
|
Modifier and Type | Method and Description |
---|---|
static java.lang.String |
algoName(java.lang.String urlName)
gbm -> GBM, deeplearning -> DeepLearning
|
static java.lang.String[] |
algos() |
ModelBuilder.BuilderVisibility |
builderVisibility() |
abstract hex.ModelCategory[] |
can_build()
List containing the categories of models that this builder can
build.
|
void |
checkDistributions() |
protected void |
checkMemoryFootPrint()
Override this method to call error() if the model is expected to not fit in memory, and say why
|
void |
clearInitState()
Clear whatever was done by init() so it can be run again.
|
void |
clearValidationErrors() |
void |
computeCrossValidation()
Default naive (serial) implementation of N-fold cross-validation
(builds N+1 models, all have train+validation metrics, the main model has N-fold cross-validated validation metrics)
|
protected boolean |
computePriorClassDistribution() |
Vec |
cv_AssignFold(int N) |
void |
cv_buildModels(int N,
ModelBuilder<M,P,O>[] cvModelBuilders) |
void |
cv_computeAndSetOptimalParameters(ModelBuilder<M,P,O>[] cvModelBuilders)
Override for model-specific checks / modifications to _parms for the main model during N-fold cross-validation.
|
void |
cv_mainModelScores(int N,
ModelMetrics.MetricBuilder[] mbs,
ModelBuilder<M,P,O>[] cvModelBuilders) |
ModelBuilder<M,P,O>[] |
cv_makeFramesAndBuilders(int N,
Vec[] weights) |
Vec[] |
cv_makeWeights(int N,
Vec foldAssignment) |
ModelMetrics.MetricBuilder[] |
cv_scoreCVModels(int N,
Vec[] weights,
ModelBuilder<M,P,O>[] cvModelBuilders) |
static <S extends Model> Key<S> |
defaultKey(java.lang.String algoName)
Default model-builder key
|
protected int |
desiredChunks(Frame original_fr,
boolean local)
Find desired number of chunks.
|
Key<M> |
dest() |
int |
error_count() |
void |
error(java.lang.String field_name,
java.lang.String message) |
M |
get()
Block till completion, and return the built model from the DKV.
|
ToEigenVec |
getToEigenVec() |
boolean |
hasFoldCol() |
boolean |
hasOffsetCol() |
boolean |
hasWeightCol() |
boolean |
haveMojo() |
boolean |
havePojo() |
void |
hide(java.lang.String field_name,
java.lang.String message) |
protected void |
ignoreBadColumns(int npredictors,
boolean expensive)
Ignore constant columns, columns with all NAs and strings.
|
protected boolean |
ignoreConstColumns() |
protected void |
ignoreInvalidColumns(int npredictors,
boolean expensive)
Ignore invalid columns (columns that have a very high max value, which can cause issues in DHistogram)
|
protected boolean |
ignoreStringColumns() |
void |
info(java.lang.String field_name,
java.lang.String message) |
protected Frame |
init_adaptFrameToTrain(Frame fr,
java.lang.String frDesc,
java.lang.String field,
boolean expensive)
Adapts a given frame to the same schema as the training frame.
|
void |
init(boolean expensive)
Initialize the ModelBuilder, validating all arguments and preparing the
training frame.
|
boolean |
isClassifier() |
boolean |
isStopped() |
abstract boolean |
isSupervised() |
static java.lang.String |
javaName(java.lang.String urlName)
gbm -> hex.tree.gbm.GBM, deeplearning -> hex.deeplearning.DeepLearning
|
protected boolean |
logMe() |
static <B extends ModelBuilder> B |
make(java.lang.String algo,
Job job,
Key<Model> result)
Factory method to create a ModelBuilder instance for the given algo name.
|
void |
message(byte log_level,
java.lang.String field_name,
java.lang.String message) |
int |
nclasses() |
boolean |
nFoldCV() |
protected int |
nFoldWork() |
protected int |
nModelsInParallel()
Number of models to train in parallel during N-fold cross-validation.
Trains all CV models in parallel when parallelism is enabled, otherwise one at a time.
Each model can override this logic based on parameters, dataset size, etc.
|
int |
numSpecialCols() |
static java.lang.String |
paramName(java.lang.String urlName)
gbm -> GBMParameters
|
protected Frame |
rebalance(Frame original_fr,
boolean local,
java.lang.String name)
Rebalance a frame for load balancing
|
Vec |
response()
Train response vector.
|
static java.lang.String |
schemaDirectory(java.lang.String urlName)
gbm -> "hex.schemas." ; custAlgo -> "org.myOrg.schemas."
|
int |
separateFeatureVecs()
Find and set response/weights/offset/fold and put them all at the end.
|
void |
setTrain(Frame train) |
boolean |
shouldReorder(Vec v) |
java.lang.String[] |
specialColNames() |
protected boolean |
stop_requested() |
protected boolean |
timeout() |
Frame |
train()
Training frame: derived from the parameter's training frame, excluding
all ignored columns, all constant and bad columns, perhaps flipping the
response column to a Categorical, etc.
|
Job<M> |
trainModel()
Method to launch training of a Model, based on its parameters.
|
protected abstract ModelBuilder.Driver |
trainModelImpl()
Model-specific implementation of model training
|
M |
trainModelNested(Frame fr)
Train a model as part of a larger Job.
|
protected Frame |
valid()
Validation frame: derived from the parameter's validation frame, excluding
all ignored columns, all constant and bad columns, perhaps flipping the
response column to a Categorical, etc.
|
java.lang.String |
validationErrors()
Get a string representation of only the ERROR ValidationMessages (e.g., to use in an exception throw).
|
Vec |
vresponse()
Validation response vector.
|
void |
warn(java.lang.String field_name,
java.lang.String message) |
Methods inherited from class Iced:
asBytes, clone, copyOver, frozenType, read, readExternal, readJSON, reloadFromBytes, toJsonString, write, writeExternal, writeJSON
public P _parms
protected transient Frame _train
protected transient Frame _valid
protected transient Vec _response
protected transient Vec _vresponse
protected transient Vec _offset
protected transient Vec _weights
protected transient Vec _fold
protected transient java.lang.String[] _origNames
protected transient java.lang.String[][] _origDomains
protected int _nclass
protected transient double[] _priorClassDist
public ModelBuilder.ValidationMessage[] _messages
public transient java.util.HashSet<java.lang.String> _removedCols
protected ModelBuilder(P parms)
protected ModelBuilder(P parms, Job<M> job)
protected ModelBuilder(P parms, boolean startup_once)
protected ModelBuilder(P parms, boolean startup_once, java.lang.String externalSchemaDirectory)
public ToEigenVec getToEigenVec()
public boolean shouldReorder(Vec v)
public final M get()
public final boolean isStopped()
protected boolean timeout()
protected boolean stop_requested()
public static <S extends Model> Key<S> defaultKey(java.lang.String algoName)
public static java.lang.String[] algos()
public static java.lang.String algoName(java.lang.String urlName)
public static java.lang.String javaName(java.lang.String urlName)
public static java.lang.String paramName(java.lang.String urlName)
public static java.lang.String schemaDirectory(java.lang.String urlName)
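The static name helpers above translate a URL-style algorithm name into related identifiers (gbm -> GBM, hex.tree.gbm.GBM, GBMParameters, "hex.schemas."). A minimal sketch of such a lookup follows, assuming a registry keyed by URL name. The `AlgoNames` class, its `register` method and the map layout are hypothetical and only mirror the documented mappings; they are not H2O's implementation.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical registry (not H2O code): maps a URL name to related identifiers.
class AlgoNames {
    // urlName -> fully qualified builder class name
    private static final Map<String, String> JAVA_NAMES = new LinkedHashMap<>();
    // urlName -> schema package prefix
    private static final Map<String, String> SCHEMA_DIRS = new LinkedHashMap<>();

    static void register(String urlName, String javaName, String schemaDir) {
        JAVA_NAMES.put(urlName, javaName);
        SCHEMA_DIRS.put(urlName, schemaDir);
    }

    // gbm -> hex.tree.gbm.GBM
    static String javaName(String urlName) {
        return JAVA_NAMES.get(urlName);
    }

    // gbm -> GBM (simple class name of the registered builder)
    static String algoName(String urlName) {
        String full = JAVA_NAMES.get(urlName);
        return full == null ? null : full.substring(full.lastIndexOf('.') + 1);
    }

    // gbm -> GBMParameters
    static String paramName(String urlName) {
        String algo = algoName(urlName);
        return algo == null ? null : algo + "Parameters";
    }

    // gbm -> "hex.schemas."
    static String schemaDirectory(String urlName) {
        return SCHEMA_DIRS.get(urlName);
    }

    // All registered URL names, in registration order.
    static String[] algos() {
        return JAVA_NAMES.keySet().toArray(new String[0]);
    }

    public static void main(String[] args) {
        register("gbm", "hex.tree.gbm.GBM", "hex.schemas.");
        register("deeplearning", "hex.deeplearning.DeepLearning", "hex.schemas.");
        System.out.println(algoName("gbm"));
        System.out.println(javaName("deeplearning"));
        System.out.println(paramName("gbm"));
        System.out.println(schemaDirectory("gbm"));
    }
}
```

Unknown names return null in this sketch; a real registry might prefer to throw.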
public static <B extends ModelBuilder> B make(java.lang.String algo, Job job, Key<Model> result)
public final Frame train()
public void setTrain(Frame train)
protected final Frame valid()
public Vec response()
public Vec vresponse()
public final Job<M> trainModel()
public final M trainModelNested(Frame fr)
fr
- Input frame override, ignored if null.
In some cases, algos do not work directly with the original frame in the K/V store.
Instead they run on a private anonymous copy (e.g. a rebalanced dataset).
Use this argument if you want the nested job to work on the actual working copy rather than the original Frame in the K/V store.
Example: the outer job rebalances the dataset and then calls the nested job. To avoid a needless second rebalance, pass in the (already rebalanced) working copy.
protected abstract ModelBuilder.Driver trainModelImpl()
protected int nModelsInParallel()
protected int nFoldWork()
public void computeCrossValidation()
public Vec cv_AssignFold(int N)
public ModelBuilder<M,P,O>[] cv_makeFramesAndBuilders(int N, Vec[] weights)
public void cv_buildModels(int N, ModelBuilder<M,P,O>[] cvModelBuilders)
public ModelMetrics.MetricBuilder[] cv_scoreCVModels(int N, Vec[] weights, ModelBuilder<M,P,O>[] cvModelBuilders)
public void cv_mainModelScores(int N, ModelMetrics.MetricBuilder[] mbs, ModelBuilder<M,P,O>[] cvModelBuilders)
public void cv_computeAndSetOptimalParameters(ModelBuilder<M,P,O>[] cvModelBuilders)
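The cv_ methods above break N-fold cross-validation into steps: assign each row to a fold, derive per-fold weights, build the fold models, then score them. A minimal sketch of the first two steps on plain arrays follows. It assumes modulo fold assignment and 0/1 weights, with two weight arrays per fold (training and holdout). The `CvFolds` class and its array layout are illustrative assumptions, not the Vec-based implementation of this class.

```java
import java.util.Arrays;

// Illustrative only: fold assignment and per-fold weights on plain arrays.
class CvFolds {
    // Assign row i to fold i % n (one simple scheme, assumed here).
    static int[] assignFold(int nrows, int n) {
        int[] fold = new int[nrows];
        for (int i = 0; i < nrows; i++) fold[i] = i % n;
        return fold;
    }

    // Returns 2*n weight arrays. Index 2*f holds the training weights for
    // fold f (1 outside the fold, 0 inside); index 2*f+1 holds the holdout
    // weights (1 inside the fold, 0 outside).
    static double[][] makeWeights(int n, int[] fold) {
        double[][] w = new double[2 * n][fold.length];
        for (int f = 0; f < n; f++) {
            for (int i = 0; i < fold.length; i++) {
                boolean holdout = fold[i] == f;
                w[2 * f][i] = holdout ? 0.0 : 1.0;
                w[2 * f + 1][i] = holdout ? 1.0 : 0.0;
            }
        }
        return w;
    }

    public static void main(String[] args) {
        int[] fold = assignFold(6, 3);
        double[][] w = makeWeights(3, fold);
        System.out.println("folds:           " + Arrays.toString(fold));
        System.out.println("fold 0 training: " + Arrays.toString(w[0]));
        System.out.println("fold 0 holdout:  " + Arrays.toString(w[1]));
    }
}
```

With weights expressed this way, each fold model can train on the full frame and simply ignore zero-weight rows, so no per-fold copy of the data is needed.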
public boolean nFoldCV()
public abstract hex.ModelCategory[] can_build()
public ModelBuilder.BuilderVisibility builderVisibility()
public void clearInitState()
protected boolean logMe()
public abstract boolean isSupervised()
public boolean hasOffsetCol()
public boolean hasWeightCol()
public boolean hasFoldCol()
public int numSpecialCols()
public java.lang.String[] specialColNames()
public boolean havePojo()
public boolean haveMojo()
public int nclasses()
public final boolean isClassifier()
public int separateFeatureVecs()
protected boolean ignoreStringColumns()
protected boolean ignoreConstColumns()
protected void ignoreBadColumns(int npredictors, boolean expensive)
npredictors
expensive
protected void ignoreInvalidColumns(int npredictors, boolean expensive)
npredictors
expensive
protected void checkMemoryFootPrint()
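ignoreBadColumns is summarized as ignoring constant columns, all-NA columns and strings, and ignoreInvalidColumns as ignoring columns with a very high max value. A minimal sketch of that kind of filter on numeric arrays follows. It assumes NaN stands in for NA and takes the max-value threshold as a parameter; string columns are not modeled. The `BadColumnFilter` class and the threshold are hypothetical, not the checks this class actually performs.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative column filter; NaN stands in for a missing value (NA).
class BadColumnFilter {
    // Returns the indices of the columns worth keeping.
    // maxAllowed is a hypothetical cutoff for "very high max value".
    static List<Integer> keep(double[][] cols, double maxAllowed) {
        List<Integer> kept = new ArrayList<>();
        for (int c = 0; c < cols.length; c++) {
            double min = Double.POSITIVE_INFINITY;
            double max = Double.NEGATIVE_INFINITY;
            int present = 0;
            for (double v : cols[c]) {
                if (Double.isNaN(v)) continue; // skip NAs
                present++;
                min = Math.min(min, v);
                max = Math.max(max, v);
            }
            boolean allNA = present == 0;
            boolean constant = !allNA && min == max;
            boolean tooLarge = !allNA && max > maxAllowed;
            if (!allNA && !constant && !tooLarge) kept.add(c);
        }
        return kept;
    }

    public static void main(String[] args) {
        double[][] cols = {
            {1, 2, 3},                              // kept
            {5, 5, 5},                              // constant
            {Double.NaN, Double.NaN, Double.NaN},   // all NA
            {1, 1e40, 2}                            // max above the cutoff
        };
        System.out.println(keep(cols, 1e38));
    }
}
```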
protected boolean computePriorClassDistribution()
public int error_count()
public void hide(java.lang.String field_name, java.lang.String message)
public void info(java.lang.String field_name, java.lang.String message)
public void warn(java.lang.String field_name, java.lang.String message)
public void error(java.lang.String field_name, java.lang.String message)
public void clearValidationErrors()
public void message(byte log_level, java.lang.String field_name, java.lang.String message)
public java.lang.String validationErrors()
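The messaging methods above (hide, info, warn, error, message, error_count, validationErrors, clearValidationErrors) collect per-field validation messages at four levels: ERROR, WARN, INFO and TRACE (a field to hide). A minimal sketch of such a collector follows. The `ValidationLog` class, its message format, the example field names and the choice to clear every message on reset are assumptions made for illustration, not this class's implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative stand-in for a builder's validation message list.
class ValidationLog {
    enum Level { ERROR, WARN, INFO, TRACE }

    static final class Message {
        final Level level;
        final String field;
        final String text;

        Message(Level level, String field, String text) {
            this.level = level;
            this.field = field;
            this.text = text;
        }

        @Override
        public String toString() {
            return level + " on field " + field + ": " + text;
        }
    }

    private final List<Message> messages = new ArrayList<>();

    void message(Level level, String field, String text) {
        messages.add(new Message(level, field, text));
    }

    void error(String field, String text) { message(Level.ERROR, field, text); }
    void warn(String field, String text)  { message(Level.WARN, field, text); }
    void info(String field, String text)  { message(Level.INFO, field, text); }
    // TRACE marks a field that should be hidden given other field values.
    void hide(String field, String text)  { message(Level.TRACE, field, text); }

    int errorCount() {
        int n = 0;
        for (Message m : messages) if (m.level == Level.ERROR) n++;
        return n;
    }

    // Only the ERROR messages, one per line, e.g. for an exception message.
    String validationErrors() {
        StringBuilder sb = new StringBuilder();
        for (Message m : messages) {
            if (m.level == Level.ERROR) sb.append(m).append('\n');
        }
        return sb.toString();
    }

    // Assumption: clearing drops every message, not only the errors.
    void clear() { messages.clear(); }

    public static void main(String[] args) {
        ValidationLog log = new ValidationLog();
        log.error("ntrees", "must be positive");
        log.warn("max_depth", "very deep trees may overfit");
        log.hide("nfolds", "not used when a fold column is given");
        System.out.print(log.validationErrors());
        System.out.println("errors: " + log.errorCount());
    }
}
```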
public void init(boolean expensive)
This method may first be called with expensive set to false; it is called once again at the start of model building, in trainModel(), with expensive set to true.
The incoming training frame (and validation frame) will have ignored columns dropped, plus whatever work the parent init did.
NOTE: The front end initially calls this through the parameters validation endpoint with no training_frame, so each subclass's init() method has to work correctly with the training_frame missing.
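init(boolean expensive) is called again at the start of trainModel() with expensive set to true, and it must cope with a missing training_frame. A minimal sketch of that two-phase pattern follows, using plain arrays instead of Frames. The `TwoPhaseInit` class, the `ntrees` parameter, the error strings and the column-mean computation (a stand-in for expensive preparation) are all hypothetical; recording an error for a missing frame is likewise an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative two-phase init: cheap checks always run, costly work runs
// only when expensive is true.
class TwoPhaseInit {
    final List<String> errors = new ArrayList<>();
    final int ntrees;               // hypothetical parameter
    final double[][] trainingRows;  // may be null during parameter validation
    double[] columnMeans;           // filled in only by the expensive pass

    TwoPhaseInit(int ntrees, double[][] trainingRows) {
        this.ntrees = ntrees;
        this.trainingRows = trainingRows;
    }

    void init(boolean expensive) {
        // Reset state so init can be run again.
        errors.clear();
        columnMeans = null;

        // Cheap parameter checks that need no data.
        if (ntrees <= 0) errors.add("ntrees: must be positive");

        // Return early rather than dereference a missing frame.
        if (trainingRows == null || trainingRows.length == 0) {
            errors.add("training_frame: missing or empty");
            return;
        }

        if (!expensive || !errors.isEmpty()) return;

        // Stand-in for expensive preparation: a full pass over the data.
        int ncols = trainingRows[0].length;
        columnMeans = new double[ncols];
        for (double[] row : trainingRows) {
            for (int c = 0; c < ncols; c++) {
                columnMeans[c] += row[c] / trainingRows.length;
            }
        }
    }

    public static void main(String[] args) {
        TwoPhaseInit validationOnly = new TwoPhaseInit(10, null);
        validationOnly.init(false);
        System.out.println(validationOnly.errors);

        TwoPhaseInit builder = new TwoPhaseInit(10, new double[][] {{1, 2}, {3, 4}});
        builder.init(false); // cheap pass, no data scan
        builder.init(true);  // full pass before training
        System.out.println(java.util.Arrays.toString(builder.columnMeans));
    }
}
```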
protected Frame init_adaptFrameToTrain(Frame fr, java.lang.String frDesc, java.lang.String field, boolean expensive)
fr
- input frame
frDesc
- frame description, e.g. "Validation Frame"; shown in validation error messages
field
- name of a field for validation errors
expensive
- indicates full ("expensive") processing
protected Frame rebalance(Frame original_fr, boolean local, java.lang.String name)
original_fr
- Input frame
local
- Whether to create only enough chunks to max out all cores on one node
name
- Name of rebalanced frame
protected int desiredChunks(Frame original_fr, boolean local)
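The local flag on rebalance and desiredChunks controls whether the chunk count targets the cores of one node or of the whole cluster. A minimal sketch of one possible heuristic follows. The factor of 4 chunks per core, the cap at the row count and the `ChunkPlanner` class are assumptions made for illustration; this page does not state the actual formula.

```java
// Illustrative chunk-count heuristic; the constants are assumptions.
class ChunkPlanner {
    static int desiredChunks(long nrows, int coresPerNode, int nodes, boolean local) {
        // Target a few chunks per core, on one node or across all nodes.
        long target = 4L * coresPerNode * (local ? 1 : nodes);
        // Never ask for more chunks than rows, and always at least one.
        return (int) Math.max(1, Math.min(target, nrows));
    }

    public static void main(String[] args) {
        System.out.println(desiredChunks(1_000_000L, 8, 4, true));
        System.out.println(desiredChunks(1_000_000L, 8, 4, false));
        System.out.println(desiredChunks(10L, 8, 4, false));
    }
}
```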
public void checkDistributions()