public abstract class ModelBuilder<M extends Model<M,P,O>,P extends Model.Parameters,O extends Model.Output> extends Iced
Modifier and Type | Class and Description |
---|---|
static class | ModelBuilder.BuilderVisibility - Visibility for this algo: is it always visible, is it beta (always visible but with a note in the UI) or is it experimental (hidden by default, visible in the UI if the user gives an "experimental" flag at startup); test-only builders are "experimental" |
protected class | ModelBuilder.Driver |
static class | ModelBuilder.ValidationMessage - Can be an ERROR, meaning the parameters can't be used as-is, a TRACE, which means the specified field should be hidden given the values of other fields, or a WARN or INFO for informative messages to the user. |
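The visibility rule described for ModelBuilder.BuilderVisibility can be sketched as below. This is a minimal illustration, not the H2O implementation: the constant names (STABLE, BETA, EXPERIMENTAL), the class VisibilityDemo and the experimentalFlag parameter are hypothetical, since the summary above does not list the actual constants.

```java
// Illustrative sketch only; names are assumptions, not the H2O API.
class VisibilityDemo {
    // Hypothetical constants mirroring the three levels described above.
    enum Visibility { STABLE, BETA, EXPERIMENTAL }

    /** True if a builder with this visibility should be listed in the UI. */
    static boolean isVisible(Visibility v, boolean experimentalFlag) {
        // Experimental builders stay hidden unless the user opted in at startup.
        return v != Visibility.EXPERIMENTAL || experimentalFlag;
    }

    /** Beta builders are always visible but carry a note in the UI. */
    static boolean needsNote(Visibility v) {
        return v == Visibility.BETA;
    }

    public static void main(String[] args) {
        for (Visibility v : Visibility.values()) {
            System.out.println(v + " visible=" + isVisible(v, false)
                    + " note=" + needsNote(v));
        }
    }
}
```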
Modifier and Type | Field and Description |
---|---|
protected Vec | _fold |
Job | _job |
ModelBuilder.ValidationMessage[] | _messages - A list of field validation issues. |
protected int | _nclass |
protected Vec | _offset |
protected java.lang.String[][] | _origDomains |
protected java.lang.String[] | _origNames |
P | _parms - All the parameters required to build the model. |
protected double[] | _priorClassDist |
java.util.HashSet<java.lang.String> | _removedCols |
protected Vec | _response |
protected Key<M> | _result |
protected Frame | _train |
protected Frame | _valid |
protected Vec | _vresponse |
protected Vec | _weights |
Modifier | Constructor and Description |
---|---|
protected | ModelBuilder(P parms) - Default easy constructor: Unique new job and unique new result key |
protected | ModelBuilder(P parms, boolean startup_once) - One-time start-up only ModelBuilder, endlessly cloned by the GUI for the default settings. |
protected | ModelBuilder(P parms, boolean startup_once, java.lang.String externalSchemaDirectory) |
protected | ModelBuilder(P parms, Job job) - Shared pre-existing Job and unique new result key |
protected | ModelBuilder(P parms, Key<M> key) - Unique new job and named result key |
Modifier and Type | Method and Description |
---|---|
static java.lang.String | algoName(java.lang.String urlName) - gbm -> GBM, deeplearning -> DeepLearning |
static java.lang.String[] | algos() |
ModelBuilder.BuilderVisibility | builderVisibility() |
abstract hex.ModelCategory[] | can_build() - List containing the categories of models that this builder can build. |
void | checkDistributions() |
protected void | checkMemoryFootPrint() - Override this method to call error() if the model is not expected to fit in memory, and say why. |
void | clearInitState() - Clear whatever was done by init() so it can be run again. |
void | clearValidationErrors() |
void | computeCrossValidation() - Default naive (serial) implementation of N-fold cross-validation |
protected boolean | computePriorClassDistribution() |
Vec | cv_AssignFold(int N) |
H2O.H2OCountedCompleter | cv_buildModels(int N, ModelBuilder<M,P,O>[] cvModelBuilders) |
void | cv_mainModelScores(int N, ModelMetrics.MetricBuilder[] mbs, ModelBuilder<M,P,O>[] cvModelBuilders) |
ModelBuilder<M,P,O>[] | cv_makeFramesAndBuilders(int N, Vec[] weights) |
Vec[] | cv_makeWeights(int N, Vec foldAssignment) |
ModelMetrics.MetricBuilder[] | cv_scoreCVModels(int N, Vec[] weights, ModelBuilder<M,P,O>[] cvModelBuilders) |
static Key<? extends Model> | defaultKey(java.lang.String algoName) - Default model-builder key |
protected int | desiredChunks(Frame original_fr, boolean local) - Find desired number of chunks. |
Key<M> | dest() |
int | error_count() |
void | error(java.lang.String field_name, java.lang.String message) |
M | get() - Block till completion, and return the built model from the DKV. |
ToEigenVec | getToEigenVec() |
boolean | hasFoldCol() |
boolean | hasOffsetCol() |
boolean | hasWeightCol() |
void | hide(java.lang.String field_name, java.lang.String message) |
protected void | ignoreBadColumns(int npredictors, boolean expensive) - Ignore constant columns, columns with all NAs and strings. |
protected boolean | ignoreConstColumns() |
protected boolean | ignoreStringColumns() |
void | info(java.lang.String field_name, java.lang.String message) |
void | init(boolean expensive) - Initialize the ModelBuilder, validating all arguments and preparing the training frame. |
boolean | isClassifier() |
boolean | isStopped() |
boolean | isSupervised() |
static java.lang.String | javaName(java.lang.String urlName) - gbm -> hex.tree.gbm.GBM, deeplearning -> hex.deeplearning.DeepLearning |
protected boolean | logMe() |
static <B extends ModelBuilder> B | make(java.lang.String algo, Job job, Key<Model> result) - Factory method to create a ModelBuilder instance for the given algo name. |
void | message(byte log_level, java.lang.String field_name, java.lang.String message) |
void | modifyParmsForCrossValidationMainModel(ModelBuilder<M,P,O>[] cvModelBuilders) - Override for model-specific checks / modifications to _parms for the main model during N-fold cross-validation. |
int | nclasses() |
boolean | nFoldCV() |
protected int | nFoldWork() |
protected int | nModelsInParallel() - How many models should be trained in parallel during N-fold cross-validation? Train all CV models in parallel when parallelism is enabled, otherwise train one at a time. Each model can override this logic, based on parameters, dataset size, etc. |
int | numSpecialCols() |
static java.lang.String | paramName(java.lang.String urlName) - gbm -> GBMParameters |
protected Frame | rebalance(Frame original_fr, boolean local, java.lang.String name) - Rebalance a frame for load balancing |
Vec | response() - Train response vector. |
static java.lang.String | schemaDirectory(java.lang.String urlName) - gbm -> "hex.schemas." ; custAlgo -> "org.myOrg.schemas." |
protected int | separateFeatureVecs() - Find and set response/weights/offset/fold and put them all at the end. |
java.lang.String[] | specialColNames() |
protected boolean | stop_requested() |
protected boolean | timeout() |
Frame | train() - Training frame: derived from the parameter's training frame, excluding all ignored columns, all constant and bad columns, perhaps flipping the response column to a Categorical, etc. |
Job<M> | trainModel() - Method to launch training of a Model, based on its parameters. |
protected abstract ModelBuilder.Driver | trainModelImpl() - Model-specific implementation of model training |
M | trainModelNested() - Train a model as part of a larger Job; the Job already exists and has started. |
protected Frame | valid() - Validation frame: derived from the parameter's validation frame, excluding all ignored columns, all constant and bad columns, perhaps flipping the response column to a Categorical, etc. |
java.lang.String | validationErrors() - Get a string representation of only the ERROR ValidationMessages (e.g., to use in an exception throw). |
Vec | vresponse() - Validation response vector. |
void | warn(java.lang.String field_name, java.lang.String message) |
Methods inherited from class Iced: asBytes, clone, copyOver, frozenType, read, readExternal, readJSON, reloadFromBytes, toJsonString, write, writeExternal, writeJSON
public Job _job
public P _parms
protected transient Frame _train
protected transient Frame _valid
protected transient Vec _response
protected transient Vec _vresponse
protected transient Vec _offset
protected transient Vec _weights
protected transient Vec _fold
protected transient java.lang.String[] _origNames
protected transient java.lang.String[][] _origDomains
protected int _nclass
protected transient double[] _priorClassDist
public ModelBuilder.ValidationMessage[] _messages
public transient java.util.HashSet<java.lang.String> _removedCols
protected ModelBuilder(P parms)
protected ModelBuilder(P parms, Job job)
protected ModelBuilder(P parms, boolean startup_once)
protected ModelBuilder(P parms, boolean startup_once, java.lang.String externalSchemaDirectory)
public ToEigenVec getToEigenVec()
public final M get()
public final boolean isStopped()
protected boolean timeout()
protected boolean stop_requested()
public static Key<? extends Model> defaultKey(java.lang.String algoName)
public static java.lang.String[] algos()
public static java.lang.String algoName(java.lang.String urlName)
public static java.lang.String javaName(java.lang.String urlName)
public static java.lang.String paramName(java.lang.String urlName)
public static java.lang.String schemaDirectory(java.lang.String urlName)
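The name-mapping helpers above (algos, algoName, javaName, paramName) can be pictured as lookups against a registry keyed by URL name. The sketch below is one way such a mapping could work, assuming a simple in-memory map; the class AlgoNames and its registry are hypothetical and only reproduce the gbm and deeplearning examples given in the method summary. It does not show how H2O actually registers builders.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch only; the registry is an assumption, not the H2O mechanism.
class AlgoNames {
    // Hypothetical registry: URL name -> fully qualified builder class name.
    private static final Map<String, String> REGISTRY = new LinkedHashMap<>();
    static {
        REGISTRY.put("gbm", "hex.tree.gbm.GBM");
        REGISTRY.put("deeplearning", "hex.deeplearning.DeepLearning");
    }

    /** All registered URL names, in registration order. */
    static String[] algos() {
        return REGISTRY.keySet().toArray(new String[0]);
    }

    /** gbm -> hex.tree.gbm.GBM */
    static String javaName(String urlName) {
        String cls = REGISTRY.get(urlName);
        if (cls == null) throw new IllegalArgumentException("Unknown algo: " + urlName);
        return cls;
    }

    /** gbm -> GBM (simple class name of the builder). */
    static String algoName(String urlName) {
        String cls = javaName(urlName);
        return cls.substring(cls.lastIndexOf('.') + 1);
    }

    /** gbm -> GBMParameters */
    static String paramName(String urlName) {
        return algoName(urlName) + "Parameters";
    }

    public static void main(String[] args) {
        for (String algo : algos()) {
            System.out.println(algo + " -> " + algoName(algo) + ", "
                    + javaName(algo) + ", " + paramName(algo));
        }
    }
}
```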
public static <B extends ModelBuilder> B make(java.lang.String algo, Job job, Key<Model> result)
public final Frame train()
protected final Frame valid()
public Vec response()
public Vec vresponse()
public final Job<M> trainModel()
public final M trainModelNested()
protected abstract ModelBuilder.Driver trainModelImpl()
protected int nModelsInParallel()
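The default rule stated for nModelsInParallel() (all CV models at once when parallelism is enabled, otherwise one at a time) and the note that each model can override it might look like the following. This is a sketch under assumptions: the class names, the constructor arguments and the cap used by the overriding subclass are all hypothetical and not taken from H2O.

```java
// Illustrative sketch only; names and the capping policy are assumptions.
class CvParallelism {
    final int nFolds;
    final boolean parallelismEnabled;

    CvParallelism(int nFolds, boolean parallelismEnabled) {
        if (nFolds < 1) throw new IllegalArgumentException("nFolds must be >= 1");
        this.nFolds = nFolds;
        this.parallelismEnabled = parallelismEnabled;
    }

    /** Default rule: all folds at once if parallelism is on, else one at a time. */
    int nModelsInParallel() {
        return parallelismEnabled ? nFolds : 1;
    }

    /** A hypothetical model-specific override that caps the parallelism. */
    static class Capped extends CvParallelism {
        final int cap;

        Capped(int nFolds, boolean parallelismEnabled, int cap) {
            super(nFolds, parallelismEnabled);
            this.cap = cap;
        }

        @Override
        int nModelsInParallel() {
            return Math.min(super.nModelsInParallel(), cap);
        }
    }

    public static void main(String[] args) {
        System.out.println(new CvParallelism(5, true).nModelsInParallel());
        System.out.println(new Capped(5, true, 2).nModelsInParallel());
    }
}
```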
protected int nFoldWork()
public void computeCrossValidation()
public Vec cv_AssignFold(int N)
public ModelBuilder<M,P,O>[] cv_makeFramesAndBuilders(int N, Vec[] weights)
public H2O.H2OCountedCompleter cv_buildModels(int N, ModelBuilder<M,P,O>[] cvModelBuilders)
public ModelMetrics.MetricBuilder[] cv_scoreCVModels(int N, Vec[] weights, ModelBuilder<M,P,O>[] cvModelBuilders)
public void cv_mainModelScores(int N, ModelMetrics.MetricBuilder[] mbs, ModelBuilder<M,P,O>[] cvModelBuilders)
public void modifyParmsForCrossValidationMainModel(ModelBuilder<M,P,O>[] cvModelBuilders)
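The argument types of the cv_* methods suggest a pipeline: a fold assignment feeds cv_makeWeights, the weights feed cv_makeFramesAndBuilders, and the resulting builders are built, scored and folded into the main model's scores. The first two steps can be illustrated with plain arrays. This is a minimal sketch, not H2O code: the modulo assignment and the layout of two weight vectors per fold (training, then holdout) are assumptions made for the example, and CvSketch is a hypothetical class.

```java
import java.util.Arrays;

// Illustrative sketch only; assignment scheme and weight layout are assumptions.
class CvSketch {
    /** Assigns row r to fold r % nFolds (one possible scheme). */
    static int[] assignFolds(int nRows, int nFolds) {
        int[] fold = new int[nRows];
        for (int r = 0; r < nRows; r++) fold[r] = r % nFolds;
        return fold;
    }

    /**
     * Builds 2 * nFolds weight vectors. For fold f, index 2*f holds the
     * training weights (0 on held-out rows) and index 2*f + 1 holds the
     * holdout weights (1 on held-out rows only).
     */
    static double[][] makeWeights(int nFolds, int[] foldAssignment) {
        double[][] w = new double[2 * nFolds][foldAssignment.length];
        for (int f = 0; f < nFolds; f++) {
            for (int r = 0; r < foldAssignment.length; r++) {
                boolean heldOut = foldAssignment[r] == f;
                w[2 * f][r] = heldOut ? 0.0 : 1.0;
                w[2 * f + 1][r] = heldOut ? 1.0 : 0.0;
            }
        }
        return w;
    }

    public static void main(String[] args) {
        int[] folds = assignFolds(6, 3);
        double[][] weights = makeWeights(3, folds);
        System.out.println("folds   = " + Arrays.toString(folds));
        System.out.println("train 0 = " + Arrays.toString(weights[0]));
        System.out.println("hold 0  = " + Arrays.toString(weights[1]));
    }
}
```

With this layout every row is held out in exactly one fold, so each CV model is scored only on rows it never trained on.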
public boolean nFoldCV()
public abstract hex.ModelCategory[] can_build()
public ModelBuilder.BuilderVisibility builderVisibility()
public void clearInitState()
protected boolean logMe()
public boolean isSupervised()
public boolean hasOffsetCol()
public boolean hasWeightCol()
public boolean hasFoldCol()
public int numSpecialCols()
public java.lang.String[] specialColNames()
public int nclasses()
public final boolean isClassifier()
protected int separateFeatureVecs()
protected boolean ignoreStringColumns()
protected boolean ignoreConstColumns()
protected void ignoreBadColumns(int npredictors, boolean expensive)
npredictors
expensive
protected void checkMemoryFootPrint()
protected boolean computePriorClassDistribution()
public int error_count()
public void hide(java.lang.String field_name, java.lang.String message)
public void info(java.lang.String field_name, java.lang.String message)
public void warn(java.lang.String field_name, java.lang.String message)
public void error(java.lang.String field_name, java.lang.String message)
public void clearValidationErrors()
public void message(byte log_level, java.lang.String field_name, java.lang.String message)
public java.lang.String validationErrors()
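The validation helpers above can be read as convenience wrappers that record a message at a given level, with error_count() and validationErrors() reporting only the ERROR entries. The sketch below illustrates that reading with a small stand-alone class. It is an assumption-laden illustration: ValidationLog, its Level enum (the real message() takes a byte log_level), the mapping of hide() to TRACE and the output format of validationErrors() are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; not the H2O ValidationMessage implementation.
class ValidationLog {
    // Simplification: an enum instead of the byte log_level used by message().
    enum Level { ERROR, WARN, INFO, TRACE }

    static final class Message {
        final Level level;
        final String field;
        final String text;

        Message(Level level, String field, String text) {
            this.level = level;
            this.field = field;
            this.text = text;
        }
    }

    private final List<Message> messages = new ArrayList<>();
    private int errorCount = 0;

    void message(Level level, String field, String text) {
        messages.add(new Message(level, field, text));
        if (level == Level.ERROR) errorCount++;
    }

    void error(String field, String text) { message(Level.ERROR, field, text); }
    void warn(String field, String text)  { message(Level.WARN, field, text); }
    void info(String field, String text)  { message(Level.INFO, field, text); }
    // Assumption: hiding a field is recorded as a TRACE message.
    void hide(String field, String text)  { message(Level.TRACE, field, text); }

    int errorCount() { return errorCount; }

    /** Concatenates only the ERROR messages, one per line. */
    String validationErrors() {
        StringBuilder sb = new StringBuilder();
        for (Message m : messages) {
            if (m.level == Level.ERROR) {
                sb.append(m.field).append(": ").append(m.text).append('\n');
            }
        }
        return sb.toString();
    }

    void clearValidationErrors() {
        messages.clear();
        errorCount = 0;
    }

    public static void main(String[] args) {
        ValidationLog log = new ValidationLog();
        log.warn("ntrees", "value is unusually large");
        log.error("response_column", "column not found");
        System.out.print(log.validationErrors());
    }
}
```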
public void init(boolean expensive)
init() may first be called with expensive set to false; it will be called once again at the start of model building, in trainModel(), with expensive set to true.
The incoming training frame (and validation frame) will have ignored columns dropped out, plus whatever work the parent init did.
NOTE: The front end initially calls this through the parameters validation endpoint with no training_frame, so each subclass's init() method has to work correctly with the training_frame missing.
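The contract described for init(boolean expensive), where a subclass builds on the parent's init and must tolerate a missing training frame, can be sketched without any H2O types. Everything in the example is hypothetical: the Frame stand-in, BaseBuilder, MyBuilder, the minRows parameter and the error strings exist only to show the null-safe pattern.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; none of these types are the H2O classes.
class InitSketch {
    /** Hypothetical stand-in for a training frame. */
    static final class Frame {
        final int numRows;
        Frame(int numRows) { this.numRows = numRows; }
    }

    static class BaseBuilder {
        Frame train;  // may be null during parameter validation
        final List<String> errors = new ArrayList<>();

        void init(boolean expensive) {
            errors.clear();
            if (train == null) return;  // nothing frame-dependent to check yet
            if (train.numRows == 0) errors.add("training_frame: no rows");
        }
    }

    static class MyBuilder extends BaseBuilder {
        int minRows = 10;  // hypothetical model parameter

        @Override
        void init(boolean expensive) {
            super.init(expensive);  // parent checks run first
            // Parameter-only check: works even without a training frame.
            if (minRows < 1) errors.add("min_rows: must be >= 1");
            // Frame-dependent check: guarded against a missing frame.
            if (train != null && train.numRows < minRows) {
                errors.add("min_rows: training frame has too few rows");
            }
        }
    }

    public static void main(String[] args) {
        MyBuilder b = new MyBuilder();
        b.init(false);  // validation pass with no training frame
        System.out.println("errors without frame: " + b.errors.size());
        b.train = new Frame(5);
        b.init(true);   // full pass at the start of model building
        System.out.println("errors with small frame: " + b.errors);
    }
}
```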
protected Frame rebalance(Frame original_fr, boolean local, java.lang.String name)
original_fr - Input frame
local - Whether to only create enough chunks to max out all cores on one node only
name - Name of rebalanced frame
protected int desiredChunks(Frame original_fr, boolean local)
public void checkDistributions()