public abstract class ModelBuilder<M extends Model<M,P,O>,P extends Model.Parameters,O extends Model.Output> extends Iced
Modifier and Type | Class | Description |
---|---|---|
static class | ModelBuilder.BuilderVisibility | Visibility for this algo: is it always visible, is it beta (always visible but with a note in the UI) or is it experimental (hidden by default, visible in the UI if the user gives an "experimental" flag at startup); test-only builders are "experimental" |
protected class | ModelBuilder.Driver | |
class | ModelBuilder.FilterCols | |
class | ModelBuilder.ModelTrainingCoordinator | |
static class | ModelBuilder.ValidationMessage | Can be an ERROR, meaning the parameters can't be used as-is, a TRACE, which means the specified field should be hidden given the values of other fields, or a WARN or INFO for informative messages to the user. |
Modifier and Type | Field | Description |
---|---|---|
protected ModelBuilder.ModelTrainingCoordinator | _coordinator | |
java.lang.String | _desc | |
protected ModelTrainingEventsPublisher | _eventPublisher | |
protected Vec | _fold | |
P | _input_parms | All the parameters required to build the model conserved in the input form, with AUTO values not evaluated yet. |
Job<M> | _job | |
ModelBuilder.ValidationMessage[] | _messages | A list of field validation issues. |
protected int | _nclass | |
protected Vec | _offset | |
protected double[] | _orig_projection_array | |
protected java.lang.String[][] | _origDomains | |
protected java.lang.String[] | _origNames | |
protected Frame | _origTrain | |
P | _parms | All the parameters required to build the model. |
protected double[] | _priorClassDist | |
java.util.HashSet<java.lang.String> | _removedCols | |
protected Vec | _response | |
protected Key<M> | _result | |
protected boolean | _startUpOnceModelBuilder | |
protected Frame | _train | |
protected Vec | _treatment | |
protected Frame | _valid | |
protected Vec | _vresponse | |
protected Vec | _weights | |
Modifier | Constructor | Description |
---|---|---|
protected | ModelBuilder(P parms) | Default easy constructor: Unique new job and unique new result key |
protected | ModelBuilder(P parms, boolean startup_once) | One-time start-up only ModelBuilder, endlessly cloned by the GUI for the default settings. |
protected | ModelBuilder(P parms, boolean startup_once, java.lang.String externalSchemaDirectory) | |
protected | ModelBuilder(P parms, Job<M> job) | Shared pre-existing Job and unique new result key |
protected | ModelBuilder(P parms, Key<M> key) | Unique new job and named result key |
Modifier and Type | Method | Description |
---|---|---|
static java.lang.String | algoName(java.lang.String urlName) | gbm -> GBM, deeplearning -> DeepLearning |
static java.lang.String[] | algos() | |
ModelBuilder.BuilderVisibility | builderVisibility() | |
abstract hex.ModelCategory[] | can_build() | List containing the categories of models that this builder can build. |
protected boolean | canLearnFromNAs() | Indicates that the algorithm is able to natively learn from NA values. |
protected void | checkCustomMetricForEarlyStopping() | |
void | checkDistributions() | |
protected void | checkEarlyStoppingReproducibility() | |
protected void | checkMemoryFootPrint_impl() | Override this method to call error() if the model is expected to not fit in memory, and say why |
protected void | checkMemoryFootPrint() | Makes sure the final model will fit in memory. |
protected void | checkResponseVariable() | Checks response variable attributes and adds errors if response variable is unusable. |
void | clearInitState() | Clear whatever was done by init() so it can be run again. |
void | clearValidationErrors() | |
void | computeCrossValidation() | Default naive (serial) implementation of N-fold cross-validation (builds N+1 models, all have train+validation metrics, the main model has N-fold cross-validated validation metrics) |
protected boolean | computePriorClassDistribution() | |
void | cv_buildModels(int N, ModelBuilder<M,P,O>[] cvModelBuilders) | |
protected boolean | cv_canBuildMainModelInParallel() | |
void | cv_computeAndSetOptimalParameters(ModelBuilder<M,P,O>[] cvModelBuilders) | Override for model-specific checks / modifications to _parms for the main model during N-fold cross-validation. |
protected boolean | cv_initStoppingParameters() | |
void | cv_mainModelScores(int N, ModelMetrics.MetricBuilder[] mbs, ModelBuilder<M,P,O>[] cvModelBuilders) | |
void | cv_makeAggregateModelMetrics(ModelMetrics.MetricBuilder[] mbs) | |
ModelMetrics.MetricBuilder[] | cv_scoreCVModels(int N, Vec[] weights, ModelBuilder<M,P,O>[] cvModelBuilders) | |
protected boolean | cv_updateOptimalParameters(ModelBuilder<M,P,O>[] cvModelBuilders) | |
static <S extends Model> Key<S> | defaultKey(java.lang.String algoName) | Default model-builder key |
protected int | desiredChunks(Frame original_fr, boolean local) | Find desired number of chunks. |
Key<M> | dest() | |
int | error_count() | |
void | error(java.lang.String field_name, java.lang.String message) | |
M | get() | Block till completion, and return the built model from the DKV. |
ModelBuilder.ValidationMessage[] | getMessagesByFieldAndSeverity(java.lang.String fieldName, byte logLevel) | |
java.lang.String | getName() | Overridable Model Builder name used in generated code, in case the name of the ModelBuilder class is not suitable. |
protected java.lang.String | getSysProperty(java.lang.String name, java.lang.String def) | |
ToEigenVec | getToEigenVec() | |
boolean | hasFoldCol() | |
boolean | hasOffsetCol() | |
boolean | hasTreatmentCol() | |
boolean | hasWeightCol() | |
boolean | haveMojo() | |
boolean | havePojo() | |
void | hide(java.lang.String field_name, java.lang.String message) | |
protected void | ignoreBadColumns(int npredictors, boolean expensive) | Ignore constant columns, columns with all NAs and strings. |
protected boolean | ignoreConstColumns() | |
protected void | ignoreInvalidColumns(int npredictors, boolean expensive) | Ignore invalid columns (columns that have a very high max value, which can cause issues in DHistogram) |
protected boolean | ignoreStringColumns() | |
protected boolean | ignoreUuidColumns() | |
void | info(java.lang.String field_name, java.lang.String message) | |
Frame | init_adaptFrameToTrain(Frame fr, java.lang.String frDesc, java.lang.String field, boolean expensive) | Adapts a given frame to the same schema as the training frame. |
protected int | init_getNClass() | |
void | init(boolean expensive) | Initialize the ModelBuilder, validating all arguments and preparing the training frame. |
protected void | initWorkspace(boolean expensive) | |
boolean | isClassifier() | |
boolean | isResponseOptional() | |
boolean | isStopped() | |
abstract boolean | isSupervised() | |
static java.lang.String | javaName(java.lang.String urlName) | gbm -> hex.tree.gbm.GBM, deeplearning -> hex.deeplearning.DeepLearning |
protected boolean | logMe() | |
static <B extends ModelBuilder,MP extends Model.Parameters> B | make(MP parms) | Factory method to create a ModelBuilder instance from a clone of a given parms instance of Model.Parameters. |
static <B extends ModelBuilder,MP extends Model.Parameters> B | make(MP parms, Key<Model> mKey) | |
static <B extends ModelBuilder> B | make(java.lang.String algo, Job job, Key<Model> result) | Factory method to create a ModelBuilder instance for the given algo name. |
protected boolean | makeCVMetrics(ModelBuilder<?,?,?> cvModelBuilder) | |
protected CVModelBuilder | makeCVModelBuilder(ModelBuilder<?,?,?>[] modelBuilders, int parallelization) | |
static <P extends Model.Parameters> P | makeParameters(java.lang.String algo) | |
PojoWriter | makePojoWriter(Model<?,?,?> genericModel, hex.genmodel.MojoModel mojoModel) | |
void | message(byte log_level, java.lang.String field_name, java.lang.String message) | |
int | nclasses() | |
boolean | nFoldCV() | |
protected int | nFoldWork() | |
protected int | nModelsInParallel() | Deprecated. |
protected int | nModelsInParallel(int folds) | How many should be trained in parallel during N-fold cross-validation? Train all CV models in parallel when parallelism is enabled, otherwise train one at a time. Each model can override this logic, based on parameters, dataset size, etc. |
protected int | nModelsInParallel(int folds, int defaultParallelization) | |
int | numSpecialCols() | |
static java.lang.String | paramName(java.lang.String urlName) | gbm -> GBMParameters |
protected void | raiseReproducibilityWarning(java.lang.String datasetName, int chunks) | |
protected Frame | rebalance(Frame original_fr, boolean local, java.lang.String name) | Rebalance a frame for load balancing |
protected long | remainingTimeSecs() | |
Vec | response() | Train response vector. |
static java.lang.String | schemaDirectory(java.lang.String urlName) | gbm -> "hex.schemas." ; custAlgo -> "org.myOrg.schemas." |
int | separateFeatureVecs() | Find and set response/weights/offset/fold and put them all at the end. |
protected void | setMaxRuntimeSecsForMainModel() | Set max_runtime_secs for the main model. |
void | setTrain(Frame train) | |
void | setValid(Frame valid) | |
boolean | shouldReorder(Vec v) | |
protected long | smallDataSize() | |
protected boolean | stop_requested() | |
protected boolean | timeout() | |
Frame | train() | Training frame: derived from the parameter's training frame, excluding all ignored columns, all constant and bad columns, perhaps flipping the response column to a Categorical, etc. |
Job<M> | trainModel() | Method to launch training of a Model, based on its parameters. |
Job<M> | trainModel(ModelBuilderListener callback) | |
protected abstract ModelBuilder.Driver | trainModelImpl() | Model-specific implementation of model training |
M | trainModelNested(Frame fr) | Train a model as part of a larger Job. |
static <MP extends Model.Parameters> Model | trainModelNested(Job<?> job, Key<Model> result, MP params, Frame fr) | Train a model as part of a larger job. |
Job<M> | trainModelOnH2ONode() | Start model training using this ModelBuilder as a template. |
Frame | valid() | Validation frame: derived from the parameter's validation frame, excluding all ignored columns, all constant and bad columns, perhaps flipping the response column to a Categorical, etc. |
protected boolean | validateBinaryResponse() | |
protected boolean | validateStoppingMetric() | |
java.lang.String | validationErrors() | Get a string representation of only the ERROR ValidationMessages (e.g., to use in an exception throw). |
java.lang.String | validationWarnings() | |
Vec | vresponse() | Validation response vector. |
void | warn(java.lang.String field_name, java.lang.String message) | |
Methods inherited from class Iced: asBytes, clone, copyOver, frozenType, read, readExternal, readJSON, reloadFromBytes, toJsonBytes, toJsonString, write, writeExternal, writeJSON
public java.lang.String _desc
protected boolean _startUpOnceModelBuilder
public P _parms
public P _input_parms
protected transient Frame _train
protected transient Frame _origTrain
protected transient Frame _valid
protected transient ModelTrainingEventsPublisher _eventPublisher
protected transient ModelBuilder.ModelTrainingCoordinator _coordinator
protected transient Vec _response
protected transient Vec _vresponse
protected transient Vec _offset
protected transient Vec _weights
protected transient Vec _fold
protected transient Vec _treatment
protected transient java.lang.String[] _origNames
protected transient java.lang.String[][] _origDomains
protected transient double[] _orig_projection_array
protected int _nclass
protected transient double[] _priorClassDist
public ModelBuilder.ValidationMessage[] _messages
public transient java.util.HashSet<java.lang.String> _removedCols
protected ModelBuilder(P parms)
protected ModelBuilder(P parms, Job<M> job)
protected ModelBuilder(P parms, boolean startup_once)
protected ModelBuilder(P parms, boolean startup_once, java.lang.String externalSchemaDirectory)
public ToEigenVec getToEigenVec()
public boolean shouldReorder(Vec v)
public final M get()
public final boolean isStopped()
protected boolean timeout()
protected boolean stop_requested()
protected long remainingTimeSecs()
public static <S extends Model> Key<S> defaultKey(java.lang.String algoName)
public static java.lang.String[] algos()
public static java.lang.String algoName(java.lang.String urlName)
public static java.lang.String javaName(java.lang.String urlName)
public static java.lang.String paramName(java.lang.String urlName)
public static java.lang.String schemaDirectory(java.lang.String urlName)
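The summary table describes these lookup helpers only by example (gbm -> GBM, gbm -> hex.tree.gbm.GBM, gbm -> GBMParameters, gbm -> "hex.schemas."). The following is a minimal standalone sketch of a registry that reproduces those example mappings. `AlgoNameRegistry`, `register`, and `demo` are hypothetical names used for illustration only, and deriving the algo name from the simple class name and the parameter class from the `Parameters` suffix are assumptions, not a description of how ModelBuilder actually stores this information.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for the name lookups; not H2O's actual implementation.
final class AlgoNameRegistry {
    // What this sketch stores per registered URL name.
    private static final class Entry {
        final String javaName;
        final String schemaDirectory;
        Entry(String javaName, String schemaDirectory) {
            this.javaName = javaName;
            this.schemaDirectory = schemaDirectory;
        }
    }

    private final Map<String, Entry> byUrlName = new LinkedHashMap<>();

    void register(String urlName, String javaName, String schemaDirectory) {
        byUrlName.put(urlName, new Entry(javaName, schemaDirectory));
    }

    private Entry lookup(String urlName) {
        Entry e = byUrlName.get(urlName);
        if (e == null) throw new IllegalArgumentException("Unknown algo: " + urlName);
        return e;
    }

    String[] algos() { return byUrlName.keySet().toArray(new String[0]); }

    String javaName(String urlName) { return lookup(urlName).javaName; }

    // Assumption: the algo name is the simple class name of the builder.
    String algoName(String urlName) {
        String fqcn = javaName(urlName);
        return fqcn.substring(fqcn.lastIndexOf('.') + 1);
    }

    // Assumption: the parameter class is named <AlgoName>Parameters.
    String paramName(String urlName) { return algoName(urlName) + "Parameters"; }

    String schemaDirectory(String urlName) { return lookup(urlName).schemaDirectory; }

    // Registers only the two algos whose mappings appear in the summary table.
    static AlgoNameRegistry demo() {
        AlgoNameRegistry r = new AlgoNameRegistry();
        r.register("gbm", "hex.tree.gbm.GBM", "hex.schemas.");
        r.register("deeplearning", "hex.deeplearning.DeepLearning", "hex.schemas.");
        return r;
    }

    public static void main(String[] args) {
        AlgoNameRegistry r = demo();
        for (String url : r.algos())
            System.out.println(url + " -> " + r.algoName(url) + ", " + r.javaName(url) + ", " + r.paramName(url));
    }
}
```

Keying everything on the URL name keeps the four lookups consistent with one another; an unknown name fails fast instead of returning null.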
public static <P extends Model.Parameters> P makeParameters(java.lang.String algo)
public static <B extends ModelBuilder> B make(java.lang.String algo, Job job, Key<Model> result)
public static <B extends ModelBuilder,MP extends Model.Parameters> B make(MP parms)
Factory method to create a ModelBuilder instance from a clone of a given parms instance of Model.Parameters.
public static <B extends ModelBuilder,MP extends Model.Parameters> B make(MP parms, Key<Model> mKey)
public final Frame train()
public void setTrain(Frame train)
public void setValid(Frame valid)
public final Frame valid()
public Vec response()
public Vec vresponse()
public Job<M> trainModelOnH2ONode()
public final Job<M> trainModel()
public final Job<M> trainModel(ModelBuilderListener callback)
public final M trainModelNested(Frame fr)
fr - Input frame override, ignored if null. In some cases, algos do not work directly with the original frame in the K/V store. Instead they run on a private anonymous copy (e.g., a rebalanced dataset). Use this argument if you want the nested job to work on the actual working copy rather than the original Frame in the K/V store. Example: the outer job rebalances the dataset and then calls the nested job. To avoid a needless second rebalance, pass in the (already rebalanced) working copy.
public static <MP extends Model.Parameters> Model trainModelNested(Job<?> job, Key<Model> result, MP params, Frame fr)
MP - Model.Parameters
job - containing job
result - key of the resulting model
params - model parameters
fr - input frame, ignored if null
protected abstract ModelBuilder.Driver trainModelImpl()
@Deprecated protected int nModelsInParallel()
protected int nModelsInParallel(int folds)
protected int nModelsInParallel(int folds, int defaultParallelization)
protected long smallDataSize()
protected int nFoldWork()
public void computeCrossValidation()
public void cv_buildModels(int N, ModelBuilder<M,P,O>[] cvModelBuilders)
protected CVModelBuilder makeCVModelBuilder(ModelBuilder<?,?,?>[] modelBuilders, int parallelization)
public ModelMetrics.MetricBuilder[] cv_scoreCVModels(int N, Vec[] weights, ModelBuilder<M,P,O>[] cvModelBuilders)
protected boolean makeCVMetrics(ModelBuilder<?,?,?> cvModelBuilder)
protected boolean cv_canBuildMainModelInParallel()
protected boolean cv_updateOptimalParameters(ModelBuilder<M,P,O>[] cvModelBuilders)
protected boolean cv_initStoppingParameters()
public void cv_mainModelScores(int N, ModelMetrics.MetricBuilder[] mbs, ModelBuilder<M,P,O>[] cvModelBuilders)
public void cv_makeAggregateModelMetrics(ModelMetrics.MetricBuilder[] mbs)
protected void setMaxRuntimeSecsForMainModel()
public void cv_computeAndSetOptimalParameters(ModelBuilder<M,P,O>[] cvModelBuilders)
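The summary for computeCrossValidation() describes a naive serial N-fold scheme that builds N+1 models: one per fold plus the main model, whose cross-validated metrics come from the holdout predictions. The sketch below illustrates only that counting and scoring idea on plain arrays. Everything in it is an assumption made for illustration: the toy "model" is just the mean of the training responses, folds are assigned by row index modulo N, and the metric is mean squared error. None of it reflects how the cv_* hooks are implemented.

```java
import java.util.Arrays;

// Hypothetical illustration of serial N-fold cross-validation; not H2O code.
final class NaiveCrossValidation {
    // Toy "model": the mean of the responses selected by the mask.
    static double trainMean(double[] y, boolean[] inTrain) {
        double sum = 0;
        int n = 0;
        for (int i = 0; i < y.length; i++) {
            if (inTrain[i]) { sum += y[i]; n++; }
        }
        return sum / n;
    }

    /** Returns {main model prediction, cross-validated MSE, number of models built}. */
    static double[] run(double[] y, int nfolds) {
        if (nfolds < 2 || nfolds > y.length)
            throw new IllegalArgumentException("need 2 <= nfolds <= number of rows");
        int modelsBuilt = 0;
        double holdoutSquaredError = 0;
        // One model per fold, trained serially on all rows outside that fold.
        for (int fold = 0; fold < nfolds; fold++) {
            boolean[] inTrain = new boolean[y.length];
            for (int i = 0; i < y.length; i++) inTrain[i] = (i % nfolds) != fold;
            double prediction = trainMean(y, inTrain);
            modelsBuilt++;
            // Score the fold model on its held-out rows only.
            for (int i = 0; i < y.length; i++) {
                if (!inTrain[i]) {
                    double d = y[i] - prediction;
                    holdoutSquaredError += d * d;
                }
            }
        }
        // The main model is trained on all rows; its CV metric aggregates the holdout errors.
        boolean[] all = new boolean[y.length];
        Arrays.fill(all, true);
        double mainPrediction = trainMean(y, all);
        modelsBuilt++;
        return new double[] { mainPrediction, holdoutSquaredError / y.length, modelsBuilt };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(run(new double[] {1, 2, 3, 4}, 2)));
    }
}
```

Because every row is held out exactly once, the aggregated holdout error covers the whole dataset, which is why the main model can report a cross-validated metric without being scored on its own training rows.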
public boolean nFoldCV()
public abstract hex.ModelCategory[] can_build()
public ModelBuilder.BuilderVisibility builderVisibility()
public void clearInitState()
protected boolean logMe()
public abstract boolean isSupervised()
public boolean isResponseOptional()
public boolean hasOffsetCol()
public boolean hasWeightCol()
public boolean hasFoldCol()
public boolean hasTreatmentCol()
public int numSpecialCols()
public boolean havePojo()
public boolean haveMojo()
public int nclasses()
public final boolean isClassifier()
protected boolean validateStoppingMetric()
protected boolean validateBinaryResponse()
protected void checkEarlyStoppingReproducibility()
public int separateFeatureVecs()
protected boolean ignoreStringColumns()
protected boolean ignoreConstColumns()
protected boolean ignoreUuidColumns()
protected void ignoreBadColumns(int npredictors, boolean expensive)
npredictors -
expensive -
protected boolean canLearnFromNAs()
protected void checkResponseVariable()
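According to the summary table, ignoreBadColumns ignores constant columns, columns with all NAs, and strings, while ignoreConstColumns() and ignoreStringColumns() are overridable switches. The sketch below shows one way such a filter could look on a tiny in-memory table. It is a hypothetical illustration: the column-map representation, the method names, treating null as NA, and ignoring NAs when testing for a constant column are all assumptions, not the behavior of the real method.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical column filter on an in-memory table; null stands for NA.
final class BadColumnFilter {
    static boolean isAllNA(Object[] col) {
        for (Object v : col) if (v != null) return false;
        return true;
    }

    static boolean isString(Object[] col) {
        for (Object v : col) if (v instanceof String) return true;
        return false;
    }

    // Assumption: NAs are skipped, so a column with one distinct non-NA value is constant.
    static boolean isConstant(Object[] col) {
        Object first = null;
        for (Object v : col) {
            if (v == null) continue;
            if (first == null) first = v;
            else if (!first.equals(v)) return false;
        }
        return first != null;
    }

    // Returns the names of the columns that would be dropped, in frame order.
    static List<String> removedColumns(Map<String, Object[]> frame,
                                       boolean ignoreConst, boolean ignoreStrings) {
        List<String> removed = new ArrayList<>();
        for (Map.Entry<String, Object[]> e : frame.entrySet()) {
            Object[] col = e.getValue();
            if (isAllNA(col)
                    || (ignoreStrings && isString(col))
                    || (ignoreConst && isConstant(col))) {
                removed.add(e.getKey());
            }
        }
        return removed;
    }

    static Map<String, Object[]> demoFrame() {
        Map<String, Object[]> frame = new LinkedHashMap<>();
        frame.put("x", new Object[] {1.0, 2.0, 3.0});    // usable predictor
        frame.put("c", new Object[] {5.0, 5.0, 5.0});    // constant
        frame.put("na", new Object[] {null, null, null}); // all NAs
        frame.put("s", new Object[] {"a", "b", "c"});    // strings
        return frame;
    }

    public static void main(String[] args) {
        System.out.println(removedColumns(demoFrame(), true, true));
    }
}
```

Passing the two switches as arguments mirrors the idea that a builder can opt out of a particular check while the all-NA check always applies in this sketch.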
protected void ignoreInvalidColumns(int npredictors, boolean expensive)
npredictors -
expensive -
protected void checkMemoryFootPrint()
protected void checkMemoryFootPrint_impl()
protected boolean computePriorClassDistribution()
public int error_count()
public void hide(java.lang.String field_name, java.lang.String message)
public void info(java.lang.String field_name, java.lang.String message)
public void warn(java.lang.String field_name, java.lang.String message)
public void error(java.lang.String field_name, java.lang.String message)
public void clearValidationErrors()
public void message(byte log_level, java.lang.String field_name, java.lang.String message)
public ModelBuilder.ValidationMessage[] getMessagesByFieldAndSeverity(java.lang.String fieldName, byte logLevel)
public java.lang.String validationErrors()
public java.lang.String validationWarnings()
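The messaging methods above (hide, info, warn, error, message, getMessagesByFieldAndSeverity, validationErrors) all operate on the list of ValidationMessage entries, where ERROR means the parameters can't be used as-is and TRACE means a field should be hidden. The following standalone sketch shows how such a per-field message log could be organized. `ValidationLog` and its members are hypothetical; the enum replaces the byte log levels of the real API, the mapping of hide() to TRACE is inferred from the ValidationMessage description, and the message text format and the field names in the usage are invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical per-field validation log; not the real ValidationMessage API.
final class ValidationLog {
    enum Severity { TRACE, INFO, WARN, ERROR }

    static final class Message {
        final Severity severity;
        final String field;
        final String text;
        Message(Severity severity, String field, String text) {
            this.severity = severity;
            this.field = field;
            this.text = text;
        }
        @Override public String toString() {
            return severity + " on field " + field + ": " + text;
        }
    }

    private final List<Message> messages = new ArrayList<>();

    void message(Severity s, String field, String text) { messages.add(new Message(s, field, text)); }
    void hide(String field, String text)  { message(Severity.TRACE, field, text); }
    void info(String field, String text)  { message(Severity.INFO, field, text); }
    void warn(String field, String text)  { message(Severity.WARN, field, text); }
    void error(String field, String text) { message(Severity.ERROR, field, text); }

    // A null field matches every field.
    List<Message> byFieldAndSeverity(String field, Severity s) {
        List<Message> out = new ArrayList<>();
        for (Message m : messages) {
            if (m.severity == s && (field == null || field.equals(m.field))) out.add(m);
        }
        return out;
    }

    int errorCount() { return byFieldAndSeverity(null, Severity.ERROR).size(); }

    // Only ERROR messages, one per line, suitable for an exception message.
    String validationErrors() {
        StringBuilder sb = new StringBuilder();
        for (Message m : byFieldAndSeverity(null, Severity.ERROR)) {
            if (sb.length() > 0) sb.append('\n');
            sb.append(m);
        }
        return sb.toString();
    }

    void clear() { messages.clear(); }
}
```

A caller would typically check `errorCount()` after validation and, if it is non-zero, throw with `validationErrors()` as the message.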
public void init(boolean expensive)
Initialize the ModelBuilder, validating all arguments and preparing the training frame.
… expensive is false; it will be called once again at the start of model building (trainModel()) with expensive set to true.
The incoming training frame (and validation frame) will have ignored columns dropped out, plus whatever work the parent init did.
NOTE: The front end initially calls this through the parameters validation endpoint with no training_frame, so each subclass's init() method has to work correctly with the training_frame missing.
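The two constraints described for init() are that it may run cheaply first and again with expensive set to true, and that a subclass override must tolerate a missing training frame. The sketch below shows one way to satisfy both in a standalone toy hierarchy. `SketchBuilder`, `SketchTreeBuilder`, the `ntrees` field, and the error strings are hypothetical, and calling `super.init(expensive)` first is an assumption suggested by the mention of "whatever work the parent init did".

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical base builder: validates cheaply, does costly work only when asked.
class SketchBuilder {
    final List<String> errors = new ArrayList<>();
    double[][] trainingFrame;   // may be null during parameter validation
    int expensivePasses = 0;    // counts how often the costly path ran

    void init(boolean expensive) {
        errors.clear();
        if (trainingFrame == null) {
            errors.add("training_frame: missing");
            return; // nothing frame-dependent can be checked yet
        }
        if (expensive) expensivePasses++; // stand-in for costly frame preparation
    }
}

// Hypothetical subclass: parameter checks always run, frame checks are guarded.
class SketchTreeBuilder extends SketchBuilder {
    int ntrees = 50;

    @Override
    void init(boolean expensive) {
        super.init(expensive);
        // Pure parameter checks do not need the frame.
        if (ntrees < 1) errors.add("ntrees: must be >= 1");
        // Guard every frame-dependent check so a missing frame never throws.
        if (trainingFrame == null) return;
        if (trainingFrame.length == 0) errors.add("training_frame: no rows");
    }
}
```

In this sketch a validation-only call such as `init(false)` with no frame reports errors instead of failing with a NullPointerException, and the later `init(true)` call repeats the same checks before the costly step.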
protected void checkCustomMetricForEarlyStopping()
public Frame init_adaptFrameToTrain(Frame fr, java.lang.String frDesc, java.lang.String field, boolean expensive)
fr - input frame
frDesc - frame description, e.g. "Validation Frame"; will be shown in validation error messages
field - name of a field for validation errors
expensive - indicates full ("expensive") processing
protected Frame rebalance(Frame original_fr, boolean local, java.lang.String name)
original_fr - Input frame
local - Whether to only create enough chunks to max out all cores on one node only. WARNING: This behavior is not actually implemented in the methods defined in this class; the default logic doesn't take this parameter into consideration.
name - Name of rebalanced frame
protected void raiseReproducibilityWarning(java.lang.String datasetName, int chunks)
protected int desiredChunks(Frame original_fr, boolean local)
protected java.lang.String getSysProperty(java.lang.String name, java.lang.String def)
protected int init_getNClass()
public void checkDistributions()
public java.lang.String getName()
protected final void initWorkspace(boolean expensive)
public PojoWriter makePojoWriter(Model<?,?,?> genericModel, hex.genmodel.MojoModel mojoModel)