public abstract class Model extends Lockable<Model>

Modifier and Type | Class and Description |
---|---|
protected static class | Model.ModelAutobufferSerializer: Helper type for serialization. |
static class | Model.ModelCategory |

Modifier and Type | Field and Description |
---|---|
Key | _dataKey: Dataset key used to *build* the model, for models for which this makes sense, or null otherwise. |
java.lang.String[][] | _domains: Categorical/factor/enum mappings, per column. |
protected boolean | _have_cv_results: Whether or not this model has cross-validated results stored. |
protected float[] | _modelClassDist |
java.lang.String[] | _names: Columns used in the model; they are also used to match up with scoring data columns. |
float[] | _priorClassDist |
static DocGen.FieldDoc[] | DOC_FIELDS |
protected static boolean | GEN_BENCHMARK_CODE: Debug flag to generate benchmark code. |
long | training_duration_in_ms: The duration of model training, in ms. |
long | training_start_time: The start time of model training, in ms since the epoch. |
java.lang.String[] | warnings: Any warnings raised during model building. |

Constructor and Description |
---|
Model(Key selfKey, Key dataKey, Frame fr, float[] priorClassDist): Full constructor from frame; strips out the Vecs, keeping just the names needed to match columns of future datasets. |
Model(Key selfKey, Key dataKey, java.lang.String[] names, java.lang.String[][] domains, float[] priorClassDist, float[] modelClassDist) |
Model(Key selfKey, Key dataKey, java.lang.String[] names, java.lang.String[][] domains, float[] priorClassDist, float[] modelClassDist, long training_start_time, long training_duration_in_ms): Full constructor. |

Modifier and Type | Method and Description |
---|---|
Frame[] | adapt(Frame fr, boolean exact): Build an adapted Frame from the given Frame. |
Frame[] | adapt(Frame fr, boolean exact, boolean haveResponse) |
void | addWarning(java.lang.String warning) |
double | calcError(Frame ftest, Vec vactual, Frame fpreds, Frame hitratio_fpreds, java.lang.String label, boolean printMe, int max_conf_mat_size, ConfusionMatrix cm, AUC auc, HitRatio hr): Compute the model error for a given test data set. For multi-class classification, this is the classification error obtained by assigning each row the label with the highest predicted per-class probability. |
java.lang.String[] | classNames() |
ConfusionMatrix | cm(): For classifiers, the confusion matrix on the validation set. |
Futures | delete_impl(Futures fs): Remove any Model internal Keys. |
java.lang.String | errStr() |
Request2 | get_params() |
static int[][] | getDomainMapping(java.lang.String[] modelDom, java.lang.String[] colDom, boolean exact): Returns a mapping between the values of the model domain (modelDom) and the given column domain. |
static int[][] | getDomainMapping(java.lang.String colName, java.lang.String[] modelDom, java.lang.String[] colDom, boolean logNonExactMapping): Returns a mapping for the given column according to the given modelDom. |
Model.ModelCategory | getModelCategory() |
AutoBufferSerializer<Model> | getModelSerializer(): Returns a model serializer into AutoBuffer. |
UniqueId | getUniqueId() |
boolean | hasCrossValModels() |
boolean | isClassifier() |
boolean | isSupervised() |
Request2 | job() |
protected double | missingColumnsType(): Type of missing columns during adaptation between train/test datasets. Override this method for models that have sparse data handling. |
double | mse(): Returns the MSE for the validation set. |
int | nclasses() |
int | nfeatures(): Returns the number of input features. |
protected void | printCrossValidationModelsHTML(java.lang.StringBuilder sb) |
java.lang.String | responseName() |
double | score(double[] data) |
Frame | score(Frame fr): Bulk score for the given fr frame. |
Frame | score(Frame fr, boolean adapt): Bulk score the frame fr, producing a Frame result; the 1st Vec is the predicted class, the remaining Vecs are the probability distributions. |
float[] | score(Frame fr, boolean exact, int row): Single row scoring, on a compatible Frame. |
float[] | score(int[][][] map, double[] row, float[] preds): Single row scoring, on a compatible set of data, given an adaptation vector. |
float[] | score(java.lang.String[] names, java.lang.String[][] domains, boolean exact, double[] row): Single row scoring, on a compatible set of data. |
protected float[] | score0(Chunk[] chks, int row_in_chunk, double[] tmp, float[] preds): Bulk scoring API for one row. |
protected abstract float[] | score0(double[] data, float[] preds): Subclasses implement the scoring logic. |
void | scoreCrossValidation(Job.ValidatedJob job, Frame source, Vec response, Frame[] cv_preds, long[] offsets): Compute the cross-validation error from an array of predictions for N folds. |
protected Frame | scoreImpl(Frame adaptFrm): Score an already adapted frame. |
protected void | setCrossValidationError(Job.ValidatedJob job, double cv_error, ConfusionMatrix cm, AUCData auc, HitRatio hr) |
void | setModelClassDistribution(float[] classdist) |
void | start_training(long training_start_time) |
void | start_training(Model previous) |
void | stop_training() |
void | testJavaScoring(Frame fr) |
java.lang.String | toJava(): Return a String which is a valid Java program representing a class that implements the Model. |
SB | toJava(SB sb) |
protected java.lang.String | toJavaDefaultMaxIters() |
protected void | toJavaFillPreds0(SB bodySb): Fill preds[0] based on the already filled and unified preds[1,..NCLASSES]. |
protected void | toJavaInit(javassist.CtClass ct) |
protected SB | toJavaInit(SB sb, SB fileContextSB) |
protected SB | toJavaNCLASSES(SB sb) |
protected void | toJavaPredictBody(SB bodySb, SB classCtxSb, SB fileCtxSb) |
protected SB | toJavaSuper(SB sb): Generate the implementation for the super class. |
protected void | toJavaUnifyPreds(SB bodySb): Generates code that unifies preds[1,...NCLASSES]. |
VarImp | varimp(): Variable importance of individual input features measured by this model. |

Methods inherited from class Lockable: delete_and_lock, delete, delete, delete, delete, is_unlocked, is_wlocked, read_lock, read_lock, unlock_all, unlock_lockable, unlock, update, write_lock
Other inherited methods: clone, frozenType, init, newInstance, read, toDocField, write, writeJSON, writeJSONFields
public static DocGen.FieldDoc[] DOC_FIELDS
@Request.API(help="Datakey used to *build* the model") public final Key _dataKey
@Request.API(help="Column names used to build the model") public final java.lang.String[] _names
@Request.API(help="Column names used to build the model") public final java.lang.String[][] _domains
@Request.API(help="Relative class distribution factors in original data") public final float[] _priorClassDist
@Request.API(help="Relative class distribution factors used for model building") protected float[] _modelClassDist
public long training_start_time
public long training_duration_in_ms
@Request.API(help="warnings") public java.lang.String[] warnings
protected boolean _have_cv_results
protected static final boolean GEN_BENCHMARK_CODE
public Model(Key selfKey, Key dataKey, Frame fr, float[] priorClassDist)
public Model(Key selfKey, Key dataKey, java.lang.String[] names, java.lang.String[][] domains, float[] priorClassDist, float[] modelClassDist)
public void setModelClassDistribution(float[] classdist)
public Request2 get_params()
public Request2 job()
public Model.ModelCategory getModelCategory()
public Futures delete_impl(Futures fs)
Overrides: delete_impl in class Lockable<Model>
public void addWarning(java.lang.String warning)
public boolean isSupervised()
public UniqueId getUniqueId()
public void start_training(long training_start_time)
public void start_training(Model previous)
public void stop_training()
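The start_training and stop_training methods carry no description here; the field summary only says that training_start_time is in ms since the epoch and training_duration_in_ms is a duration in ms. The sketch below is a hypothetical stand-in (the TrainingClock class is invented, not part of H2O) showing one plausible way such bookkeeping could work, assuming the duration is plain wall-clock time between the two calls.

```java
// Hypothetical stand-in for Model's training-time bookkeeping; not H2O code.
final class TrainingClock {
    long training_start_time;     // ms since the Unix epoch
    long training_duration_in_ms; // elapsed training time, in ms

    // Assumption: start_training(long) simply records the supplied start time.
    void start_training(long startTimeMs) {
        training_start_time = startTimeMs;
        training_duration_in_ms = 0;
    }

    // Assumption: stop_training() measures wall-clock time since the start.
    void stop_training() {
        training_duration_in_ms = System.currentTimeMillis() - training_start_time;
    }

    public static void main(String[] args) throws InterruptedException {
        TrainingClock clock = new TrainingClock();
        clock.start_training(System.currentTimeMillis());
        Thread.sleep(20); // stand-in for actual training work
        clock.stop_training();
        System.out.println("trained for about " + clock.training_duration_in_ms + " ms");
    }
}
```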
public java.lang.String responseName()
public java.lang.String[] classNames()
public boolean isClassifier()
public int nclasses()
public int nfeatures()
public ConfusionMatrix cm()
public double mse()
public VarImp varimp()
public boolean hasCrossValModels()
public Frame score(Frame fr)
Bulk score for the given fr frame. The frame is always adapted to this model.
Parameters: fr - frame to be scored
See Also: score(Frame, boolean)
public final Frame score(Frame fr, boolean adapt)
Bulk score the frame fr, producing a Frame result; the 1st Vec is the predicted class, the remaining Vecs are the probability distributions. For regression (single-class) models, the 1st and only Vec is the prediction value. The adapt flag controls whether fr is first adapted to this model.
Parameters: fr - frame which should be scored
adapt - a flag enforcing an adaptation of fr to this model. If the flag is false, the scoring code expects that fr is already adapted.
protected Frame scoreImpl(Frame adaptFrm)
Parameters: adaptFrm - the already adapted frame to be scored
public final float[] score(Frame fr, boolean exact, int row)
public final float[] score(java.lang.String[] names, java.lang.String[][] domains, boolean exact, double[] row)
public final float[] score(int[][][] map, double[] row, float[] preds)
protected double missingColumnsType()
public Frame[] adapt(Frame fr, boolean exact)
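Adaptation matches the model's _names against the columns of the frame being scored. As a minimal sketch, assuming matching is by exact column name (the ColumnMatcher class and matchColumns method are hypothetical, not H2O API), the name lookup could look like this; columns the frame lacks are marked -1 and would then need the missing-column handling that missingColumnsType() refers to.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper (not H2O code): maps each model column name to the index
// of the same-named column in a scoring frame, or -1 when the frame lacks it.
final class ColumnMatcher {
    static int[] matchColumns(String[] modelNames, String[] frameNames) {
        Map<String, Integer> pos = new HashMap<>();
        for (int j = 0; j < frameNames.length; j++) {
            pos.putIfAbsent(frameNames[j], j); // first occurrence wins
        }
        int[] idx = new int[modelNames.length];
        for (int i = 0; i < modelNames.length; i++) {
            idx[i] = pos.getOrDefault(modelNames[i], -1);
        }
        return idx;
    }

    public static void main(String[] args) {
        String[] model = {"age", "income", "city"};
        String[] frame = {"city", "age", "extra"};
        System.out.println(Arrays.toString(matchColumns(model, frame))); // [1, -1, 0]
    }
}
```

Extra columns in the scoring frame ("extra" above) are simply ignored, since only the model's own column list drives the lookup.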
public static int[][] getDomainMapping(java.lang.String[] modelDom, java.lang.String[] colDom, boolean exact)
Returns a mapping between the values of the model domain (modelDom) and the given column domain.
public static int[][] getDomainMapping(java.lang.String colName, java.lang.String[] modelDom, java.lang.String[] colDom, boolean logNonExactMapping)
Returns a mapping for the given column according to the given modelDom. In this case, modelDom is
Parameters: colName - name of column which is mapped, can be null.
modelDom -
logNonExactMapping -
protected float[] score0(Chunk[] chks, int row_in_chunk, double[] tmp, float[] preds)
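The getDomainMapping overloads translate categorical levels between the model's domain and a scoring column's domain. The exact layout of the returned int[][] is not documented here, so the following is only a hypothetical sketch (DomainMappingSketch and mapLevels are invented names) that returns, for each column level, the index of that level in the model domain, and assumes exact means "every column level must be known to the model".

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not H2O's getDomainMapping: for every level in the
// scoring column's domain, return the index of the same level in the model's
// domain, or -1 if the model never saw that level during training.
final class DomainMappingSketch {
    static int[] mapLevels(String[] modelDom, String[] colDom, boolean exact) {
        Map<String, Integer> modelPos = new HashMap<>();
        for (int i = 0; i < modelDom.length; i++) modelPos.put(modelDom[i], i);
        int[] map = new int[colDom.length];
        for (int c = 0; c < colDom.length; c++) {
            Integer m = modelPos.get(colDom[c]);
            if (m == null && exact) {
                // Assumption: in exact mode an unknown level is an error.
                throw new IllegalArgumentException("Unknown level: " + colDom[c]);
            }
            map[c] = (m == null) ? -1 : m;
        }
        return map;
    }

    public static void main(String[] args) {
        String[] modelDom = {"red", "green", "blue"};
        String[] colDom = {"blue", "purple", "red"};
        System.out.println(Arrays.toString(mapLevels(modelDom, colDom, false))); // [2, -1, 0]
    }
}
```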
public double calcError(Frame ftest, Vec vactual, Frame fpreds, Frame hitratio_fpreds, java.lang.String label, boolean printMe, int max_conf_mat_size, ConfusionMatrix cm, AUC auc, HitRatio hr)
Parameters: ftest - Frame containing test data
vactual - The response column Vec
fpreds - Frame containing ADAPTED (domain labels from train+test data) predicted data (classification: label + per-class probabilities, regression: target)
hitratio_fpreds - Frame containing predicted data (domain labels from test data) (classification: label + per-class probabilities, regression: target)
label - Name for the scored data set to be printed
printMe - Whether to print the scoring results to Log.info
max_conf_mat_size - Largest size of Confusion Matrix (#classes) for it to be printed to Log.info
cm - Confusion Matrix object to populate for multi-class classification (also used for regression)
auc - AUC object to populate for binary classification
hr - HitRatio object to populate for classification
protected abstract float[] score0(double[] data, float[] preds)
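calcError is summarized as computing, for multi-class classification, the error obtained by predicting the label with the highest per-class probability. A standalone sketch of that metric on plain arrays follows; the ClassificationError class is hypothetical and ignores the Frame, ConfusionMatrix, AUC and HitRatio plumbing of the real method.

```java
// Hypothetical sketch, not H2O code: classification error computed by
// predicting, for each row, the class with the highest probability.
final class ClassificationError {
    static double error(float[][] probs, int[] actual) {
        if (probs.length == 0 || probs.length != actual.length) {
            throw new IllegalArgumentException("need one probability row per label");
        }
        int wrong = 0;
        for (int r = 0; r < probs.length; r++) {
            int best = 0;
            for (int k = 1; k < probs[r].length; k++) {
                if (probs[r][k] > probs[r][best]) best = k; // ties keep the lower index
            }
            if (best != actual[r]) wrong++;
        }
        return (double) wrong / probs.length;
    }

    public static void main(String[] args) {
        float[][] probs = {
            {0.7f, 0.2f, 0.1f},
            {0.1f, 0.3f, 0.6f},
            {0.4f, 0.5f, 0.1f},
            {0.2f, 0.2f, 0.6f}
        };
        int[] actual = {0, 2, 0, 2};
        System.out.println(error(probs, actual)); // 0.25 (only row 2 is misclassified)
    }
}
```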
public double score(double[] data)
public java.lang.String toJava()
The generated Java class has the following form:

    class UUIDxxxxModel {
      public static final String NAMES[] = { ....column names... };
      public static final String DOMAINS[][] = { ....domain names... };
      // Pass in data in a double[], pre-aligned to the Model's requirements.
      // Jam predictions into the preds[] array; preds[0] is reserved for the
      // main prediction (class for classifiers or value for regression),
      // and remaining columns hold a probability distribution for classifiers.
      float[] predict( double data[], float preds[] );
      double[] map( HashMap<String,Double> row, double data[] );
      // Does the mapping lookup for every row, no allocation
      float[] predict( HashMap<String,Double> row, double data[], float preds[] );
      // Allocates a double[] for every row
      float[] predict( HashMap<String,Double> row, float preds[] );
      // Allocates a double[] and a float[] for every row
      float[] predict( HashMap<String,Double> row );
    }

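To make the described form concrete, here is a hand-written class in that shape. It is not toJava() output: the class name ToyModel, its two numeric columns, the logistic formula and coefficients, and treating a missing map entry as NaN are all assumptions for illustration, and it accepts a Map rather than a HashMap.

```java
import java.util.HashMap;
import java.util.Map;

// Hand-written illustration of the toJava() output shape; NOT generated by H2O.
// Names, coefficients, and the logistic formula are made up.
class ToyModel {
    public static final String[] NAMES = {"x1", "x2"};
    // Assumption: numeric columns have no categorical domain, shown as null.
    public static final String[][] DOMAINS = {null, null};
    private static final double B0 = -1.0, B1 = 2.0, B2 = 0.5;

    // data[] is pre-aligned to NAMES; preds[0] = class, preds[1..2] = probabilities.
    float[] predict(double[] data, float[] preds) {
        double z = B0 + B1 * data[0] + B2 * data[1];
        float p1 = (float) (1.0 / (1.0 + Math.exp(-z)));
        preds[1] = 1f - p1;
        preds[2] = p1;
        preds[0] = p1 > 0.5f ? 1 : 0;
        return preds;
    }

    // Looks up each model column by name; assumption: missing entries become NaN.
    double[] map(Map<String, Double> row, double[] data) {
        for (int i = 0; i < NAMES.length; i++) {
            Double v = row.get(NAMES[i]);
            data[i] = (v == null) ? Double.NaN : v;
        }
        return data;
    }

    // Allocates a double[] and a float[] for every row.
    float[] predict(Map<String, Double> row) {
        return predict(map(row, new double[NAMES.length]), new float[3]);
    }

    public static void main(String[] args) {
        Map<String, Double> row = new HashMap<>();
        row.put("x1", 1.0);
        row.put("x2", 2.0);
        float[] p = new ToyModel().predict(row);
        System.out.printf("class=%.0f p0=%.3f p1=%.3f%n", p[0], p[1], p[2]);
    }
}
```

Keeping the allocation-free predict(double[], float[]) as the core and layering the map-based overloads on top mirrors the allocation trade-offs listed in the form above.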
protected void toJavaInit(javassist.CtClass ct)
protected java.lang.String toJavaDefaultMaxIters()
public void testJavaScoring(Frame fr)
protected void toJavaUnifyPreds(SB bodySb)
protected void toJavaFillPreds0(SB bodySb)
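toJavaUnifyPreds and toJavaFillPreds0 generate code that post-processes preds[1..NCLASSES] and then fills preds[0]. The summaries do not define "unify"; the sketch below assumes it means rescaling the class scores to sum to 1, and that preds[0] receives the 0-based index of the largest score, consistent with the toJava description of preds[0] as the main prediction. PredsPostProcess is a hypothetical name, not generated H2O code.

```java
import java.util.Arrays;

// Hypothetical sketch, not H2O's generated code.
final class PredsPostProcess {
    // Assumption: "unify" = rescale preds[1..K] so that they sum to 1.
    static void unifyPreds(float[] preds) {
        float sum = 0f;
        for (int k = 1; k < preds.length; k++) sum += preds[k];
        if (sum > 0f) {
            for (int k = 1; k < preds.length; k++) preds[k] /= sum;
        }
    }

    // preds[0] <- 0-based index of the largest entry among preds[1..K].
    static void fillPreds0(float[] preds) {
        int best = 1;
        for (int k = 2; k < preds.length; k++) {
            if (preds[k] > preds[best]) best = k;
        }
        preds[0] = best - 1;
    }

    public static void main(String[] args) {
        float[] preds = {0f, 1f, 3f, 0f}; // preds[0] unused until filled
        unifyPreds(preds);
        fillPreds0(preds);
        System.out.println(Arrays.toString(preds)); // [1.0, 0.25, 0.75, 0.0]
    }
}
```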
public final void scoreCrossValidation(Job.ValidatedJob job, Frame source, Vec response, Frame[] cv_preds, long[] offsets)
Parameters: source - Full training data
response - Full response
cv_preds - N Frames containing predictions made by N-fold CV runs on disjoint contiguous holdout pieces of the training data
offsets - Starting row numbers for the N CV pieces (length = N+1, first element: 0, last element: #rows)
protected void setCrossValidationError(Job.ValidatedJob job, double cv_error, ConfusionMatrix cm, AUCData auc, HitRatio hr)
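The offsets parameter of scoreCrossValidation follows a simple layout: N+1 entries, starting at 0 and ending at #rows, so fold f covers rows [offsets[f], offsets[f+1]). A hypothetical sketch (CvOffsets is not H2O code, and near-equal piece sizes are an assumption) that builds such an array for N contiguous pieces and looks up which fold a row falls in:

```java
import java.util.Arrays;

// Hypothetical sketch, not H2O code: offsets with the documented layout
// (length N+1, offsets[0] = 0, offsets[N] = #rows) for N contiguous pieces.
final class CvOffsets {
    static long[] offsets(long nrows, int nfolds) {
        if (nfolds <= 0 || nrows < nfolds) {
            throw new IllegalArgumentException("need 1 <= nfolds <= nrows");
        }
        long[] off = new long[nfolds + 1];
        for (int f = 0; f <= nfolds; f++) {
            off[f] = nrows * f / nfolds; // near-equal, contiguous pieces
        }
        return off;
    }

    // Fold f holds rows [off[f], off[f + 1]).
    static int foldOf(long[] off, long row) {
        for (int f = 0; f < off.length - 1; f++) {
            if (row >= off[f] && row < off[f + 1]) return f;
        }
        throw new IndexOutOfBoundsException("row " + row);
    }

    public static void main(String[] args) {
        long[] off = offsets(10, 3);
        System.out.println(Arrays.toString(off)); // [0, 3, 6, 10]
        System.out.println(foldOf(off, 7));       // 2
    }
}
```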
protected void printCrossValidationModelsHTML(java.lang.StringBuilder sb)
public AutoBufferSerializer<Model> getModelSerializer()