Model

java.lang.Object
- water.Iced
  - water.Lockable<Model>
    - water.Model

All Implemented Interfaces:

java.lang.Cloneable, Freezable

Direct Known Subclasses:

CoxPH.CoxPHModel, DeepLearningModel, DTree.TreeModel, GapStatisticModel, GLMModel, KMeans2.KMeans2Model, NBModel, NeuralNet.NeuralNetModel, PCAModel, SpeeDRFModel
```
public abstract class Model
extends Lockable<Model>
```
A Model models reality (hopefully). A model can be used to 'score' a row, or a collection of rows, on any compatible dataset, meaning the row has all the columns with the same names as those used to build the model.

Nested Class Summary

Nested Classes
Modifier and Type	Class and Description
`protected static class`	`Model.ModelAutobufferSerializer` Helper type for serialization
`static class`	`Model.ModelCategory`

Field Summary

Fields
Modifier and Type	Field and Description
`Key`	`_dataKey` Dataset key used to build the model, for models for which this makes sense, or null otherwise.
`java.lang.String[][]`	`_domains` Categorical/factor/enum mappings, per column.
`protected boolean`	`_have_cv_results` Whether or not this model has cross-validated results stored.
`protected float[]`	`_modelClassDist`
`java.lang.String[]`	`_names` Names of the columns used in the model; they are matched against the scoring data columns.
`float[]`	`_priorClassDist`
`static DocGen.FieldDoc[]`	`DOC_FIELDS`
`protected static boolean`	`GEN_BENCHMARK_CODE` Debug flag to generate benchmark code.
`long`	`training_duration_in_ms` The duration in ms for model training.
`long`	`training_start_time` The start time in ms since the epoch for model training.
`java.lang.String[]`	`warnings` Any warnings thrown during model building.

Fields inherited from class water.Lockable
_key, _lockers

Constructor Summary

Constructors
Constructor and Description
`Model(Key selfKey, Key dataKey, Frame fr, float[] priorClassDist)` Full constructor from frame: Strips out the Vecs to just the names needed to match columns later for future datasets.
`Model(Key selfKey, Key dataKey, java.lang.String[] names, java.lang.String[][] domains, float[] priorClassDist, float[] modelClassDist)`
`Model(Key selfKey, Key dataKey, java.lang.String[] names, java.lang.String[][] domains, float[] priorClassDist, float[] modelClassDist, long training_start_time, long training_duration_in_ms)` Full constructor

Method Summary

Methods
Modifier and Type	Method and Description
`Frame[]`	`adapt(Frame fr, boolean exact)` Build an adapted Frame from the given Frame.
`Frame[]`	`adapt(Frame fr, boolean exact, boolean haveResponse)`
`void`	`addWarning(java.lang.String warning)`
`double`	`calcError(Frame ftest, Vec vactual, Frame fpreds, Frame hitratio_fpreds, java.lang.String label, boolean printMe, int max_conf_mat_size, ConfusionMatrix cm, AUC auc, HitRatio hr)` Compute the model error for a given test data set. For multi-class classification, this is the classification error based on assigning labels for the highest predicted per-class probability.
`java.lang.String[]`	`classNames()`
`ConfusionMatrix`	`cm()` For classifiers, confusion matrix on validation set.
`Futures`	`delete_impl(Futures fs)` Remove any Model internal Keys
`java.lang.String`	`errStr()`
`Request2`	`get_params()`
`static int[][]`	`getDomainMapping(java.lang.String[] modelDom, java.lang.String[] colDom, boolean exact)` Returns a mapping between values of model domains (`modelDom`) and given column domain.
`static int[][]`	`getDomainMapping(java.lang.String colName, java.lang.String[] modelDom, java.lang.String[] colDom, boolean logNonExactMapping)` Returns a mapping for given column according to given `modelDom`.
`Model.ModelCategory`	`getModelCategory()`
`AutoBufferSerializer<Model>`	`getModelSerializer()` Returns a model serializer into AutoBuffer.
`UniqueId`	`getUniqueId()`
`boolean`	`hasCrossValModels()`
`boolean`	`isClassifier()`
`boolean`	`isSupervised()`
`Request2`	`job()`
`protected double`	`missingColumnsType()` Type of missing columns during adaptation between train/test datasets. Override this method for models that have sparse data handling.
`double`	`mse()` Returns mse for validation set.
`int`	`nclasses()`
`int`	`nfeatures()` Returns number of input features
`protected void`	`printCrossValidationModelsHTML(java.lang.StringBuilder sb)`
`java.lang.String`	`responseName()`
`double`	`score(double[] data)`
`Frame`	`score(Frame fr)` Bulk score for given `fr` frame.
`Frame`	`score(Frame fr, boolean adapt)` Bulk score the frame `fr`, producing a Frame result; the 1st Vec is the predicted class, the remaining Vecs are the probability distributions.
`float[]`	`score(Frame fr, boolean exact, int row)` Single row scoring, on a compatible Frame.
`float[]`	`score(int[][][] map, double[] row, float[] preds)` Single row scoring, on a compatible set of data, given an adaption vector
`float[]`	`score(java.lang.String[] names, java.lang.String[][] domains, boolean exact, double[] row)` Single row scoring, on a compatible set of data.
`protected float[]`	`score0(Chunk[] chks, int row_in_chunk, double[] tmp, float[] preds)` Bulk scoring API for one row.
`protected abstract float[]`	`score0(double[] data, float[] preds)` Subclasses implement the scoring logic.
`void`	`scoreCrossValidation(Job.ValidatedJob job, Frame source, Vec response, Frame[] cv_preds, long[] offsets)` Compute the cross validation error from an array of predictions for N folds.
`protected Frame`	`scoreImpl(Frame adaptFrm)` Score already adapted frame.
`protected void`	`setCrossValidationError(Job.ValidatedJob job, double cv_error, ConfusionMatrix cm, AUCData auc, HitRatio hr)`
`void`	`setModelClassDistribution(float[] classdist)`
`void`	`start_training(long training_start_time)`
`void`	`start_training(Model previous)`
`void`	`stop_training()`
`void`	`testJavaScoring(Frame fr)`
`java.lang.String`	`toJava()` Return a String which is a valid Java program representing a class that implements the Model.
`SB`	`toJava(SB sb)`
`protected java.lang.String`	`toJavaDefaultMaxIters()`
`protected void`	`toJavaFillPreds0(SB bodySb)` Fill preds[0] based on already filled and unified preds[1,..NCLASSES].
`protected void`	`toJavaInit(javassist.CtClass ct)`
`protected SB`	`toJavaInit(SB sb, SB fileContextSB)`
`protected SB`	`toJavaNCLASSES(SB sb)`
`protected void`	`toJavaPredictBody(SB bodySb, SB classCtxSb, SB fileCtxSb)`
`protected SB`	`toJavaSuper(SB sb)` Generate implementation for super class.
`protected void`	`toJavaUnifyPreds(SB bodySb)` Generates code which unifies preds[1,...NCLASSES]
`VarImp`	`varimp()` Variable importance of individual input features measured by this model.

Methods inherited from class water.Lockable
delete_and_lock, delete, delete, delete, delete, is_unlocked, is_wlocked, read_lock, read_lock, unlock_all, unlock_lockable, unlock, update, write_lock

Methods inherited from class water.Iced
clone, frozenType, init, newInstance, read, toDocField, write, writeJSON, writeJSONFields

Methods inherited from class java.lang.Object
equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

- Field Detail
  - DOC_FIELDS
```
public static DocGen.FieldDoc[] DOC_FIELDS
```
  - _dataKey
```
@Request.API(help="Datakey used to *build* the model")
public final Key _dataKey
```
    Dataset key used to *build* the model, for models for which this makes sense, or null otherwise. Not all models are built from a dataset (e.g. artificial models), and some are not built from a single dataset (various ensemble models), so this key has no *mathematical* significance in the model but is handy during common model-building and for the historical record.
  - _names
```
@Request.API(help="Column names used to build the model")
public final java.lang.String[] _names
```
    Names of the columns used in the model; they are matched against the scoring data columns. The last name is the response column name.
  - _domains
```
@Request.API(help="Column names used to build the model")
public final java.lang.String[][] _domains
```
    Categorical/factor/enum mappings, per column. Null for non-enum cols. The last column holds the response col enums.
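    To make the layout concrete, here is a minimal sketch of how `_names` and `_domains` line up for a hypothetical three-column dataset. The class `NamesDomainsSketch`, the column names, and the levels are all invented for illustration, and returning 1 class for a numeric response is an assumption based on the "single-class" wording used for regression in this page.

```java
// Illustrative only: mirrors the layout described for _names and _domains.
// All names and levels below are invented; this is not the H2O class.
class NamesDomainsSketch {
  // The last name is the response column.
  static final String[] NAMES = { "age", "color", "label" };

  // One entry per column; null marks a non-enum (numeric) column.
  static final String[][] DOMAINS = {
    null,                        // age: numeric, no enum levels
    { "blue", "green", "red" },  // color: categorical predictor
    { "no", "yes" }              // label: response levels (last column)
  };

  // Name of the response column: the last entry of NAMES.
  static String responseName() { return NAMES[NAMES.length - 1]; }

  // Number of response levels; assumed to be 1 when the response is numeric.
  static int nclasses() {
    String[] d = DOMAINS[DOMAINS.length - 1];
    return d == null ? 1 : d.length;
  }

  // Translate a level index stored in the data back to its string level.
  static String level(int col, int idx) { return DOMAINS[col][idx]; }
}
```

    With this layout, `NamesDomainsSketch.level(1, 2)` looks up the third level of the `color` column.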
  - _priorClassDist
```
@Request.API(help="Relative class distribution factors in original data")
public final float[] _priorClassDist
```
  - _modelClassDist
```
@Request.API(help="Relative class distribution factors used for model building")
protected float[] _modelClassDist
```
  - training_start_time
```
public long training_start_time
```
    The start time in ms since the epoch for model training.
  - training_duration_in_ms
```
public long training_duration_in_ms
```
    The duration in ms for model training.
  - warnings
```
@Request.API(help="warnings")
public java.lang.String[] warnings
```
    Any warnings thrown during model building.
  - _have_cv_results
```
protected boolean _have_cv_results
```
    Whether or not this model has cross-validated results stored.
  - GEN_BENCHMARK_CODE
```
protected static final boolean GEN_BENCHMARK_CODE
```
    Debug flag to generate benchmark code.
    
    See Also:
    Constant Field Values
- Constructor Detail
  - Model
```
public Model(Key selfKey,
     Key dataKey,
     Frame fr,
     float[] priorClassDist)
```
    Full constructor from frame: Strips out the Vecs to just the names needed to match columns later for future datasets.
  - Model
```
public Model(Key selfKey,
     Key dataKey,
     java.lang.String[] names,
     java.lang.String[][] domains,
     float[] priorClassDist,
     float[] modelClassDist)
```
  - Model
```
public Model(Key selfKey,
     Key dataKey,
     java.lang.String[] names,
     java.lang.String[][] domains,
     float[] priorClassDist,
     float[] modelClassDist,
     long training_start_time,
     long training_duration_in_ms)
```
    Full constructor
- Method Detail
  - setModelClassDistribution
```
public void setModelClassDistribution(float[] classdist)
```
  - get_params
```
public Request2 get_params()
```
  - job
```
public Request2 job()
```
  - getModelCategory
```
public Model.ModelCategory getModelCategory()
```
  - delete_impl
```
public Futures delete_impl(Futures fs)
```
    Remove any Model internal Keys
    
    Specified by:
    
    delete_impl in class Lockable<Model>
  - errStr
```
public java.lang.String errStr()
```
    Specified by:
    
    errStr in class Lockable<Model>
  - addWarning
```
public void addWarning(java.lang.String warning)
```
  - isSupervised
```
public boolean isSupervised()
```
  - getUniqueId
```
public UniqueId getUniqueId()
```
  - start_training
```
public void start_training(long training_start_time)
```
  - start_training
```
public void start_training(Model previous)
```
  - stop_training
```
public void stop_training()
```
  - responseName
```
public java.lang.String responseName()
```
  - classNames
```
public java.lang.String[] classNames()
```
  - isClassifier
```
public boolean isClassifier()
```
  - nclasses
```
public int nclasses()
```
  - nfeatures
```
public int nfeatures()
```
    Returns number of input features
  - cm
```
public ConfusionMatrix cm()
```
    For classifiers, confusion matrix on validation set.
  - mse
```
public double mse()
```
    Returns mse for validation set.
  - varimp
```
public VarImp varimp()
```
    Variable importance of individual input features measured by this model.
  - hasCrossValModels
```
public boolean hasCrossValModels()
```
  - score
```
public Frame score(Frame fr)
```
    Bulk score for given fr frame. The frame is always adapted to this model.
    
    Parameters:
    fr - frame to be scored
    
    Returns:
    frame holding predicted values
    See Also:
    score(Frame, boolean)
  - score
```
public final Frame score(Frame fr,
          boolean adapt)
```
    Bulk score the frame fr, producing a Frame result; the 1st Vec is the predicted class, the remaining Vecs are the probability distributions. For Regression (single-class) models, the 1st and only Vec is the prediction value. The flag adapt controls whether fr is first adapted to this model.
    
    Parameters:
    fr - frame which should be scored
    adapt - a flag enforcing an adaptation of fr to this model. If the flag is false, the scoring code expects that fr is already adapted.
    
    Returns:
    a new frame containing the predicted values. For classification it contains a column with the prediction and the distribution over all response classes. For regression it contains only one column with the predicted values.
  - scoreImpl
```
protected Frame scoreImpl(Frame adaptFrm)
```
    Score already adapted frame.
    
    Parameters:
    adaptFrm -
    
    Returns:
  - score
```
public final float[] score(Frame fr,
            boolean exact,
            int row)
```
    Single row scoring, on a compatible Frame.
  - score
```
public final float[] score(java.lang.String[] names,
            java.lang.String[][] domains,
            boolean exact,
            double[] row)
```
    Single row scoring, on a compatible set of data. Fairly expensive to adapt.
  - score
```
public final float[] score(int[][][] map,
            double[] row,
            float[] preds)
```
    Single row scoring, on a compatible set of data, given an adaption vector
  - missingColumnsType
```
protected double missingColumnsType()
```
    Type of missing columns during adaptation between train/test datasets. Override this method for models that have sparse data handling. Otherwise, NaN is used.
    
    Returns:
    real-valued number (can be NaN)
  - adapt
```
public Frame[] adapt(Frame fr,
            boolean exact)
```
    Build an adapted Frame from the given Frame. Useful for efficient bulk scoring of a new dataset to an existing model. Same adaption as above, but expressed as a Frame instead of as an int[][]. The returned Frame does not have a response column. It returns a two-element array containing an adapted frame and a frame which contains only the vectors which were adapted (the purpose of the second frame is to delete all adapted vectors when that frame is deleted).
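    The idea of adapting a dataset to a model's column order can be sketched with plain arrays. This is a simplified stand-in, not the H2O implementation: `AdaptSketch` and its parameters are hypothetical, rows are `double[]` instead of `Frame`/`Vec`, `modelNames` is assumed to list predictor columns only, and missing columns are filled with a caller-supplied value (NaN being the default described for `missingColumnsType()`).

```java
import java.util.Arrays;

// Simplified stand-in for frame adaptation using plain arrays.
// Hypothetical helper; the real adapt() works on Frame and Vec objects.
class AdaptSketch {
  // Reorder the columns of 'rows' (laid out per 'dataNames') into the order
  // given by 'modelNames'. Columns the data lacks are filled with 'missing'.
  static double[][] adapt(String[] modelNames, String[] dataNames,
                          double[][] rows, double missing) {
    // For each model column, find its position in the data (-1 if absent).
    int[] src = new int[modelNames.length];
    for (int i = 0; i < modelNames.length; i++)
      src[i] = Arrays.asList(dataNames).indexOf(modelNames[i]);

    // Copy every row into model order.
    double[][] out = new double[rows.length][modelNames.length];
    for (int r = 0; r < rows.length; r++)
      for (int i = 0; i < src.length; i++)
        out[r][i] = src[i] < 0 ? missing : rows[r][src[i]];
    return out;
  }
}
```

    Extra columns in the data that the model does not know are simply dropped by this sketch.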
  - adapt
```
public Frame[] adapt(Frame fr,
            boolean exact,
            boolean haveResponse)
```
  - getDomainMapping
```
public static int[][] getDomainMapping(java.lang.String[] modelDom,
                       java.lang.String[] colDom,
                       boolean exact)
```
    Returns a mapping between values of model domains (modelDom) and given column domain.
    
    See Also:
    getDomainMapping(String, String[], String[], boolean)
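    As a rough illustration of what a domain mapping does, the sketch below maps each level index of a scoring column to the index of the same level in the model's domain. It is deliberately simplified: the hypothetical `DomainMappingSketch.map` returns a flat `int[]` with -1 for unseen levels, whereas the documented method returns an `int[][]` whose exact layout is not described here.

```java
import java.util.Arrays;

// Hypothetical, simplified domain mapping; not the H2O method.
class DomainMappingSketch {
  // For each level index of the scoring column (colDom), return the index of
  // the same level in the model's domain (modelDom), or -1 if the model
  // never saw that level during training.
  static int[] map(String[] modelDom, String[] colDom) {
    int[] m = new int[colDom.length];
    for (int i = 0; i < colDom.length; i++)
      m[i] = Arrays.asList(modelDom).indexOf(colDom[i]);
    return m;
  }
}
```

    A caller would use the result to translate categorical codes in the new data into the codes the model was trained on.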
  - getDomainMapping
```
public static int[][] getDomainMapping(java.lang.String colName,
                       java.lang.String[] modelDom,
                       java.lang.String[] colDom,
                       boolean logNonExactMapping)
```
    Returns a mapping for given column according to given modelDom. In this case, modelDom is
    
    Parameters:
    colName - name of column which is mapped, can be null.
    modelDom -
    logNonExactMapping -
    
    Returns:
  - score0
```
protected float[] score0(Chunk[] chks,
             int row_in_chunk,
             double[] tmp,
             float[] preds)
```
    Bulk scoring API for one row. Chunks are all compatible with the model, and the last Chunks are expected to hold the final distribution and prediction. The default implementation loads the data into the tmp array, then calls the subclass scoring logic.
  - calcError
```
public double calcError(Frame ftest,
               Vec vactual,
               Frame fpreds,
               Frame hitratio_fpreds,
               java.lang.String label,
               boolean printMe,
               int max_conf_mat_size,
               ConfusionMatrix cm,
               AUC auc,
               HitRatio hr)
```
    Compute the model error for a given test data set. For multi-class classification, this is the classification error based on assigning labels for the highest predicted per-class probability. For binary classification, this is the classification error based on assigning labels using the optimal threshold for maximizing the F1 score. For regression, this is the mean squared error (MSE).
    
    Parameters:
    ftest - Frame containing test data
    vactual - The response column Vec
    fpreds - Frame containing ADAPTED (domain labels from train+test data) predicted data (classification: label + per-class probabilities, regression: target)
    hitratio_fpreds - Frame containing predicted data (domain labels from test data) (classification: label + per-class probabilities, regression: target)
    label - Name for the scored data set to be printed
    printMe - Whether to print the scoring results to Log.info
    max_conf_mat_size - Largest size of Confusion Matrix (#classes) for it to be printed to Log.info
    cm - Confusion Matrix object to populate for multi-class classification (also used for regression)
    auc - AUC object to populate for binary classification
    hr - HitRatio object to populate for classification
    
    Returns:
    model error, see description above
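    Two of the error definitions above can be sketched on plain arrays. `ErrorSketch` is a hypothetical helper, not part of the documented API; it covers only the multi-class (highest probability) and regression (MSE) cases and leaves out the F1-based threshold search used for binary classification.

```java
// Hypothetical helper illustrating two of the error definitions.
class ErrorSketch {
  // Index of the largest per-class probability in one row.
  static int argmax(float[] probs) {
    int best = 0;
    for (int c = 1; c < probs.length; c++)
      if (probs[c] > probs[best]) best = c;
    return best;
  }

  // Multi-class classification error: fraction of rows whose
  // highest-probability class differs from the actual class.
  static double classificationError(float[][] probs, int[] actual) {
    int wrong = 0;
    for (int r = 0; r < probs.length; r++)
      if (argmax(probs[r]) != actual[r]) wrong++;
    return (double) wrong / probs.length;
  }

  // Regression error: mean squared difference between prediction and actual.
  static double mse(double[] predicted, double[] actual) {
    double sum = 0;
    for (int r = 0; r < predicted.length; r++) {
      double d = predicted[r] - actual[r];
      sum += d * d;
    }
    return sum / predicted.length;
  }
}
```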
  - score0
```
protected abstract float[] score0(double[] data,
             float[] preds)
```
    Subclasses implement the scoring logic. The data is pre-loaded into a re-used temp array, in the order the model expects. The predictions are loaded into the re-used temp array, which is also returned.
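    The contract can be illustrated with a stand-alone sketch. Both classes below are hypothetical (`ScoringSketch` only mimics the abstract method, and `LogisticSketch` is an invented two-class model); the preds layout follows the description used elsewhere on this page, with preds[0] for the predicted class and the remaining slots for class probabilities.

```java
// Hypothetical stand-in for the abstract scoring contract; not the H2O class.
abstract class ScoringSketch {
  // data: predictor values in the order the model expects.
  // preds: reused output array, filled in place and also returned.
  protected abstract float[] score0(double[] data, float[] preds);
}

// Invented two-class model: logistic function of a weighted sum.
class LogisticSketch extends ScoringSketch {
  private final double[] weights;

  LogisticSketch(double[] weights) { this.weights = weights; }

  @Override
  protected float[] score0(double[] data, float[] preds) {
    double z = 0;
    for (int i = 0; i < weights.length; i++) z += weights[i] * data[i];
    float p1 = (float) (1.0 / (1.0 + Math.exp(-z)));
    preds[1] = 1f - p1;            // probability of class 0
    preds[2] = p1;                 // probability of class 1
    preds[0] = p1 > 0.5f ? 1 : 0;  // predicted class label
    return preds;                  // same array that was passed in
  }
}
```

    Reusing the caller's preds array avoids one allocation per scored row, which is the reason the description mentions re-used temp arrays.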
  - score
```
public double score(double[] data)
```
  - toJava
```
public java.lang.String toJava()
```
    Return a String which is a valid Java program representing a class that implements the Model. The Java is of the form:
```
    class UUIDxxxxModel {
      public static final String NAMES[] = { ....column names... }
      public static final String DOMAINS[][] = { ....domain names... }
      // Pass in data in a double[], pre-aligned to the Model's requirements.
      // Jam predictions into the preds[] array; preds[0] is reserved for the
      // main prediction (class for classifiers or value for regression),
      // and remaining columns hold a probability distribution for classifiers.
      float[] predict( double data[], float preds[] );
      double[] map( HashMap<String,Double> row, double data[] );
      // Does the mapping lookup for every row, no allocation
      float[] predict( HashMap<String,Double> row, double data[], float preds[] );
      // Allocates a double[] for every row
      float[] predict( HashMap<String,Double> row, float preds[] );
      // Allocates a double[] and a float[] for every row
      float[] predict( HashMap<String,Double> row );
    }
```
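    To show how a class of that shape might be used, here is a hand-written stand-in. It is not real generated output: `GeneratedModelSketch`, the column names `x1` and `x2`, and the regression formula are invented, and filling absent columns with NaN in `map` is an assumption.

```java
import java.util.HashMap;

// Hand-written stand-in that follows the documented shape; code actually
// produced by toJava() will differ.
class GeneratedModelSketch {
  public static final String[] NAMES = { "x1", "x2" };  // invented column names

  // Copy row values into data[] in NAMES order; absent columns become NaN
  // (an assumption made for this sketch).
  double[] map(HashMap<String, Double> row, double[] data) {
    for (int i = 0; i < NAMES.length; i++) {
      Double v = row.get(NAMES[i]);
      data[i] = v == null ? Double.NaN : v;
    }
    return data;
  }

  // Invented regression logic: preds[0] holds the predicted value.
  float[] predict(double[] data, float[] preds) {
    preds[0] = (float) (2.0 * data[0] + data[1]);
    return preds;
  }

  // Convenience form: allocates a double[] and a float[] for every row.
  float[] predict(HashMap<String, Double> row) {
    return predict(map(row, new double[NAMES.length]), new float[1]);
  }
}
```

    Callers scoring many rows would prefer the overloads that accept preallocated arrays, as the comments in the form above suggest.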
  - toJava
```
public SB toJava(SB sb)
```
  - toJavaSuper
```
protected SB toJavaSuper(SB sb)
```
    Generate implementation for super class.
  - toJavaNCLASSES
```
protected SB toJavaNCLASSES(SB sb)
```
  - toJavaInit
```
protected SB toJavaInit(SB sb,
            SB fileContextSB)
```
  - toJavaInit
```
protected void toJavaInit(javassist.CtClass ct)
```
  - toJavaPredictBody
```
protected void toJavaPredictBody(SB bodySb,
                     SB classCtxSb,
                     SB fileCtxSb)
```
  - toJavaDefaultMaxIters
```
protected java.lang.String toJavaDefaultMaxIters()
```
  - testJavaScoring
```
public void testJavaScoring(Frame fr)
```
  - toJavaUnifyPreds
```
protected void toJavaUnifyPreds(SB bodySb)
```
    Generates code which unifies preds[1,...NCLASSES]
  - toJavaFillPreds0
```
protected void toJavaFillPreds0(SB bodySb)
```
    Fill preds[0] based on already filled and unified preds[1,..NCLASSES].
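    The kind of logic such generated code might contain can be sketched directly. `PredsSketch` is hypothetical, and two assumptions are made: "unify" is taken to mean rescaling preds[1..NCLASSES] so they sum to 1, and preds[0] is filled with the 0-based index of the largest probability. The actual generated code may do something different.

```java
// Hypothetical illustration of post-processing a preds[] array.
class PredsSketch {
  // Assumed meaning of "unify": rescale preds[1..] so the values sum to 1.
  static void unify(float[] preds) {
    float sum = 0;
    for (int c = 1; c < preds.length; c++) sum += preds[c];
    if (sum > 0)
      for (int c = 1; c < preds.length; c++) preds[c] /= sum;
  }

  // Fill preds[0] with the 0-based index of the largest class probability.
  static void fillPreds0(float[] preds) {
    int best = 1;
    for (int c = 2; c < preds.length; c++)
      if (preds[c] > preds[best]) best = c;
    preds[0] = best - 1;
  }
}
```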
  - scoreCrossValidation
```
public final void scoreCrossValidation(Job.ValidatedJob job,
                        Frame source,
                        Vec response,
                        Frame[] cv_preds,
                        long[] offsets)
```
    Compute the cross validation error from an array of predictions for N folds. Also stores the results in the model for display/query.
    
    Parameters:
    source - Full training data
    response - Full response
    cv_preds - N Frames containing predictions made by N-fold CV runs on disjoint contiguous holdout pieces of the training data
    offsets - Starting row numbers for the N CV pieces (length = N+1, first element: 0, last element: #rows)
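    The shape of the offsets array can be illustrated with a small sketch. `CvOffsetsSketch` is a hypothetical helper, and the even integer split it uses is an assumption; the only properties taken from the description are the length of N+1, the first element 0, and the last element equal to the number of rows.

```java
// Hypothetical helper showing the documented shape of the offsets array.
class CvOffsetsSketch {
  // Split nrows into nfolds contiguous pieces and return the nfolds+1
  // boundaries: offsets[i] is the first row of fold i, and offsets[nfolds]
  // equals nrows.
  static long[] offsets(long nrows, int nfolds) {
    long[] off = new long[nfolds + 1];
    for (int i = 0; i <= nfolds; i++)
      off[i] = i * nrows / nfolds;  // integer division; i is widened to long
    return off;
  }
}
```

    Fold i then covers rows offsets[i] (inclusive) to offsets[i+1] (exclusive), so the folds are disjoint and contiguous.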
  - setCrossValidationError
```
protected void setCrossValidationError(Job.ValidatedJob job,
                           double cv_error,
                           ConfusionMatrix cm,
                           AUCData auc,
                           HitRatio hr)
```
  - printCrossValidationModelsHTML
```
protected void printCrossValidationModelsHTML(java.lang.StringBuilder sb)
```
  - getModelSerializer
```
public AutoBufferSerializer<Model> getModelSerializer()
```
    Returns a model serializer into AutoBuffer.
