public abstract class GenModel extends java.lang.Object implements IGenModel, IGeneratedModel, java.io.Serializable
| Modifier and Type | Field and Description |
|---|---|
| `java.lang.String[][]` | `_domains`: Categorical (factor/enum) mappings, per column. |
| `java.lang.String[]` | `_names`: Column names; last is response for supervised models. |
| `java.lang.String` | `_offsetColumn`: Name of the column with offsets (used for certain types of models). |
| Constructor and Description |
|---|
| `GenModel(java.lang.String[] names, java.lang.String[][] domains)` |
| Modifier and Type | Method and Description |
|---|---|
| `static boolean` | `bitSetContains(byte[] bits, int bitoff, double dnum)` |
| `static double[]` | `correctProbabilities(double[] scored, double[] priorClassDist, double[] modelClassDist)`: Corrects a list of class probabilities produced by a model as a prediction back to the prior class distribution. |
| `static void` | `GBM_rescale(double[] preds)` |
| `int` | `getColIdx(java.lang.String name)`: Returns index of a column with given name, or -1 if the column is not found. |
| `java.lang.String[][]` | `getDomainValues()`: Returns domain values for all columns, including the response column. |
| `java.lang.String[]` | `getDomainValues(int i)`: Returns domain values for the i-th column. |
| `java.lang.String[]` | `getDomainValues(java.lang.String name)`: Gets domain of the given column. |
| `java.lang.String` | `getHeader()`: ??? |
| `abstract ModelCategory` | `getModelCategory()`: Returns this model category. |
| `java.lang.String[]` | `getNames()`: The names of all columns used, including response and offset columns. |
| `int` | `getNumClasses(int colIdx)`: Gets the number of classes in the given column. |
| `int` | `getNumCols()`: Returns number of columns used as input for training (i.e., exclude response and offset columns). |
| `int` | `getNumResponseClasses()`: Returns the number of classes in the response column. |
| `static int` | `getPrediction(double[] preds, double[] priorClassDist, double[] data, double threshold)`: Utility function to get the best prediction from an array holding the class prediction distribution. |
| `int` | `getPredsSize()`: Returns the expected size of the preds array which is passed to the `predict(double[], float[])` function. |
| `int` | `getResponseIdx()`: Returns the index of the response column inside getDomains(). |
| `java.lang.String` | `getResponseName()`: The name of the response column. |
| `abstract java.lang.String` | `getUUID()`: Returns model's unique identifier. |
| `static double` | `GLM_identityInv(double x)` |
| `static double` | `GLM_inverseInv(double x)` |
| `static double` | `GLM_logInv(double x)` |
| `static double` | `GLM_logitInv(double x)` |
| `static double` | `GLM_tweedieInv(double x, double tweedie_link_power)` |
| `boolean` | `isAutoEncoder()`: Returns true if this model represents an AutoEncoder. |
| `boolean` | `isClassifier()`: Returns true if this model represents a classifier, else it is used for regression. |
| `boolean` | `isSupervised()`: Returns true for supervised models. |
| `static int` | `KMeans_closest(double[][] centers, double[] point, java.lang.String[][] domains, double[] means, double[] mults)` |
| `static double` | `KMeans_distance(double[] center, double[] point, java.lang.String[][] domains, double[] means, double[] mults)` |
| `static double` | `KMeans_distance(double[] center, float[] point, java.lang.String[][] domains, double[] means, double[] mults, double[] colSum, double[] colSumSq)` |
| `static double[]` | `KMeans_simplex(double[][] centers, double[] point, java.lang.String[][] domains, double[] means, double[] mults)` |
| `static double` | `log_rescale(double[] preds)` |
| `double[]` | `map(java.util.Map<java.lang.String,java.lang.Double> row, double[] data)`: Takes a HashMap mapping column names to doubles. |
| `int` | `mapEnum(int colIdx, java.lang.String enumValue)`: Maps given column's categorical to the integer used by this model (returns -1 if mapping not found). |
| `int` | `nclasses()`: Returns number of output classes for classifiers, 1 for regression models, and 0 for unsupervised models. |
| `int` | `nfeatures()`: Returns number of input features. |
| `float[]` | `predict(double[] data, float[] preds)`: Predict the given row and return prediction. |
| `float[]` | `predict(double[] data, float[] preds, int maxIters)`: Predict the given row and return prediction using given number of iterations (e.g., number of trees from forest). |
| `abstract double[]` | `score0(double[] row, double[] preds)`: Subclasses implement the scoring logic. |
| `double[]` | `score0(double[] row, double offset, double[] preds)` |
| `double[]` | `score0(java.util.Map<java.lang.String,java.lang.Double> row)` |
| `double[]` | `score0(java.util.Map<java.lang.String,java.lang.Double> row, double[] preds)` |
| `double[]` | `score0(java.util.Map<java.lang.String,java.lang.Double> row, double[] data, double[] preds)` |
public final java.lang.String[] _names
public final java.lang.String[][] _domains
public java.lang.String _offsetColumn
public boolean isSupervised()
Specified by: isSupervised in interface IGenModel
public int nfeatures()
public int nclasses()
public abstract ModelCategory getModelCategory()
Specified by: getModelCategory in interface IGenModel
See also: ModelCategory
public abstract java.lang.String getUUID()
Description copied from interface: IGeneratedModel
Specified by: getUUID in interface IGeneratedModel
public int getNumCols()
Specified by: getNumCols in interface IGeneratedModel
public java.lang.String[] getNames()
Specified by: getNames in interface IGeneratedModel
public java.lang.String getResponseName()
Specified by: getResponseName in interface IGeneratedModel
public int getResponseIdx()
Specified by: getResponseIdx in interface IGeneratedModel
public int getNumClasses(int colIdx)
Specified by: getNumClasses in interface IGeneratedModel
public int getNumResponseClasses()
Specified by: getNumResponseClasses in interface IGeneratedModel
public boolean isClassifier()
Specified by: isClassifier in interface IGeneratedModel
public boolean isAutoEncoder()
Specified by: isAutoEncoder in interface IGeneratedModel
public java.lang.String[] getDomainValues(java.lang.String name)
Specified by: getDomainValues in interface IGeneratedModel
Parameters:
name - column name
public java.lang.String[] getDomainValues(int i)
Specified by: getDomainValues in interface IGeneratedModel
Parameters:
i - index of column
public java.lang.String[][] getDomainValues()
Specified by: getDomainValues in interface IGeneratedModel
public int getColIdx(java.lang.String name)
Specified by: getColIdx in interface IGeneratedModel
public int mapEnum(int colIdx,
java.lang.String enumValue)
Specified by: mapEnum in interface IGeneratedModel
public int getPredsSize()
Specified by: getPredsSize in interface IGeneratedModel
public float[] predict(double[] data,
float[] preds)
Description copied from interface: IGeneratedModel
Specified by: predict in interface IGeneratedModel
Parameters:
data - row holding the data. Ordering should follow the ordering of columns returned by getNames().
preds - allocated array to hold a prediction
public float[] predict(double[] data,
float[] preds,
int maxIters)
Description copied from interface: IGeneratedModel
Specified by: predict in interface IGeneratedModel
Parameters:
data - row holding the data. Ordering should follow the ordering of columns returned by getNames().
preds - allocated array to hold a prediction
maxIters - maximum number of iterations to use during the prediction process
public double[] map(java.util.Map<java.lang.String,java.lang.Double> row,
double[] data)
Looks up the column names needed by the model, and places the doubles into the data array in the order needed by the model. Missing columns use NaN.
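The behavior described for `map` can be illustrated with a minimal, self-contained sketch. It does not use the real `GenModel` class: `RowMapSketch` and its `NAMES` array are hypothetical stand-ins for a model and its column names, and the loop only mirrors the documented contract (values placed in model column order, missing columns set to NaN).

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a generated model; not part of the library.
final class RowMapSketch {
    // Assumed column order, playing the role of the model's column names.
    static final String[] NAMES = {"age", "income", "height"};

    /** Fills data in the order of NAMES; columns absent from row become NaN. */
    static double[] map(Map<String, Double> row, double[] data) {
        for (int i = 0; i < NAMES.length; i++) {
            Double v = row.get(NAMES[i]);
            data[i] = (v == null) ? Double.NaN : v;
        }
        return data;
    }

    public static void main(String[] args) {
        Map<String, Double> row = new HashMap<>();
        row.put("height", 1.8);   // insertion order does not matter
        row.put("age", 42.0);     // "income" is deliberately left out
        double[] data = map(row, new double[NAMES.length]);
        System.out.println(Arrays.toString(data)); // prints [42.0, NaN, 1.8]
    }
}
```

Passing a preallocated `data` array, as the real signature does, lets a caller reuse one buffer across many rows instead of allocating per prediction.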
public abstract double[] score0(double[] row,
double[] preds)
public double[] score0(double[] row,
double offset,
double[] preds)
public double[] score0(java.util.Map<java.lang.String,java.lang.Double> row,
double[] data,
double[] preds)
public double[] score0(java.util.Map<java.lang.String,java.lang.Double> row,
double[] preds)
public double[] score0(java.util.Map<java.lang.String,java.lang.Double> row)
public static double[] correctProbabilities(double[] scored,
double[] priorClassDist,
double[] modelClassDist)
The implementation is based on Eq. (27) in the paper.
Parameters:
scored - list of class probabilities beginning at index 1
priorClassDist - original class distribution
modelClassDist - class distribution used for model building (e.g., data was oversampled)
public static int getPrediction(double[] preds,
double[] priorClassDist,
double[] data,
double threshold)
Parameters:
preds - an array holding the prediction distribution. Its length is equal to the number of classes + 1.
priorClassDist - prior class probabilities (used to break ties)
data - test data
threshold - threshold for binary classifier
public static boolean bitSetContains(byte[] bits,
int bitoff,
double dnum)
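The parameter notes for `correctProbabilities` and `getPrediction` above imply an array layout in which index 0 holds the predicted label and indices 1..n hold class probabilities. The sketch below illustrates one common form of prior correction under that layout: each probability is rescaled by the ratio of prior to model class frequency, then the probabilities are renormalized. Whether this matches the referenced Eq. (27) exactly is an assumption, and `binaryLabel` is a hypothetical helper showing only the binary threshold decision, not the tie-breaking logic.

```java
import java.util.Arrays;

// Illustrative sketch only; not the library's implementation.
final class PriorCorrectionSketch {
    /**
     * Rescales scored[1..n] by prior/model class frequency and renormalizes.
     * Index 0 (the predicted-label slot) is left untouched.
     */
    static double[] correctProbabilities(double[] scored, double[] prior, double[] model) {
        double sum = 0;
        for (int c = 1; c < scored.length; c++) {
            // Assumption: skip rescaling when either frequency is zero.
            if (prior[c - 1] != 0 && model[c - 1] != 0) {
                scored[c] *= prior[c - 1] / model[c - 1];
            }
            sum += scored[c];
        }
        if (sum > 0) {
            for (int c = 1; c < scored.length; c++) {
                scored[c] /= sum;
            }
        }
        return scored;
    }

    /** Hypothetical helper: label 1 when the class-1 probability reaches the threshold. */
    static int binaryLabel(double[] preds, double threshold) {
        return preds[2] >= threshold ? 1 : 0;
    }

    public static void main(String[] args) {
        // Model trained on balanced (oversampled) data; true prior is 90/10.
        double[] scored = {0, 0.5, 0.5};
        correctProbabilities(scored, new double[]{0.9, 0.1}, new double[]{0.5, 0.5});
        scored[0] = binaryLabel(scored, 0.5);
        System.out.println(Arrays.toString(scored));
    }
}
```

In this example the corrected probabilities shift strongly toward class 0, so a prediction that was a coin flip on oversampled data becomes class 0 once the original class distribution is restored.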
public static int KMeans_closest(double[][] centers,
double[] point,
java.lang.String[][] domains,
double[] means,
double[] mults)
public static double[] KMeans_simplex(double[][] centers,
double[] point,
java.lang.String[][] domains,
double[] means,
double[] mults)
public static double KMeans_distance(double[] center,
float[] point,
java.lang.String[][] domains,
double[] means,
double[] mults,
double[] colSum,
double[] colSumSq)
public static double KMeans_distance(double[] center,
double[] point,
java.lang.String[][] domains,
double[] means,
double[] mults)
public static double log_rescale(double[] preds)
public static void GBM_rescale(double[] preds)
public static double GLM_identityInv(double x)
public static double GLM_logitInv(double x)
public static double GLM_logInv(double x)
public static double GLM_inverseInv(double x)
public static double GLM_tweedieInv(double x,
double tweedie_link_power)
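Judging by their names, the `GLM_*Inv` helpers above appear to be inverse link functions that map a linear predictor back to the response scale. The sketch below shows the textbook definitions only. The class and method names are hypothetical, the Tweedie case assumes a power link that falls back to the log link when the power is 0, and the library's own versions may add numerical guards that are not reproduced here.

```java
// Textbook inverse link functions; a sketch, not the library's code.
final class InverseLinkSketch {
    static double identityInv(double x) { return x; }

    // Logistic function: inverse of logit(p) = log(p / (1 - p)).
    static double logitInv(double x) { return 1.0 / (1.0 + Math.exp(-x)); }

    // Inverse of the log link.
    static double logInv(double x) { return Math.exp(x); }

    // Inverse of the reciprocal link eta = 1 / mu (no guard against x == 0 here).
    static double inverseInv(double x) { return 1.0 / x; }

    // Assumed power link eta = mu^p; p == 0 is treated as the log link.
    static double tweedieInv(double x, double linkPower) {
        return linkPower == 0 ? Math.exp(x) : Math.pow(x, 1.0 / linkPower);
    }

    public static void main(String[] args) {
        System.out.println(logitInv(0.0));      // midpoint of the logistic curve
        System.out.println(logInv(0.0));
        System.out.println(inverseInv(4.0));
        System.out.println(tweedieInv(0.0, 0.0));
    }
}
```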
public java.lang.String getHeader()