public abstract class GenModel extends java.lang.Object implements IGenModel, IGeneratedModel, java.io.Serializable
| Modifier and Type | Field | Description |
|---|---|---|
| java.lang.String[][] | _domains | Categorical/factor/enum mappings, per column. |
| java.lang.String[] | _names | Column names; last is response for supervised models |
| Constructor and Description |
|---|
| GenModel(java.lang.String[] names, java.lang.String[][] domains) |
| Modifier and Type | Method | Description |
|---|---|---|
| static boolean | bitSetContains(byte[] bits, int bitoff, int num) | |
| static double[] | correctProbabilities(double[] scored, double[] priorClassDist, double[] modelClassDist) | Corrects the class probabilities predicted by a model back to the prior class distribution. |
| static void | GBM_rescale(double[] preds) | |
| int | getColIdx(java.lang.String name) | Returns the index of the column with the given name, or -1 if the column is not found. |
| java.lang.String[][] | getDomainValues() | Returns domain values for all columns, including the response column. |
| java.lang.String[] | getDomainValues(int i) | Returns domain values for the i-th column. |
| java.lang.String[] | getDomainValues(java.lang.String name) | Gets the domain of the given column. |
| java.lang.String | getHeader() | ??? |
| java.lang.String[] | getNames() | The names of columns used in the model. |
| int | getNumClasses(int colIdx) | Gets the number of classes in the given column. |
| int | getNumCols() | Returns the number of columns used as input for training (i.e., excluding the response column). |
| int | getNumResponseClasses() | Returns the number of classes in the response column. |
| static int | getPrediction(double[] preds, double[] priorClassDist, double[] data, double threshold) | Utility function that picks the best prediction from an array holding the class prediction distribution. |
| int | getPredsSize() | Returns the expected size of the preds array that is passed to the IGeneratedModel.predict(double[], float[]) function. |
| int | getResponseIdx() | Returns the index of the response column inside getDomains(). |
| java.lang.String | getResponseName() | The name of the response column. |
| static double | GLM_identityInv(double x) | |
| static double | GLM_inverseInv(double x) | |
| static double | GLM_logInv(double x) | |
| static double | GLM_logitInv(double x) | |
| static double | GLM_tweedieInv(double x, double tweedie_link_power) | |
| boolean | isAutoEncoder() | |
| boolean | isClassifier() | |
| boolean | isSupervised() | Returns true for supervised models. |
| static int | KMeans_closest(double[][] centers, double[] point, java.lang.String[][] domains, double[] means, double[] mults) | |
| static double | KMeans_distance(double[] center, double[] point, java.lang.String[][] domains, double[] means, double[] mults) | |
| static double | KMeans_distance(double[] center, float[] point, java.lang.String[][] domains, double[] means, double[] mults, double[] colSum, double[] colSumSq) | |
| static double | log_rescale(double[] preds) | |
| double[] | map(java.util.Map<java.lang.String,java.lang.Double> row, double[] data) | Takes a HashMap mapping column names to doubles. |
| int | mapEnum(int colIdx, java.lang.String enumValue) | Maps the given column's enum value to the integer used by this model. |
| int | nclasses() | Returns the number of output classes for classifiers, or 1 for regression models. |
| int | nfeatures() | Returns the number of input features. |
| float[] | predict(double[] data, float[] preds) | Predicts the given row and returns the prediction. |
| float[] | predict(double[] data, float[] preds, int maxIters) | Predicts the given row and returns the prediction, using the given number of iterations (e.g., number of trees from a forest). |
| abstract double[] | score0(double[] data, double[] preds) | Subclasses implement the scoring logic. |
| double[] | score0(java.util.Map<java.lang.String,java.lang.Double> row) | |
| double[] | score0(java.util.Map<java.lang.String,java.lang.Double> row, double[] preds) | |
| double[] | score0(java.util.Map<java.lang.String,java.lang.Double> row, double[] data, double[] preds) | |
| static double[] | SharedTree_clean(double[] data) | |
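Methods inherited from class java.lang.Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from the implemented interfaces: getModelCategory, getUUID
public final java.lang.String[] _names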
public final java.lang.String[][] _domains
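public boolean isSupervised()
Description copied from interface: IGenModel
Specified by: isSupervised in interface IGenModel
public int nfeatures()
Description copied from interface: IGenModel
public int nclasses()
Description copied from interface: IGenModel
public int getNumCols()
Description copied from interface: IGeneratedModel
Specified by: getNumCols in interface IGeneratedModel
public int getResponseIdx()
Description copied from interface: IGeneratedModel
Specified by: getResponseIdx in interface IGeneratedModel
public java.lang.String getResponseName()
Description copied from interface: IGeneratedModel
Specified by: getResponseName in interface IGeneratedModel
public int getNumResponseClasses()
Description copied from interface: IGeneratedModel
Specified by: getNumResponseClasses in interface IGeneratedModel
public java.lang.String[] getNames()
Description copied from interface: IGeneratedModel
Specified by: getNames in interface IGeneratedModel
public int getColIdx(java.lang.String name)
Description copied from interface: IGeneratedModel
Specified by: getColIdx in interface IGeneratedModel
public int getNumClasses(int colIdx)
Description copied from interface: IGeneratedModel
Specified by: getNumClasses in interface IGeneratedModel
public java.lang.String[] getDomainValues(java.lang.String name)
Description copied from interface: IGeneratedModel
Specified by: getDomainValues in interface IGeneratedModel
Parameters:
name - column name
public java.lang.String[] getDomainValues(int i)
Description copied from interface: IGeneratedModel
Specified by: getDomainValues in interface IGeneratedModel
Parameters:
i - index of column
public int mapEnum(int colIdx,
java.lang.String enumValue)
Description copied from interface: IGeneratedModel
Specified by: mapEnum in interface IGeneratedModel
public java.lang.String[][] getDomainValues()
Description copied from interface: IGeneratedModel
Specified by: getDomainValues in interface IGeneratedModel
public boolean isClassifier()
Specified by: isClassifier in interface IGeneratedModel
public boolean isAutoEncoder()
Specified by: isAutoEncoder in interface IGeneratedModel
public int getPredsSize()
Description copied from interface: IGeneratedModel
Returns the expected size of preds array which is passed to IGeneratedModel.predict(double[], float[]) function.
Specified by: getPredsSize in interface IGeneratedModel
public java.lang.String getHeader()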
public double[] map(java.util.Map<java.lang.String,java.lang.Double> row,
double[] data)
Looks up the column names needed by the model, and places the doubles into the data array in the order needed by the model. Missing columns use NaN.
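The lookup described above can be sketched as follows. This is an illustrative stand-in, not H2O's implementation: the class `RowMappingSketch`, the helper `mapRow`, and the sample column names are hypothetical, and the explicit `names` argument stands in for the model's input column names.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

final class RowMappingSketch {
    // Hypothetical helper (not part of GenModel): copies values from a
    // name-keyed row into `data`, following the order of `names`.
    // Columns that are absent from the row are filled with NaN.
    static double[] mapRow(String[] names, Map<String, Double> row, double[] data) {
        for (int i = 0; i < names.length && i < data.length; i++) {
            Double value = row.get(names[i]);
            data[i] = (value == null) ? Double.NaN : value;
        }
        return data;
    }

    public static void main(String[] args) {
        // Assumed input columns, listed in the order the model expects them.
        String[] names = {"age", "income", "score"};

        // The row may list columns in any order and may omit some of them.
        Map<String, Double> row = new HashMap<>();
        row.put("income", 52000.0);
        row.put("age", 41.0);

        double[] data = mapRow(names, row, new double[names.length]);
        System.out.println(Arrays.toString(data)); // prints [41.0, 52000.0, NaN]
    }
}
```

Reusing a caller-supplied `data` array, as the `map` signature does, avoids allocating a new array for every scored row.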
public float[] predict(double[] data,
float[] preds)
Description copied from interface: IGeneratedModel
Specified by: predict in interface IGeneratedModel
Parameters:
data - row holding the data. Ordering should follow ordering of columns returned by getNames()
preds - allocated array to hold a prediction
public float[] predict(double[] data,
float[] preds,
int maxIters)
Description copied from interface: IGeneratedModel
Specified by: predict in interface IGeneratedModel
Parameters:
data - row holding the data. Ordering should follow ordering of columns returned by getNames()
preds - allocated array to hold a prediction
maxIters - maximum number of iterations to use during predicting process
public abstract double[] score0(double[] data,
double[] preds)
public double[] score0(java.util.Map<java.lang.String,java.lang.Double> row,
double[] data,
double[] preds)
public double[] score0(java.util.Map<java.lang.String,java.lang.Double> row,
double[] preds)
public double[] score0(java.util.Map<java.lang.String,java.lang.Double> row)
public static double[] correctProbabilities(double[] scored,
double[] priorClassDist,
double[] modelClassDist)
The implementation is based on Eq. (27) in the paper.
Parameters:
scored - list of class probabilities beginning at index 1
priorClassDist - original class distribution
modelClassDist - class distribution used for model building (e.g., data was oversampled)
public static int getPrediction(double[] preds,
double[] priorClassDist,
double[] data,
double threshold)
Parameters:
preds - an array of prediction distribution. Length of arrays is equal to a number of classes+1.
priorClassDist - prior class probabilities (used to break ties)
data - Test data
threshold - threshold for binary classifier
public static boolean bitSetContains(byte[] bits,
int bitoff,
int num)
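The prior correction that `correctProbabilities` describes can be illustrated with a common scheme: scale each class probability by the ratio of its original frequency to its frequency in the model-building data, then renormalize. The sketch below makes that assumption; it is not taken from GenModel's source, and whether it matches Eq. (27) of the cited paper term for term is not confirmed here. The class name `PriorCorrectionSketch` and the sample numbers are hypothetical.

```java
import java.util.Arrays;

final class PriorCorrectionSketch {
    // Assumed layout, following the parameter notes above: class
    // probabilities start at index 1 of `scored`; index 0 is left untouched.
    static double[] correct(double[] scored, double[] priorClassDist, double[] modelClassDist) {
        double sum = 0.0;
        for (int c = 1; c < scored.length; c++) {
            double prior = priorClassDist[c - 1];
            double model = modelClassDist[c - 1];
            // Reweight only when both fractions are non-zero.
            if (prior != 0.0 && model != 0.0) {
                scored[c] *= prior / model;
            }
            sum += scored[c];
        }
        // Renormalize so the corrected probabilities sum to one.
        if (sum > 0.0) {
            for (int c = 1; c < scored.length; c++) {
                scored[c] /= sum;
            }
        }
        return scored;
    }

    public static void main(String[] args) {
        double[] scored = {0.0, 0.5, 0.5}; // model output on balanced data
        double[] prior = {0.9, 0.1};       // original class distribution
        double[] model = {0.5, 0.5};       // distribution after oversampling
        System.out.println(Arrays.toString(correct(scored, prior, model)));
    }
}
```

With these sample inputs, a model trained on balanced data that predicts 0.5 for each class is pulled back to roughly 0.9 and 0.1, the original class frequencies.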
public static int KMeans_closest(double[][] centers,
double[] point,
java.lang.String[][] domains,
double[] means,
double[] mults)
public static double KMeans_distance(double[] center,
float[] point,
java.lang.String[][] domains,
double[] means,
double[] mults,
double[] colSum,
double[] colSumSq)
public static double KMeans_distance(double[] center,
double[] point,
java.lang.String[][] domains,
double[] means,
double[] mults)
public static double[] SharedTree_clean(double[] data)
public static double log_rescale(double[] preds)
public static void GBM_rescale(double[] preds)
public static double GLM_identityInv(double x)
public static double GLM_logitInv(double x)
public static double GLM_logInv(double x)
public static double GLM_inverseInv(double x)
public static double GLM_tweedieInv(double x,
double tweedie_link_power)
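The `GLM_*Inv` names suggest inverses of the usual GLM link functions, which map a linear predictor back to the response scale. The sketch below writes out only the textbook definitions. GenModel's own versions may clamp or guard their inputs differently, the Tweedie case assumes a power link of the form mu^p with p = 0 meaning the log link, and the class and method names are hypothetical.

```java
final class LinkInverseSketch {
    // Identity link: the linear predictor is already the mean.
    static double identityInv(double x) { return x; }

    // Logit link: inverse is the logistic (sigmoid) function.
    static double logitInv(double x) { return 1.0 / (1.0 + Math.exp(-x)); }

    // Log link: inverse is the exponential.
    static double logInv(double x) { return Math.exp(x); }

    // Inverse (reciprocal) link: its inverse is again the reciprocal.
    static double inverseInv(double x) { return 1.0 / x; }

    // Power link g(mu) = mu^p, so the inverse is x^(1/p).
    // A link power of 0 is treated as the log link (an assumption).
    static double tweedieInv(double x, double linkPower) {
        return linkPower == 0.0 ? Math.exp(x) : Math.pow(x, 1.0 / linkPower);
    }

    public static void main(String[] args) {
        System.out.println(logitInv(0.0));   // prints 0.5
        System.out.println(logInv(0.0));     // prints 1.0
        System.out.println(inverseInv(4.0)); // prints 0.25
    }
}
```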