Parameters of H2OIsolationForest¶
Affected Class¶
ai.h2o.sparkling.ml.algos.H2OIsolationForest
Parameters¶
Each parameter has also a corresponding getter and setter method. (E.g.:
label
->getLabel()
,setLabel(...)
)
- calibrationDataFrame
Calibration frame for Platt Scaling. To enable usage of the data frame, set the parameter calibrateModel to True.
Scala default value:
null
; Python default value:None
- ignoredCols
Names of columns to ignore for training.
Scala default value:
null
; Python default value:None
Also available on the trained model.
- buildTreeOneNode
Run on one node only; no network overhead but fewer cpus used. Suitable for small datasets.
Scala default value:
false
; Python default value:False
Also available on the trained model.
- categoricalEncoding
Encoding scheme for categorical features. Possible values are
"AUTO"
,"OneHotInternal"
,"OneHotExplicit"
,"Enum"
,"Binary"
,"Eigen"
,"LabelEncoder"
,"SortByResponse"
,"EnumLimited"
.Default value:
"AUTO"
Also available on the trained model.
- colSampleRateChangePerLevel
Relative change of the column sampling rate for every level (must be > 0.0 and <= 2.0).
Default value:
1.0
Also available on the trained model.
- colSampleRatePerTree
Column sample rate per tree (from 0.0 to 1.0).
Default value:
1.0
Also available on the trained model.
- columnsToCategorical
List of columns to convert to categorical before modelling
Scala default value:
Array()
; Python default value:[]
- contamination
Contamination ratio - the proportion of anomalies in the input dataset. If undefined (-1) the predict function will not mark observations as anomalies and only anomaly score will be returned. Defaults to -1 (undefined).
Default value:
-1.0
Also available on the trained model.
- convertInvalidNumbersToNa
If set to ‘true’, the model converts invalid numbers to NA during making predictions.
Scala default value:
false
; Python default value:False
Also available on the trained model.
- convertUnknownCategoricalLevelsToNa
If set to ‘true’, the model converts unknown categorical levels to NA during making predictions.
Scala default value:
false
; Python default value:False
Also available on the trained model.
- dataFrameSerializer
A full name of a serializer used for serialization and deserialization of Spark DataFrames to a JSON value within NullableDataFrameParam.
Default value:
"ai.h2o.sparkling.utils.JSONDataFrameSerializer"
Also available on the trained model.
- detailedPredictionCol
Column containing additional prediction details, its content depends on the model type.
Default value:
"detailed_prediction"
Also available on the trained model.
- exportCheckpointsDir
Automatically export generated models to this directory.
Scala default value:
null
; Python default value:None
Also available on the trained model.
- featuresCols
Name of feature columns
Scala default value:
Array()
; Python default value:[]
Also available on the trained model.
- ignoreConstCols
Ignore constant columns.
Scala default value:
true
; Python default value:True
Also available on the trained model.
- keepBinaryModels
If set to true, all binary models created during execution of the
fit
method will be kept in DKV of H2O-3 cluster.Scala default value:
false
; Python default value:False
- maxDepth
Maximum tree depth (0 for unlimited).
Default value:
8
Also available on the trained model.
- maxRuntimeSecs
Maximum allowed runtime in seconds for model training. Use 0 to disable.
Default value:
0.0
Also available on the trained model.
- minRows
Fewest allowed (weighted) observations in a leaf.
Default value:
1.0
Also available on the trained model.
- modelId
Destination id for this model; auto-generated if not specified.
Scala default value:
null
; Python default value:None
- mtries
Number of variables randomly sampled as candidates at each split. If set to -1, defaults (number of predictors)/3.
Default value:
-1
Also available on the trained model.
- namedMojoOutputColumns
Mojo Output is not stored in the array but in the properly named columns
Scala default value:
true
; Python default value:True
Also available on the trained model.
- ntrees
Number of trees.
Default value:
50
Also available on the trained model.
- predictionCol
Prediction column name
Default value:
"prediction"
Also available on the trained model.
- sampleRate
Rate of randomly sampled observations used to train each Isolation Forest tree. Needs to be in range from 0.0 to 1.0. If set to -1, sample_rate is disabled and sample_size will be used instead.
Default value:
-1.0
Also available on the trained model.
- sampleSize
Number of randomly sampled observations used to train each Isolation Forest tree. Only one of parameters sample_size and sample_rate should be defined. If sample_rate is defined, sample_size will be ignored.
Scala default value:
256L
; Python default value:256
Also available on the trained model.
- scoreEachIteration
Whether to score during each iteration of model training.
Scala default value:
false
; Python default value:False
Also available on the trained model.
- scoreTreeInterval
Score the model after every so many trees. Disabled if set to 0.
Default value:
0
Also available on the trained model.
- seed
Seed for pseudo random number generator (if applicable).
Scala default value:
-1L
; Python default value:-1
Also available on the trained model.
- splitRatio
Accepts values in range [0, 1.0] which determine how large part of dataset is used for training and for validation. For example, 0.8 -> 80% training 20% validation. This parameter is ignored when validationDataFrame is set.
Default value:
1.0
- stoppingMetric
Metric to use for early stopping (AUTO: logloss for classification, deviance for regression and anonomaly_score for Isolation Forest). Note that custom and custom_increasing can only be used in GBM and DRF with the Python client. Possible values are
"AUTO"
,"deviance"
,"logloss"
,"MSE"
,"RMSE"
,"MAE"
,"RMSLE"
,"AUC"
,"AUCPR"
,"lift_top_group"
,"misclassification"
,"mean_per_class_error"
,"anomaly_score"
,"custom"
,"custom_increasing"
.Default value:
"AUTO"
Also available on the trained model.
- stoppingRounds
Early stopping based on convergence of stopping_metric. Stop if simple moving average of length k of the stopping_metric does not improve for k:=stopping_rounds scoring events (0 to disable).
Default value:
0
Also available on the trained model.
- stoppingTolerance
Relative tolerance for metric-based stopping criterion (stop if relative improvement is not at least this much).
Default value:
0.01
Also available on the trained model.
- validationDataFrame
A data frame dedicated for a validation of the trained model. If the parameters is not set,a validation frame created via the ‘splitRatio’ parameter. The parameter is not serializable!
Scala default value:
null
; Python default value:None
- validationLabelCol
(experimental) Name of the label column in the validation data frame. The label column should be a string column with two distinct values indicating the anomaly. The negative value must be alphabetically smaller than the positive value. (E.g. ‘0’/’1’, ‘False’/’True’
Default value:
"label"
- withContributions
Enables or disables generating a sub-column of detailedPredictionCol containing Shapley values of original features.
Scala default value:
false
; Python default value:False
Also available on the trained model.
- withLeafNodeAssignments
Enables or disables computation of leaf node assignments.
Scala default value:
false
; Python default value:False
Also available on the trained model.
- withStageResults
Enables or disables computation of stage results.
Scala default value:
false
; Python default value:False
Also available on the trained model.