Parameters of H2OIsolationForest
Affected Class
- ai.h2o.sparkling.ml.algos.H2OIsolationForest
Parameters
- Each parameter also has a corresponding getter and setter method (e.g., label -> getLabel(), setLabel(...)).
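The naming rule above can be sketched in plain Python. The helper `accessor_names` is hypothetical and exists only for illustration; it assumes the getter and setter are formed by capitalizing the first letter of the parameter name, as in the `label` example.

```python
from typing import Tuple


def accessor_names(param: str) -> Tuple[str, str]:
    """Return the (getter, setter) method names for a camelCase parameter name."""
    if not param:
        raise ValueError("parameter name must be non-empty")
    # Capitalize only the first character; the rest of the name is unchanged.
    capitalized = param[0].upper() + param[1:]
    return "get" + capitalized, "set" + capitalized


print(accessor_names("label"))     # ('getLabel', 'setLabel')
print(accessor_names("maxDepth"))  # ('getMaxDepth', 'setMaxDepth')
```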
- calibrationDataFrame
- Calibration frame for Platt Scaling. To enable usage of the data frame, set the parameter calibrateModel to True. Scala default value: null; Python default value: None.
- ignoredCols
- Names of columns to ignore for training. Scala default value: null; Python default value: None. Also available on the trained model.
- buildTreeOneNode
- Run on one node only; no network overhead, but fewer CPUs are used. Suitable for small datasets. Scala default value: false; Python default value: False. Also available on the trained model.
- categoricalEncoding
- Encoding scheme for categorical features. Possible values are "AUTO", "OneHotInternal", "OneHotExplicit", "Enum", "Binary", "Eigen", "LabelEncoder", "SortByResponse", "EnumLimited". Default value: "AUTO". Also available on the trained model.
- colSampleRateChangePerLevel
- Relative change of the column sampling rate for every level (must be > 0.0 and <= 2.0). Default value: 1.0. Also available on the trained model.
- colSampleRatePerTree
- Column sample rate per tree (from 0.0 to 1.0). Default value: 1.0. Also available on the trained model.
- columnsToCategorical
- List of columns to convert to categorical before modelling. Scala default value: Array(); Python default value: [].
- contamination
- Contamination ratio: the proportion of anomalies in the input dataset. If undefined (-1), the predict function will not mark observations as anomalies and only the anomaly score will be returned. Default value: -1.0 (undefined). Also available on the trained model.
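To illustrate how a contamination ratio relates to anomaly flags, here is a minimal sketch in plain Python. It is not the H2O implementation: the function `mark_anomalies` is hypothetical, it assumes that a higher score means a more anomalous row, and it assumes the flagged rows are simply the highest-scoring fraction of the scores passed in.

```python
from typing import List, Optional


def mark_anomalies(scores: List[float],
                   contamination: float = -1.0) -> Optional[List[bool]]:
    """Return one anomaly flag per score, or None when contamination is undefined (-1)."""
    if contamination == -1.0:
        # Undefined contamination: only scores are available, no flags.
        return None
    if not 0.0 <= contamination <= 1.0:
        raise ValueError("contamination must be -1 or a proportion in [0, 1]")
    # Assumption: flag the highest-scoring rows, rounded to a whole row count.
    n_flagged = int(round(contamination * len(scores)))
    by_score = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    flagged = set(by_score[:n_flagged])
    return [i in flagged for i in range(len(scores))]


scores = [0.1, 0.9, 0.2, 0.3, 0.4]
print(mark_anomalies(scores))        # None
print(mark_anomalies(scores, 0.2))   # [False, True, False, False, False]
```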
- convertInvalidNumbersToNa
- If set to true, the model converts invalid numbers to NA when making predictions. Scala default value: false; Python default value: False. Also available on the trained model.
- convertUnknownCategoricalLevelsToNa
- If set to true, the model converts unknown categorical levels to NA when making predictions. Scala default value: false; Python default value: False. Also available on the trained model.
- dataFrameSerializer
- The full name of the serializer used for serialization and deserialization of Spark DataFrames to a JSON value within NullableDataFrameParam. Default value: "ai.h2o.sparkling.utils.JSONDataFrameSerializer". Also available on the trained model.
- detailedPredictionCol
- Column containing additional prediction details; its content depends on the model type. Default value: "detailed_prediction". Also available on the trained model.
- exportCheckpointsDir
- Automatically export generated models to this directory. Scala default value: null; Python default value: None. Also available on the trained model.
- featuresCols
- Names of the feature columns. Scala default value: Array(); Python default value: []. Also available on the trained model.
- ignoreConstCols
- Ignore constant columns. Scala default value: true; Python default value: True. Also available on the trained model.
- keepBinaryModels
- If set to true, all binary models created during execution of the fit method will be kept in the DKV of the H2O-3 cluster. Scala default value: false; Python default value: False.
- maxDepth
- Maximum tree depth (0 for unlimited). Default value: 8. Also available on the trained model.
- maxRuntimeSecs
- Maximum allowed runtime in seconds for model training. Use 0 to disable. Default value: 0.0. Also available on the trained model.
- minRows
- Fewest allowed (weighted) observations in a leaf. Default value: 1.0. Also available on the trained model.
- modelId
- Destination id for this model; auto-generated if not specified. Scala default value: null; Python default value: None.
- mtries
- Number of variables randomly sampled as candidates at each split. If set to -1, defaults to (number of predictors)/3. Default value: -1. Also available on the trained model.
- ntrees
- Number of trees. Default value: 50. Also available on the trained model.
- predictionCol
- Prediction column name. Default value: "prediction". Also available on the trained model.
- sampleRate
- Rate of randomly sampled observations used to train each Isolation Forest tree. Must be in the range from 0.0 to 1.0. If set to -1, sample_rate is disabled and sample_size is used instead. Default value: -1.0. Also available on the trained model.
- sampleSize
- Number of randomly sampled observations used to train each Isolation Forest tree. Only one of the parameters sample_size and sample_rate should be defined. If sample_rate is defined, sample_size is ignored. Scala default value: 256L; Python default value: 256. Also available on the trained model.
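The precedence between the two sampling parameters can be sketched as follows. The function `rows_per_tree` is hypothetical and only mirrors the rule stated above (a defined sample rate wins over the sample size); rounding the product to a whole number of rows is an assumption, not documented behavior.

```python
def rows_per_tree(n_rows: int, sample_rate: float = -1.0, sample_size: int = 256) -> int:
    """Number of observations sampled for each tree under the stated precedence rule."""
    if sample_rate != -1.0:
        if not 0.0 <= sample_rate <= 1.0:
            raise ValueError("sample_rate must be -1 or in the range [0.0, 1.0]")
        # sample_rate is defined, so sample_size is ignored.
        # Assumption: the product is rounded to the nearest whole row.
        return int(round(sample_rate * n_rows))
    # sample_rate is disabled (-1), so sample_size is used instead.
    return sample_size


print(rows_per_tree(10000))                   # 256
print(rows_per_tree(10000, sample_rate=0.1))  # 1000
```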
- scoreEachIteration
- Whether to score during each iteration of model training. Scala default value: false; Python default value: False. Also available on the trained model.
- scoreTreeInterval
- Score the model after every so many trees. Disabled if set to 0. Default value: 0. Also available on the trained model.
- seed
- Seed for pseudo random number generator (if applicable). Scala default value: -1L; Python default value: -1. Also available on the trained model.
- splitRatio
- Accepts values in the range [0, 1.0], which determine how large a part of the dataset is used for training and how large a part for validation. For example, 0.8 means 80% training and 20% validation. This parameter is ignored when validationDataFrame is set. Default value: 1.0.
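As a worked example of the split arithmetic, the sketch below computes the expected row counts for a given ratio. The function `split_counts` and its `has_validation_frame` flag are hypothetical; an actual split may be randomized, so real counts can differ slightly from these expected values.

```python
from typing import Tuple


def split_counts(n_rows: int, split_ratio: float = 1.0,
                 has_validation_frame: bool = False) -> Tuple[int, int]:
    """Expected (training, validation) row counts taken from the input dataset."""
    if has_validation_frame:
        # The ratio is ignored when a dedicated validation frame is supplied.
        return n_rows, 0
    if not 0.0 <= split_ratio <= 1.0:
        raise ValueError("split_ratio must be in the range [0, 1.0]")
    n_train = int(round(n_rows * split_ratio))
    return n_train, n_rows - n_train


print(split_counts(1000, 0.8))  # (800, 200)
print(split_counts(1000))       # (1000, 0)
```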
- stoppingMetric
- Metric to use for early stopping (AUTO: logloss for classification, deviance for regression and anomaly_score for Isolation Forest). Note that custom and custom_increasing can only be used in GBM and DRF with the Python client. Possible values are "AUTO", "deviance", "logloss", "MSE", "RMSE", "MAE", "RMSLE", "AUC", "AUCPR", "lift_top_group", "misclassification", "mean_per_class_error", "anomaly_score", "custom", "custom_increasing". Default value: "AUTO". Also available on the trained model.
- stoppingRounds
- Early stopping based on convergence of stopping_metric. Stop if the simple moving average of length k of the stopping_metric does not improve for k := stopping_rounds scoring events (0 to disable). Default value: 0. Also available on the trained model.
- stoppingTolerance
- Relative tolerance for metric-based stopping criterion (stop if relative improvement is not at least this much). Default value: 0.01. Also available on the trained model.
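How stoppingRounds and stoppingTolerance interact can be sketched with a simplified moving-average check. This is one plausible reading of the descriptions above, not the H2O-3 implementation: the functions `moving_averages` and `should_stop`, the `lower_is_better` flag, and the choice to compare against the oldest average in the last 2k scoring events are all assumptions made for illustration.

```python
from typing import List


def moving_averages(values: List[float], k: int) -> List[float]:
    """Simple moving averages of length k over consecutive windows."""
    return [sum(values[i:i + k]) / k for i in range(len(values) - k + 1)]


def should_stop(metric_history: List[float], stopping_rounds: int,
                stopping_tolerance: float = 0.01,
                lower_is_better: bool = True) -> bool:
    """Decide whether training would stop after the latest scoring event."""
    k = stopping_rounds
    if k == 0:
        return False  # early stopping is disabled
    if len(metric_history) < 2 * k:
        return False  # assumption: not enough scoring events to compare yet
    averages = moving_averages(metric_history[-2 * k:], k)
    reference, later = averages[0], averages[1:]
    best = min(later) if lower_is_better else max(later)
    improvement = (reference - best) if lower_is_better else (best - reference)
    # Stop when the relative improvement is smaller than the tolerance.
    return improvement < stopping_tolerance * abs(reference)


print(should_stop([1.0, 0.8, 0.6, 0.4], stopping_rounds=2))                   # False
print(should_stop([1.0, 0.5, 0.4, 0.399, 0.399, 0.399], stopping_rounds=2))   # True
```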
- validationDataFrame
- A data frame dedicated to validation of the trained model. If the parameter is not set, a validation frame is created via the splitRatio parameter. The parameter is not serializable! Scala default value: null; Python default value: None.
- validationLabelCol
- (experimental) Name of the label column in the validation data frame. The label column should be a string column with two distinct values indicating the anomaly. The negative value must be alphabetically smaller than the positive value (e.g., "0"/"1", "False"/"True"). Default value: "label".
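The ordering rule for the two label values can be checked with a small sketch. The function `label_polarity` is hypothetical; it assumes that "alphabetically smaller" corresponds to ordinary string ordering, which holds for the two example pairs given above.

```python
from typing import Iterable, Tuple


def label_polarity(values: Iterable[str]) -> Tuple[str, str]:
    """Return (negative, positive) labels from a column with two distinct string values."""
    distinct = sorted(set(values))
    if len(distinct) != 2:
        raise ValueError("expected exactly two distinct label values")
    # The smaller string is treated as the negative (non-anomalous) label.
    negative, positive = distinct
    return negative, positive


print(label_polarity(["1", "0", "0", "1"]))    # ('0', '1')
print(label_polarity(["True", "False"]))       # ('False', 'True')
```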
- withContributions
- Enables or disables generating a sub-column of detailedPredictionCol containing Shapley values of original features. Scala default value: false; Python default value: False. Also available on the trained model.
- withLeafNodeAssignments
- Enables or disables computation of leaf node assignments. Scala default value: false; Python default value: False. Also available on the trained model.
- withStageResults
- Enables or disables computation of stage results. Scala default value: false; Python default value: False. Also available on the trained model.