Parameters of H2OWord2Vec¶
Affected Class¶
ai.h2o.sparkling.ml.features.H2OWord2Vec
Parameters¶
Each parameter has also a corresponding getter and setter method. (E.g.:
label
->getLabel()
,setLabel(...)
)
- outputCol
Output column name
Default value:
"H2OWord2Vec_6b94e5462605__output"
Also available on the trained model.
- columnsToCategorical
List of columns to convert to categorical before modelling
Scala default value:
Array()
; Python default value:[]
- convertInvalidNumbersToNa
If set to ‘true’, the model converts invalid numbers to NA during making predictions.
Scala default value:
false
; Python default value:False
Also available on the trained model.
- convertUnknownCategoricalLevelsToNa
If set to ‘true’, the model converts unknown categorical levels to NA during making predictions.
Scala default value:
false
; Python default value:False
Also available on the trained model.
- dataFrameSerializer
A full name of a serializer used for serialization and deserialization of Spark DataFrames to a JSON value within NullableDataFrameParam.
Default value:
"ai.h2o.sparkling.utils.JSONDataFrameSerializer"
Also available on the trained model.
- epochs
Number of training iterations to run.
Default value:
5
Also available on the trained model.
- exportCheckpointsDir
Automatically export generated models to this directory.
Scala default value:
null
; Python default value:None
Also available on the trained model.
- initLearningRate
Set the starting learning rate.
Scala default value:
0.025f
; Python default value:0.025
Also available on the trained model.
- inputCol
Input column name
Default value:
"No default value"
Also available on the trained model.
- keepBinaryModels
If set to true, all binary models created during execution of the
fit
method will be kept in DKV of H2O-3 cluster.Scala default value:
false
; Python default value:False
- maxRuntimeSecs
Maximum allowed runtime in seconds for model training. Use 0 to disable.
Default value:
0.0
Also available on the trained model.
- minWordFreq
This will discard words that appear less than <int> times.
Default value:
5
Also available on the trained model.
- modelId
Destination id for this model; auto-generated if not specified.
Scala default value:
null
; Python default value:None
- normModel
Use Hierarchical Softmax. Possible values are
"HSM"
.Default value:
"HSM"
Also available on the trained model.
- sentSampleRate
- Set threshold for occurrence of words. Those that appear with higher frequency in the training data
will be randomly down-sampled; useful range is (0, 1e-5).
Scala default value:
0.001f
; Python default value:0.001
Also available on the trained model.
- splitRatio
Accepts values in range [0, 1.0] which determine how large part of dataset is used for training and for validation. For example, 0.8 -> 80% training 20% validation. This parameter is ignored when validationDataFrame is set.
Default value:
1.0
- validationDataFrame
A data frame dedicated for a validation of the trained model. If the parameters is not set,a validation frame created via the ‘splitRatio’ parameter. The parameter is not serializable!
Scala default value:
null
; Python default value:None
- vecSize
Set size of word vectors.
Default value:
100
Also available on the trained model.
- windowSize
Set max skip length between words.
Default value:
5
Also available on the trained model.
- wordModel
The word model to use (SkipGram or CBOW). Possible values are
"SkipGram"
,"CBOW"
.Default value:
"SkipGram"
Also available on the trained model.