.. _parameters_H2OWord2Vec: Parameters of H2OWord2Vec ------------------------- Affected Class ############## - ``ai.h2o.sparkling.ml.features.H2OWord2Vec`` Parameters ########## - *Each parameter has also a corresponding getter and setter method.* *(E.g.:* ``label`` *->* ``getLabel()`` *,* ``setLabel(...)`` *)* outputCol Output column name *Default value:* ``"H2OWord2Vec_d018a33fd165__output"`` *Also available on the trained model.* columnsToCategorical List of columns to convert to categorical before modelling *Scala default value:* ``Array()`` *; Python default value:* ``[]`` convertInvalidNumbersToNa If set to 'true', the model converts invalid numbers to NA during making predictions. *Scala default value:* ``false`` *; Python default value:* ``False`` *Also available on the trained model.* convertUnknownCategoricalLevelsToNa If set to 'true', the model converts unknown categorical levels to NA during making predictions. *Scala default value:* ``false`` *; Python default value:* ``False`` *Also available on the trained model.* dataFrameSerializer A full name of a serializer used for serialization and deserialization of Spark DataFrames to a JSON value within NullableDataFrameParam. *Default value:* ``"ai.h2o.sparkling.utils.JSONDataFrameSerializer"`` *Also available on the trained model.* epochs Number of training iterations to run. *Default value:* ``5`` *Also available on the trained model.* exportCheckpointsDir Automatically export generated models to this directory. *Scala default value:* ``null`` *; Python default value:* ``None`` *Also available on the trained model.* initLearningRate Set the starting learning rate. *Scala default value:* ``0.025f`` *; Python default value:* ``0.025`` *Also available on the trained model.* inputCol Input column name *Default value:* ``"No default value"`` *Also available on the trained model.* keepBinaryModels If set to true, all binary models created during execution of the ``fit`` method will be kept in DKV of H2O-3 cluster. *Scala default value:* ``false`` *; Python default value:* ``False`` maxRuntimeSecs Maximum allowed runtime in seconds for model training. Use 0 to disable. *Default value:* ``0.0`` *Also available on the trained model.* minWordFreq This will discard words that appear less than times. *Default value:* ``5`` *Also available on the trained model.* modelId Destination id for this model; auto-generated if not specified. *Scala default value:* ``null`` *; Python default value:* ``None`` normModel Use Hierarchical Softmax. Possible values are ``"HSM"``. *Default value:* ``"HSM"`` *Also available on the trained model.* sentSampleRate Set threshold for occurrence of words. Those that appear with higher frequency in the training data will be randomly down-sampled; useful range is (0, 1e-5). *Scala default value:* ``0.001f`` *; Python default value:* ``0.001`` *Also available on the trained model.* splitRatio Accepts values in range [0, 1.0] which determine how large part of dataset is used for training and for validation. For example, 0.8 -> 80% training 20% validation. This parameter is ignored when validationDataFrame is set. *Default value:* ``1.0`` validationDataFrame A data frame dedicated for a validation of the trained model. If the parameters is not set,a validation frame created via the 'splitRatio' parameter. The parameter is not serializable! *Scala default value:* ``null`` *; Python default value:* ``None`` vecSize Set size of word vectors. *Default value:* ``100`` *Also available on the trained model.* windowSize Set max skip length between words. *Default value:* ``5`` *Also available on the trained model.* wordModel The word model to use (SkipGram or CBOW). Possible values are ``"SkipGram"``, ``"CBOW"``. *Default value:* ``"SkipGram"`` *Also available on the trained model.*