.. _parameters_H2OWord2Vec:

Parameters of H2OWord2Vec
-------------------------

Affected Class
##############

- ``ai.h2o.sparkling.ml.features.H2OWord2Vec``

Parameters
##########

- *Each parameter has also a corresponding getter and setter method.*
  *(E.g.:* ``label`` *->* ``getLabel()`` *,* ``setLabel(...)`` *)*

outputCol
  Output column name

  *Default value:* ``"H2OWord2Vec_c7799c185f32__output"``
  
  *Also available on the trained model.*

columnsToCategorical
  List of columns to convert to categorical before modelling

  *Scala default value:* ``Array()`` *; Python default value:* ``[]``
  

convertInvalidNumbersToNa
  If set to 'true', the model converts invalid numbers to NA during making predictions.

  *Scala default value:* ``false`` *; Python default value:* ``False``
  
  *Also available on the trained model.*

convertUnknownCategoricalLevelsToNa
  If set to 'true', the model converts unknown categorical levels to NA during making predictions.

  *Scala default value:* ``false`` *; Python default value:* ``False``
  
  *Also available on the trained model.*

dataFrameSerializer
  A full name of a serializer used for serialization and deserialization of Spark DataFrames to a JSON value within NullableDataFrameParam.

  *Default value:* ``"ai.h2o.sparkling.utils.JSONDataFrameSerializer"``
  
  *Also available on the trained model.*

epochs
  Number of training iterations to run.

  *Default value:* ``5``
  
  *Also available on the trained model.*

exportCheckpointsDir
  Automatically export generated models to this directory.

  *Scala default value:* ``null`` *; Python default value:* ``None``
  
  *Also available on the trained model.*

initLearningRate
  Set the starting learning rate.

  *Scala default value:* ``0.025f`` *; Python default value:* ``0.025``
  
  *Also available on the trained model.*

inputCol
  Input column name

  *Default value:* ``"No default value"``
  
  *Also available on the trained model.*

keepBinaryModels
  If set to true, all binary models created during execution of the ``fit`` method will be kept in DKV of H2O-3 cluster.

  *Scala default value:* ``false`` *; Python default value:* ``False``
  

maxRuntimeSecs
  Maximum allowed runtime in seconds for model training. Use 0 to disable.

  *Default value:* ``0.0``
  
  *Also available on the trained model.*

minWordFreq
  This will discard words that appear less than <int> times.

  *Default value:* ``5``
  
  *Also available on the trained model.*

modelId
  Destination id for this model; auto-generated if not specified.

  *Scala default value:* ``null`` *; Python default value:* ``None``
  

normModel
  Use Hierarchical Softmax. Possible values are ``"HSM"``.

  *Default value:* ``"HSM"``
  
  *Also available on the trained model.*

sentSampleRate
  Set threshold for occurrence of words. Those that appear with higher frequency in the training data
		will be randomly down-sampled; useful range is (0, 1e-5).

  *Scala default value:* ``0.001f`` *; Python default value:* ``0.001``
  
  *Also available on the trained model.*

splitRatio
  Accepts values in range [0, 1.0] which determine how large part of dataset is used for training and for validation. For example, 0.8 -> 80% training 20% validation. This parameter is ignored when validationDataFrame is set.

  *Default value:* ``1.0``
  

validationDataFrame
  A data frame dedicated for a validation of the trained model. If the parameters is not set,a validation frame created via the 'splitRatio' parameter. The parameter is not serializable!

  *Scala default value:* ``null`` *; Python default value:* ``None``
  

vecSize
  Set size of word vectors.

  *Default value:* ``100``
  
  *Also available on the trained model.*

windowSize
  Set max skip length between words.

  *Default value:* ``5``
  
  *Also available on the trained model.*

wordModel
  The word model to use (SkipGram or CBOW). Possible values are ``"SkipGram"``, ``"CBOW"``.

  *Default value:* ``"SkipGram"``
  
  *Also available on the trained model.*