Trains a word2vec model on a String column of an H2O data frame

h2o.word2vec(
  training_frame = NULL,
  model_id = NULL,
  min_word_freq = 5,
  word_model = c("SkipGram"),
  norm_model = c("HSM"),
  vec_size = 100,
  window_size = 5,
  sent_sample_rate = 0.001,
  init_learning_rate = 0.025,
  epochs = 5,
  pre_trained = NULL,
  max_runtime_secs = 0,
  export_checkpoints_dir = NULL
)
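The training frame is expected to hold words in a single String column. The base-R sketch below lays out a few sentences the way `h2o.tokenize()` is assumed to: one word per row, with `NA` marking each sentence boundary. It is an illustration only. The helper `tokenize_to_column()` is hypothetical and runs without an H2O cluster. In practice the tokenized H2O frame itself would be passed as `training_frame`.

```r
# Hypothetical helper: lay out sentences as one word per row, with NA after
# each sentence as a boundary marker (assumed to mirror h2o.tokenize output).
tokenize_to_column <- function(sentences, split = " ") {
  tokens <- lapply(strsplit(sentences, split, fixed = TRUE), function(w) {
    w <- w[nzchar(w)]      # drop empty strings left by repeated separators
    c(w, NA_character_)    # NA marks the end of the sentence
  })
  unlist(tokens)
}

words <- tokenize_to_column(c("senior data scientist", "high school  teacher"))
print(words)
```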

Arguments

training_frame	Id of the training data frame.
model_id	Destination id for this model; auto-generated if not specified.
min_word_freq	Discard words that appear fewer than min_word_freq times. Defaults to 5.
word_model	Use the Skip-Gram model. Must be one of: "SkipGram". Defaults to SkipGram.
norm_model	Use Hierarchical Softmax. Must be one of: "HSM". Defaults to HSM.
vec_size	Set the size of the word vectors. Defaults to 100.
window_size	Set the maximum skip length between words. Defaults to 5.
sent_sample_rate	Set the threshold for occurrence of words. Words that appear with higher frequency in the training data will be randomly down-sampled; useful range is (0, 1e-5). Defaults to 0.001.
init_learning_rate	Set the starting learning rate. Defaults to 0.025.
epochs	Number of training iterations to run. Defaults to 5.
pre_trained	Id of a data frame that contains a pre-trained (external) word2vec model.
max_runtime_secs	Maximum allowed runtime in seconds for model training. Use 0 to disable. Defaults to 0.
export_checkpoints_dir	Automatically export generated models to this directory.
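The `min_word_freq` and `sent_sample_rate` arguments both act on word counts. The following is a minimal base-R sketch, assuming H2O follows the reference word2vec C implementation (this page does not confirm it). Words seen fewer than `min_word_freq` times are dropped from the vocabulary. Each occurrence of a frequent word is kept with a probability that shrinks as the word's frequency grows. `build_vocab()` and `keep_prob()` are hypothetical helpers.

```r
# Hypothetical helper: count words (ignoring NA sentence markers) and keep
# only those seen at least min_word_freq times, most frequent first.
build_vocab <- function(words, min_word_freq = 5) {
  counts <- table(words[!is.na(words)])
  counts <- counts[counts >= min_word_freq]
  sort(setNames(as.integer(counts), names(counts)), decreasing = TRUE)
}

# Hypothetical helper: probability of keeping one occurrence of a word with
# count `freq` out of `total` training words, using the subsampling rule of
# the reference word2vec C code (assumed, not confirmed, for H2O).
keep_prob <- function(freq, total, sent_sample_rate = 0.001) {
  # A non-positive rate disables subsampling, as in the C code.
  if (sent_sample_rate <= 0) return(rep(1, length(freq)))
  threshold <- sent_sample_rate * total
  pmin(1, (sqrt(freq / threshold) + 1) * threshold / freq)
}

vocab <- build_vocab(c(rep("the", 6), rep("cat", 5), rep("sat", 2), NA))
keep_prob(c(1000, 1e5), total = 1e6)
```

Under this rule, with one million training words and the default rate of 0.001, a word seen 1,000 times is always kept, while a word seen 100,000 times keeps about 11% of its occurrences.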
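`window_size` is described only as the maximum skip length. In the reference word2vec C code, each center word uses a random effective window between 1 and that maximum, so nearby words are paired more often than distant ones. The sketch below enumerates skip-gram (center, context) pairs for one sentence under that assumption. Whether H2O shrinks the window the same way is not stated here. `skipgram_pairs()` is a hypothetical helper, and `shrink = FALSE` fixes the window at its maximum for reproducible output.

```r
# Hypothetical helper: enumerate (center, context) skip-gram pairs for one
# sentence. With shrink = TRUE, each center word draws an effective window
# uniformly from 1..window_size (assumed, following the word2vec C code).
skipgram_pairs <- function(sentence, window_size = 5, shrink = TRUE) {
  pairs <- list()
  n <- length(sentence)
  for (i in seq_len(n)) {
    b <- if (shrink) sample.int(window_size, 1) else window_size
    for (j in max(1, i - b):min(n, i + b)) {
      if (j != i) {
        pairs[[length(pairs) + 1]] <- c(center = sentence[i], context = sentence[j])
      }
    }
  }
  do.call(rbind, pairs)  # one row per pair; NULL for a one-word sentence
}

skipgram_pairs(c("high", "school", "teacher"), window_size = 1, shrink = FALSE)
```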
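`init_learning_rate` sets only the starting value. The sketch below assumes that H2O decays the rate the way the reference word2vec C code does: linearly over all `epochs` passes through the training words, with a floor of 0.01% of the starting value. This schedule is an assumption, and `learning_rate_at()` is a hypothetical helper.

```r
# Hypothetical helper: learning rate after `processed` words have been seen,
# out of epochs * train_words in total, under an assumed linear decay with a
# floor of init_learning_rate * 1e-4 (as in the reference word2vec C code).
learning_rate_at <- function(processed, train_words, epochs = 5,
                             init_learning_rate = 0.025) {
  rate <- init_learning_rate * (1 - processed / (epochs * train_words + 1))
  pmax(rate, init_learning_rate * 1e-4)
}

learning_rate_at(c(0, 5e5, 1e6), train_words = 1e6, epochs = 1)
```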
