R/word2vec.R
h2o.word2vec.Rd
Trains a word2vec model on a String column of an H2O data frame
h2o.word2vec(training_frame = NULL, model_id = NULL, min_word_freq = 5, word_model = c("SkipGram"), norm_model = c("HSM"), vec_size = 100, window_size = 5, sent_sample_rate = 0.001, init_learning_rate = 0.025, epochs = 5, pre_trained = NULL, max_runtime_secs = 0, export_checkpoints_dir = NULL)
Argument | Description |
---|---|
training_frame | Id of the training data frame. |
model_id | Destination id for this model; auto-generated if not specified. |
min_word_freq | Discards words that appear fewer than this many times. Defaults to 5. |
word_model | Use the Skip-Gram model. Must be one of: "SkipGram". Defaults to SkipGram. |
norm_model | Use Hierarchical Softmax. Must be one of: "HSM". Defaults to HSM. |
vec_size | Size of the word vectors. Defaults to 100. |
window_size | Maximum skip length between words. Defaults to 5. |
sent_sample_rate | Threshold for the occurrence of words. Words that appear with higher frequency in the training data are randomly down-sampled; useful range is (0, 1e-5). Defaults to 0.001. |
init_learning_rate | Starting learning rate. Defaults to 0.025. |
epochs | Number of training iterations to run. Defaults to 5. |
pre_trained | Id of a data frame that contains a pre-trained (external) word2vec model. |
max_runtime_secs | Maximum allowed runtime in seconds for model training. Use 0 to disable. Defaults to 0. |
export_checkpoints_dir | Automatically export generated models to this directory. |
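
A minimal sketch of how the arguments above might be combined, assuming the `h2o` package is installed and a local H2O cluster can be started. The toy corpus is hypothetical, and `h2o.tokenize` and `h2o.findSynonyms` are companion functions from the same package that this page does not document. Setting `sent_sample_rate = 0` to avoid down-sampling on such a small corpus is an assumption, not something stated in the table.

```r
library(h2o)
h2o.init()  # assumes a local H2O cluster can be started or reached

# Hypothetical toy corpus, for illustration only.
docs <- data.frame(
  text = c("the quick brown fox jumps over the lazy dog",
           "the lazy dog sleeps while the quick fox runs",
           "a quick brown dog chases the lazy fox"),
  stringsAsFactors = FALSE
)
docs_hf <- as.h2o(docs)

# h2o.word2vec trains on a String column of words, so convert the
# column to strings and split each sentence on single spaces.
words <- h2o.tokenize(as.character(docs_hf$text), " ")

w2v <- h2o.word2vec(
  training_frame   = words,
  min_word_freq    = 1,   # the default of 5 would drop most words in this tiny corpus
  vec_size         = 10,  # small vectors are enough for a toy example
  window_size      = 3,
  sent_sample_rate = 0,   # assumption: 0 avoids down-sampling frequent words
  epochs           = 10
)

# Look up the words whose vectors are closest to "fox".
h2o.findSynonyms(w2v, "fox", count = 3)
```

On a corpus this small the resulting vectors carry little meaning; the point is only to show the shape of a call. For real data, the defaults listed in the table are a more sensible starting point.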