Computes TF-IDF (term frequency-inverse document frequency) values for each word in the given documents.
h2o.tf_idf(frame, document_id_col, text_col, preprocess = TRUE, case_sensitive = TRUE)
Argument | Description |
---|---|
frame | frame of documents or words for which TF-IDF values should be computed. |
document_id_col | index or name of a column containing document IDs. |
text_col | index or name of a column containing documents if `preprocess = TRUE` or words if `preprocess = FALSE`. |
preprocess | whether input text data should be pre-processed. Defaults to `TRUE`. |
case_sensitive | whether input data should be treated as case sensitive. Defaults to `TRUE`. |
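The following is a minimal sketch of how the arguments above fit together, not an official example. It assumes the `h2o` package is installed and a local cluster can be started with `h2o.init()`. The toy data, the column names `doc_id`, `text`, and `word`, and the explicit `as.character()` conversion are illustrative assumptions. The second call assumes that, with `preprocess = FALSE`, the input holds one word occurrence per row.

```r
library(h2o)
h2o.init()  # assumes a local H2O cluster can be started or reached

# Hypothetical toy data: one row per document.
docs <- data.frame(
  doc_id = c(0, 1, 2),
  text   = c("A B A", "B C", "A"),
  stringsAsFactors = FALSE
)
docs_hf <- as.h2o(docs)
# Assumption: keep the text column as a string column, not a categorical one.
docs_hf$text <- as.character(docs_hf$text)

# preprocess = TRUE: text_col holds whole documents.
tfidf_docs <- h2o.tf_idf(docs_hf, "doc_id", "text",
                         preprocess = TRUE, case_sensitive = FALSE)
print(tfidf_docs)

# Hypothetical pre-split data: one word occurrence per row.
words <- data.frame(
  doc_id = c(0, 0, 0, 1, 1, 2),
  word   = c("A", "B", "A", "B", "C", "A"),
  stringsAsFactors = FALSE
)
words_hf <- as.h2o(words)
words_hf$word <- as.character(words_hf$word)

# preprocess = FALSE: text_col already holds words.
tfidf_words <- h2o.tf_idf(words_hf, "doc_id", "word", preprocess = FALSE)
print(tfidf_words)
```

Columns are referenced by name here; the argument table also allows column indices.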
Returns the resulting frame with TF-IDF values. Each row has the format: documentID, word, TF, IDF, TF-IDF.
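To make the row format concrete, here is a base R sketch that builds a table of the same shape by hand, without calling H2O. The smoothed formula `IDF = log((N + 1) / (DF + 1))`, where `N` is the number of documents and `DF` the number of documents containing the word, is an assumption made for illustration; the values `h2o.tf_idf` returns may be computed differently. The data and column names are hypothetical.

```r
# Hypothetical toy corpus, already split into one word occurrence per row.
words <- data.frame(
  doc_id = c(0, 0, 0, 1, 1, 2),
  word   = c("a", "b", "a", "b", "c", "a"),
  stringsAsFactors = FALSE
)

# TF: how many times each word occurs in each document.
result <- aggregate(
  list(TF = rep(1, nrow(words))),
  by = list(doc_id = words$doc_id, word = words$word),
  FUN = sum
)

# DF: number of documents containing each word.
# Each (doc_id, word) pair appears exactly once in `result`.
doc_freq <- table(result$word)
n_docs <- length(unique(words$doc_id))

# Assumed smoothed IDF; see the note above.
result$IDF <- log((n_docs + 1) / (as.numeric(doc_freq[as.character(result$word)]) + 1))
result$TF_IDF <- result$TF * result$IDF

# One row per (document, word) pair: doc_id, word, TF, IDF, TF_IDF.
print(result[order(result$doc_id, result$word), ])
```

Words that occur in fewer documents receive a larger IDF, so a rare word such as `c` is weighted more heavily than `a` or `b` at equal term frequency.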