Import a Hive table into an H2OFrame in memory. Make sure to start H2O with Hive on the classpath. The function uses the hive-site.xml on the classpath to connect to Hive. When the database is specified as a JDBC URL, it uses the Hive JDBC driver to obtain table metadata, then uses direct HDFS access to import the data.
h2o.import_hive_table( database, table, partitions = NULL, allow_multi_format = FALSE )
Argument | Description |
---|---|
database | Name of the Hive database (if not specified, the default database is used); can also be a JDBC URL. |
table | Name of the Hive table to import. |
partitions | A list of lists of strings: the partition key column values of the partitions you want to import. |
allow_multi_format | Enable import of partitioned tables whose partitions use different storage formats. WARNING: this may fail with an out-of-memory error for tables with a large number of small partitions. |
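
The shape of the `partitions` argument can be illustrated with a small sketch. It assumes a table partitioned by two key columns, here called `year` and `month` (hypothetical names), and that each inner vector lists the key values in the order of the table's partition columns. `build_partitions` is a hypothetical helper, not part of the h2o package.

```r
# Hypothetical helper: turn a data frame of partition key values into the
# list-of-character-vectors shape used by the `partitions` argument.
build_partitions <- function(keys) {
  # One unnamed character vector per row, values in column order.
  lapply(seq_len(nrow(keys)), function(i) {
    unname(vapply(keys[i, , drop = FALSE], as.character, character(1)))
  })
}

# Assumed partition columns (year, month), stored as strings so that
# leading zeros such as "01" are preserved.
keys <- data.frame(
  year  = c("2017", "2017"),
  month = c("01", "02"),
  stringsAsFactors = FALSE
)

partitions <- build_partitions(keys)
str(partitions)
```

The resulting `partitions` object has the same structure as `list(c("2017", "01"), c("2017", "02"))` and could be passed as the `partitions` argument.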
For example:

my_citibike_data = h2o.import_hive_table("default", "citibike20k", partitions = list(c("2017", "01"), c("2017", "02")))

my_citibike_data = h2o.import_hive_table("jdbc:hive2://hive-server:10000/default", "citibike20k", allow_multi_format = TRUE)
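
Because H2O has to be started with Hive on the classpath, one plausible workflow is to attach to an already-running cluster before importing. The following is a minimal sketch under that assumption: the wrapper name `import_citibike`, the host, and the port are illustrative, and the table and partition values are taken from the examples above.

```r
# Hypothetical wrapper around the documented call; the function name, host,
# and port are assumptions for illustration only.
import_citibike <- function(ip = "localhost", port = 54321) {
  # Attach to an existing cluster instead of starting a local one, since the
  # cluster must have been launched with Hive on its classpath.
  h2o::h2o.init(ip = ip, port = port, startH2O = FALSE)

  # Re-raise any import failure with a shorter, prefixed message.
  tryCatch(
    h2o::h2o.import_hive_table(
      "default", "citibike20k",
      partitions = list(c("2017", "01"), c("2017", "02"))
    ),
    error = function(e) {
      stop("Hive import failed: ", conditionMessage(e), call. = FALSE)
    }
  )
}
```

Calling `import_citibike()` against a suitably configured cluster would return the imported H2OFrame; wrapping the import in `tryCatch` is a design choice that keeps connection or classpath problems from surfacing as long, opaque error output.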