Data Sources

H2O can ingest data from several data sources. The local file system, HDFS, and S3 are supported natively. Additional data sources, such as Alluxio or OpenStack Swift, can be accessed through a generic HDFS API.

Default Data Sources

  • local file system
  • S3
  • HDFS

HDFS-like Data Sources

Various data sources can be accessed through an HDFS API. In this case, a library providing access to the data source must be passed on the command line when H2O is launched. (Reminder: Each node in the cluster must be launched in the same way.) The library must be compatible with the HDFS API so that it can be registered as a valid HDFS FileSystem.

Alluxio FS

Required Library

To access an Alluxio data source, the Alluxio client library that ships with the Alluxio distribution is required. For example, alluxio-1.3.0/core/client/target/alluxio-core-client-1.3.0-jar-with-dependencies.jar.

H2O Command Line

java -cp alluxio-core-client-1.3.0-jar-with-dependencies.jar:build/h2o.jar water.H2OApp
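
Because every node must be started the same way, it can help to assemble the launch command in one place. The following is a minimal Python sketch, not part of H2O: the helper name h2o_launch_command is hypothetical, and the jar paths are simply the ones from the command above.

```python
import os

def h2o_launch_command(driver_jar, h2o_jar, main_class="water.H2OApp"):
    """Return the java command with the driver jar ahead of the H2O jar.

    Hypothetical helper for illustration; it only builds the argument list.
    """
    # os.pathsep is ":" on Linux/macOS and ";" on Windows.
    classpath = os.pathsep.join([driver_jar, h2o_jar])
    return ["java", "-cp", classpath, main_class]

cmd = h2o_launch_command(
    "alluxio-core-client-1.3.0-jar-with-dependencies.jar",
    "build/h2o.jar",
)
print(" ".join(cmd))
```

The returned list can be handed to a process launcher on each node, so that all nodes share an identical classpath.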

URI Scheme

An Alluxio data source is referenced using the alluxio:// scheme followed by the host and port of the Alluxio master. For example,

alluxio://localhost:19998/iris.csv
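
As an illustration of how such a URI is put together, here is a small Python sketch. The function name alluxio_uri is hypothetical, and the default port is just the one used in the example above; adjust it to match the actual Alluxio master.

```python
from urllib.parse import urlsplit

def alluxio_uri(master_host, path, port=19998):
    """Build an alluxio:// URI from the master host, port, and file path.

    Hypothetical helper; the default port is taken from the example above.
    """
    return f"alluxio://{master_host}:{port}/{path.lstrip('/')}"

uri = alluxio_uri("localhost", "iris.csv")

# Split the URI back into its parts to check that it is well formed.
parts = urlsplit(uri)
print(parts.scheme, parts.hostname, parts.port, parts.path)
```

The resulting string is what would be supplied as the file location when importing data, for instance as the path argument of h2o.import_file in the H2O Python client.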

core-site.xml Configuration

Not supported.

IBM Swift Object Storage

Required Library

To access IBM Object Store (which can be exposed via Bluemix or Softlayer), IBM’s HDFS driver hadoop-openstack.jar is required. The driver can be obtained, for example, from a running BigInsights instance, where it is located at /usr/iop/4.2.0.0/hadoop-mapreduce/hadoop-openstack.jar.

Note: The jar available at Maven central is not compatible with IBM Swift Object Storage.

H2O Command Line

java -cp hadoop-openstack.jar:h2o.jar water.H2OApp

URI Scheme

The data source is available under the regular Swift URI structure: swift://<CONTAINER>.<SERVICE>/path/to/file. For example,

swift://smalldata.h2o/iris.csv
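
To make the structure concrete, the following Python sketch splits a Swift URI into its container, service, and path. The function name parse_swift_uri is hypothetical, and the sketch assumes the container is everything before the first dot in the host part.

```python
from urllib.parse import urlsplit

def parse_swift_uri(uri):
    """Split swift://<CONTAINER>.<SERVICE>/path into (container, service, path).

    Hypothetical helper; assumes the container name contains no dots.
    """
    parts = urlsplit(uri)
    if parts.scheme != "swift":
        raise ValueError(f"not a swift:// URI: {uri!r}")
    container, dot, service = parts.netloc.partition(".")
    if not dot or not container or not service:
        raise ValueError(f"expected <CONTAINER>.<SERVICE> in {uri!r}")
    return container, service, parts.path

container, service, path = parse_swift_uri("swift://smalldata.h2o/iris.csv")
print(container, service, path)
```

In the example URI, smalldata is the container and h2o is the service name.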

core-site.xml Configuration

The core-site.xml file must be configured with the Swift Object Store parameters, which are available in the Bluemix/Softlayer management console. In the property names below, SERVICE stands for the service name used in the Swift URI.

<configuration>
  <property>
    <name>fs.swift.service.SERVICE.auth.url</name>
    <value>https://identity.open.softlayer.com/v3/auth/tokens</value>
  </property>
  <property>
    <name>fs.swift.service.SERVICE.project.id</name>
    <value>...</value>
  </property>
  <property>
    <name>fs.swift.service.SERVICE.user.id</name>
    <value>...</value>
  </property>
  <property>
    <name>fs.swift.service.SERVICE.password</name>
    <value>...</value>
  </property>
  <property>
    <name>fs.swift.service.SERVICE.region</name>
    <value>dallas</value>
  </property>
  <property>
    <name>fs.swift.service.SERVICE.public</name>
    <value>false</value>
  </property>
</configuration>
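
When several services need to be configured, the same set of properties can be generated rather than written by hand. The Python sketch below is one possible way to do this under stated assumptions: the function name swift_core_site is hypothetical, the property suffixes are exactly those listed above, and the "..." values are left elided as in the listing.

```python
import xml.etree.ElementTree as ET

# Property suffixes from the configuration shown above.
SUFFIXES = ("auth.url", "project.id", "user.id", "password", "region", "public")

def swift_core_site(service, settings):
    """Build a <configuration> element with fs.swift.service.<service>.* properties.

    Hypothetical helper; `settings` maps each suffix to its value.
    """
    root = ET.Element("configuration")
    for suffix in SUFFIXES:
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = f"fs.swift.service.{service}.{suffix}"
        ET.SubElement(prop, "value").text = settings[suffix]
    return root

root = swift_core_site("h2o", {
    "auth.url": "https://identity.open.softlayer.com/v3/auth/tokens",
    "project.id": "...",   # elided, fill in from the management console
    "user.id": "...",      # elided
    "password": "...",     # elided
    "region": "dallas",
    "public": "false",
})
print(ET.tostring(root, encoding="unicode"))
```

Here the service name h2o matches the service part of the example URI swift://smalldata.h2o/iris.csv.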