Data Sources
H2O supports data ingest from various data sources. The local file system, HDFS, and S3 are supported natively. Additional data sources, such as Alluxio or OpenStack Swift, can be accessed through a generic HDFS API.
Default Data Sources
- local file system
- S3
- HDFS
HDFS-like Data Sources
Various data sources can be accessed through an HDFS API.
In this case, a library providing access to the data source must be passed on the command line when H2O is launched. (Reminder: each node in the cluster must be launched in the same way.)
The library must be compatible with the HDFS API so that it can be registered as a valid HDFS FileSystem.
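As a rough illustration of the launch pattern described above, the following Python sketch assembles the `java` command with the driver library placed ahead of the H2O jar on the classpath. The function name `build_h2o_command` and the jar paths in the demo are hypothetical; only the `java -cp ... water.H2OApp` shape is taken from the examples on this page.

```python
import os


def build_h2o_command(driver_jars, h2o_jar, sep=os.pathsep):
    """Return the argv list for launching H2O with extra HDFS-compatible drivers.

    driver_jars: paths to driver libraries (illustrative values in the demo below)
    h2o_jar:     path to the H2O jar
    sep:         classpath separator (':' on Unix-like systems, ';' on Windows)
    """
    # Driver jars come first, followed by the H2O jar, joined into one classpath.
    classpath = sep.join(list(driver_jars) + [h2o_jar])
    return ["java", "-cp", classpath, "water.H2OApp"]


if __name__ == "__main__":
    cmd = build_h2o_command(["hadoop-openstack.jar"], "h2o.jar")
    # Every node in the cluster should be started with this same command.
    print(" ".join(cmd))
```

Generating the command in one place and reusing it on every node is one way to keep the launch identical across the cluster.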
Alluxio FS
Required Library
To access an Alluxio data source, the Alluxio client library that is part of the Alluxio distribution is required.
For example, alluxio-1.3.0/core/client/target/alluxio-core-client-1.3.0-jar-with-dependencies.jar.
H2O Command Line
java -cp alluxio-core-client-1.3.0-jar-with-dependencies.jar:build/h2o.jar water.H2OApp
URI Scheme
An Alluxio data source is referenced using the alluxio:// scheme and the location of the Alluxio master.
For example,
alluxio://localhost:19998/iris.csv
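To make the parts of such a URI explicit, here is a minimal sketch that splits it with Python's standard `urllib.parse`. The helper name `parse_alluxio_uri` is hypothetical and is not part of H2O or Alluxio; it only shows which portion names the master and which names the file.

```python
from urllib.parse import urlsplit


def parse_alluxio_uri(uri):
    """Split an alluxio:// URI into (master_host, master_port, path)."""
    parts = urlsplit(uri)
    if parts.scheme != "alluxio":
        raise ValueError(f"not an Alluxio URI: {uri!r}")
    if not parts.hostname:
        raise ValueError(f"missing Alluxio master location: {uri!r}")
    # port is None when the URI does not specify one
    return parts.hostname, parts.port, parts.path


if __name__ == "__main__":
    print(parse_alluxio_uri("alluxio://localhost:19998/iris.csv"))
    # prints ('localhost', 19998, '/iris.csv')
```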
core-site.xml Configuration
Not supported.
IBM Swift Object Storage
Required Library
To access IBM Object Store (which can be exposed via Bluemix or Softlayer), IBM’s HDFS driver hadoop-openstack.jar is required.
The driver can be obtained, for example, from a running BigInsights instance at /usr/iop/4.2.0.0/hadoop-mapreduce/hadoop-openstack.jar.
Note: The jar available at Maven Central is not compatible with IBM Swift Object Storage.
H2O Command Line
java -cp hadoop-openstack.jar:h2o.jar water.H2OApp
URI Scheme
The data source is referenced using the regular Swift URI structure: swift://<CONTAINER>.<SERVICE>/path/to/file
For example,
swift://smalldata.h2o/iris.csv
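The following sketch shows how the example URI maps onto the `<CONTAINER>.<SERVICE>` structure. It assumes the host part is split at the first dot, so the container is everything before it and the service is everything after it; the helper name `parse_swift_uri` is hypothetical and is not an H2O or Hadoop API.

```python
from urllib.parse import urlsplit


def parse_swift_uri(uri):
    """Split a swift:// URI into (container, service, path)."""
    parts = urlsplit(uri)
    if parts.scheme != "swift":
        raise ValueError(f"not a Swift URI: {uri!r}")
    # Assumption: the host is '<CONTAINER>.<SERVICE>', split at the first dot.
    container, _, service = parts.netloc.partition(".")
    if not container or not service:
        raise ValueError(f"expected swift://<CONTAINER>.<SERVICE>/...: {uri!r}")
    return container, service, parts.path


if __name__ == "__main__":
    print(parse_swift_uri("swift://smalldata.h2o/iris.csv"))
    # prints ('smalldata', 'h2o', '/iris.csv')
```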
core-site.xml Configuration
The core-site.xml needs to be configured with Swift Object Store parameters. These are available in the Bluemix/Softlayer management console.
<configuration>
  <property>
    <name>fs.swift.service.SERVICE.auth.url</name>
    <value>https://identity.open.softlayer.com/v3/auth/tokens</value>
  </property>
  <property>
    <name>fs.swift.service.SERVICE.project.id</name>
    <value>...</value>
  </property>
  <property>
    <name>fs.swift.service.SERVICE.user.id</name>
    <value>...</value>
  </property>
  <property>
    <name>fs.swift.service.SERVICE.password</name>
    <value>...</value>
  </property>
  <property>
    <name>fs.swift.service.SERVICE.region</name>
    <value>dallas</value>
  </property>
  <property>
    <name>fs.swift.service.SERVICE.public</name>
    <value>false</value>
  </property>
</configuration>
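The property names above all follow the pattern `fs.swift.service.SERVICE.<key>`. As a sketch of how such a file could be generated for a concrete service name, the Python snippet below builds the same structure with the standard `xml.etree.ElementTree` module. It assumes that `SERVICE` in the property names stands for the `<SERVICE>` part of the Swift URI (here `h2o`, taken from the example URI); the function name `swift_core_site` is hypothetical, and the `...` values are placeholders that must be replaced with the credentials from the management console.

```python
import xml.etree.ElementTree as ET


def swift_core_site(service, settings):
    """Build a <configuration> element with fs.swift.service.<service>.* properties."""
    root = ET.Element("configuration")
    for key, value in settings.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = f"fs.swift.service.{service}.{key}"
        ET.SubElement(prop, "value").text = value
    return root


if __name__ == "__main__":
    settings = {
        "auth.url": "https://identity.open.softlayer.com/v3/auth/tokens",
        "project.id": "...",  # placeholder, fill in from the management console
        "user.id": "...",     # placeholder
        "password": "...",    # placeholder
        "region": "dallas",
        "public": "false",
    }
    root = swift_core_site("h2o", settings)
    # Serialize to a string; write this to core-site.xml as needed.
    print(ET.tostring(root, encoding="unicode"))
```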