REST Interface

Classes for communication with backend H2O servers.

H2OConnection – Connect to an existing H2O server and send requests to it.
H2OLocalServer – Start an H2O server on your local machine.

The h2o module provides convenience functions for accessing these classes, and these functions are recommended for everyday use. The common use cases are:

  1. Connect to an existing remote H2O server:

    h2o.connect(url="...")
    
  2. Connect to a local server or, if there isn’t one, start it and then connect:

    h2o.init()
    
  3. Start multiple H2O servers locally (forming a cluster), and then connect to one of them:

    import h2o
    from h2o.backend import H2OLocalServer

    # Launch five local servers (they form a single cluster); keep the last handle.
    for _ in range(5):
        hs = H2OLocalServer.start()
    h2o.connect(server=hs)

The functions h2o.connect() and h2o.init() take many parameters that allow you to fine-tune the connection settings. When called, they create a new H2OConnection object and store it in a global variable; this connection is used by all subsequent calls to h2o.* functions. Currently there is no effective way to keep connections to several separate H2O servers open at the same time. Such a facility may be added in the future.

class h2o.backend.H2OConnection

Connection handle to an H2O cluster.

In a typical scenario you don’t need to access this class directly. Instead, use h2o.connect() to establish a connection, and h2o.api() to make requests to the backend H2O server. However, if your use case is not typical, read on.

Instances of this class may only be created through a static method open():

hc = H2OConnection.open(...)

Once opened, the connection remains active until the script exits (or until you explicitly close() it). If the script exits with an exception, then the connection will fail to close, and the backend server will keep all the temporary frames and the open session.

Alternatively you can use this class as a context manager, which will ensure that the connection gets closed at the end of the with ... block even if an exception occurs:

from h2o.backend import H2OConnection

with H2OConnection.open() as hc:
    hc.info().pprint()

Once the connection is established, you can send REST API requests to the server using request().

static open(server=None, url=None, ip=None, port=None, https=None, auth=None, verify_ssl_certificates=True, proxy=None, cluster_name=None, verbose=True)

Establish connection to an existing H2O server.

The connection is not kept alive, so what this method actually does is attempt to connect to the specified server and check that the server is healthy and responds to REST API requests. If the H2O server cannot be reached, an H2OConnectionError is raised. On success this method returns a new H2OConnection object; this is the only “official” way to create instances of this class.

There are three ways to specify the target to connect to (these settings are mutually exclusive):

  • pass a server option,
  • pass the full url for the connection,
  • provide a triple of parameters ip, port, https.
Parameters:
  • server (H2OLocalServer) – connect to the specified local server instance. There is a slight difference between connecting to a local server by specifying its ip and port, and connecting through an H2OLocalServer instance: if the server becomes unresponsive, having access to its process handle allows us to query the server status through the OS, and potentially to include a snapshot of the server’s error log in the exception information.
  • url – full url of the server to connect to.
  • ip – target server’s IP address or hostname (default “localhost”).
  • port – H2O server’s port (default 54321).
  • https – if True then connect using https instead of http (default False).
  • verify_ssl_certificates – if False then SSL certificate checking will be disabled (default True). This setting should rarely be disabled, as it makes your connection vulnerable to man-in-the-middle attacks. When used, it will generate a warning from the requests library. Has no effect when https is False.
  • auth – authentication token for connecting to the remote server. This can be either a (username, password) tuple, or an authenticator (AuthBase) object. Please refer to the documentation in the requests.auth module.
  • proxy – URL of a proxy server. If you do not specify a proxy, the requests module will attempt to use a proxy specified in the environment (in the HTTP_PROXY / HTTPS_PROXY variables). We check for the presence of these variables and issue a warning if they are found. To suppress that warning and use the proxy from the environment, pass proxy="(default)".
  • cluster_name – name of the H2O cluster to connect to. This option is used only by Steam.
  • verbose – if True, then connection progress info will be printed to the stdout.
Returns:

A new H2OConnection instance.

Raises:
  • H2OConnectionError – if the server cannot be reached.
  • H2OServerError – if the server is in an unhealthy state (although this might be a recoverable error, the client itself should decide whether it wants to retry or not).
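
The three mutually exclusive ways of naming the target can be illustrated with a small sketch that reduces each of them to a base URL in the format described under base_url (no trailing "/"). The helper resolve_base_url is hypothetical and not part of the h2o API; the server argument is assumed to expose scheme, ip and port attributes like H2OLocalServer, and the real open() additionally contacts the server and checks its health.

```python
from types import SimpleNamespace
from urllib.parse import urlsplit


def resolve_base_url(server=None, url=None, ip=None, port=None, https=None):
    """Hypothetical helper: turn one of the three target specs into a base URL."""
    specs_given = [
        server is not None,
        url is not None,
        any(v is not None for v in (ip, port, https)),
    ]
    if sum(specs_given) > 1:
        raise ValueError("Use only one of: server, url, or ip/port/https")
    if server is not None:
        # Assumes a handle exposing .scheme, .ip and .port, like H2OLocalServer.
        return "%s://%s:%d" % (server.scheme, server.ip, server.port)
    if url is not None:
        parts = urlsplit(url)
        if parts.scheme not in ("http", "https") or not parts.hostname:
            raise ValueError("Invalid url: %r" % url)
        return url.rstrip("/")  # base URL is kept without a trailing "/"
    # ip/port/https triple, falling back to the documented defaults.
    scheme = "https" if https else "http"
    host = "localhost" if ip is None else ip
    return "%s://%s:%d" % (scheme, host, 54321 if port is None else port)


# Example usage:
local = SimpleNamespace(scheme="http", ip="127.0.0.1", port=54323)
print(resolve_base_url(server=local))                      # http://127.0.0.1:54323
print(resolve_base_url(url="https://example.com:54321/"))  # https://example.com:54321
print(resolve_base_url(ip="10.0.0.5", https=True))         # https://10.0.0.5:54321
```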

request(endpoint, data=None, json=None, filename=None)

Perform a REST API request to the backend H2O server.

Parameters:
  • endpoint – (str) The endpoint’s URL, for example “GET /4/schemas/KeyV4”
  • data – data payload for POST (and sometimes GET) requests. This should be a dictionary of simple key/value pairs (values can also be arrays), which will be sent in application/x-www-form-urlencoded format.
  • json – also data payload, but it will be sent as a JSON body. Cannot be used together with data.
  • filename – file to upload to the server. Cannot be used with data or json.
Returns:

an H2OResponse object representing the server’s response

Raises:
  • H2OConnectionError – if the H2O server cannot be reached (or connection is not initialized)
  • H2OServerError – if there was a server error (http 500), or server returned malformed JSON
  • H2OResponseError – if the server returned an H2OErrorV3 response (e.g. if the parameters were invalid)
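
A minimal sketch of how the request() arguments might map onto an HTTP request, using a hypothetical helper prepare_request: the endpoint string is split into method and path, data is form-encoded (and placed in the query string for GET), and json is serialized as a JSON body. Encoding list values as repeated keys is an assumption made here; the real client may format arrays differently, and file uploads are not modeled. The /3/Example path in the usage line is made up.

```python
import json as jsonlib
from urllib.parse import urlencode


def prepare_request(endpoint, data=None, json=None, filename=None):
    """Hypothetical helper mirroring the request() argument rules."""
    payloads = [p for p in (data, json, filename) if p is not None]
    if len(payloads) > 1:
        raise ValueError("data, json and filename cannot be combined")
    method, path = endpoint.split(" ", 1)  # e.g. "GET /4/schemas/KeyV4"
    method = method.upper()
    headers, body = {}, None
    if data is not None:
        # Standard form encoding; list values become repeated keys (assumption).
        encoded = urlencode(data, doseq=True)
        if method == "GET":
            path += "?" + encoded  # GET parameters go into the query string
        else:
            body = encoded
            headers["Content-Type"] = "application/x-www-form-urlencoded"
    elif json is not None:
        body = jsonlib.dumps(json)
        headers["Content-Type"] = "application/json"
    elif filename is not None:
        # File uploads (multipart/form-data) are outside the scope of this sketch.
        raise NotImplementedError("file upload is not modeled here")
    return method, path, headers, body


# Example usage (the /3/Example path is made up):
print(prepare_request("POST /3/Example", data={"cols": ["x", "y"], "n": 1}))
# ('POST', '/3/Example', {'Content-Type': 'application/x-www-form-urlencoded'}, 'cols=x&cols=y&n=1')
```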

info(refresh=False)

Information about the current state of the connection, or None if it has not been initialized properly.

Parameters: refresh – If False, return the latest known info; if True, fetch the newest info from the server. Usually you want refresh to be True, except right after establishing a connection, when the info is still fresh.
Returns: H2OCluster object.
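
The difference between refresh=False and refresh=True amounts to returning a cached value versus fetching a new one. The sketch below models only that contract; InfoCache and its fetch callable are hypothetical stand-ins for the connection and its status request, with an integer counter standing in for successive server snapshots.

```python
import itertools


class InfoCache:
    """Hypothetical sketch of the refresh=False / refresh=True contract of info()."""

    def __init__(self, fetch):
        self._fetch = fetch  # assumed callable that queries the server's status
        self._info = None    # None until the connection has been initialized

    def initialize(self):
        # Right after connecting, the info is fetched once and cached.
        self._info = self._fetch()

    def info(self, refresh=False):
        if refresh:
            self._info = self._fetch()  # round-trip to the server
        return self._info               # otherwise the latest known value


# Example: a counter stands in for successive server snapshots.
cache = InfoCache(itertools.count(1).__next__)
print(cache.info())              # None (not initialized yet)
cache.initialize()
print(cache.info())              # 1 (cached)
print(cache.info(refresh=True))  # 2 (fetched again)
```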

close()

Close an existing connection; once closed it cannot be used again.

Strictly speaking, it is not necessary to close every connection that you open: several mechanisms close connections automatically (the __del__(), __exit__() and atexit() handlers). However, there is also no good reason to make this method private.
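
The automatic mechanisms mentioned above can be sketched with a hypothetical Handle class: close() is idempotent, __exit__ calls it even when the with block raises, and an atexit hook covers scripts that end normally without closing. This is an illustration under those assumptions, not the h2o implementation. __del__ is left out because the atexit registration keeps the object alive until it is closed, which is also why close() unregisters the hook.

```python
import atexit


class Handle:
    """Hypothetical handle that is closed explicitly, by a with-block, or at exit."""

    def __init__(self):
        self.closed = False
        atexit.register(self.close)  # safety net if the script ends normally

    def close(self):
        if self.closed:
            return  # closing twice is harmless
        self.closed = True
        atexit.unregister(self.close)
        # A real client would end its server session here.

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.close()  # runs even when the block raised
        return False  # let the exception propagate


try:
    with Handle() as h:
        raise RuntimeError("boom")
except RuntimeError:
    pass
print(h.closed)  # True
```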

session_id

Return the session id of the current connection.

The session id is issued (through an API request) the first time it is requested, but no sooner. This is because generating a session id puts it into the DKV on the server, which effectively locks the cloud. Once issued, the session id will stay the same until the connection is closed.
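Lazy issuing of the session id follows a familiar pattern: a property that performs the expensive request on first access and caches the result. In this sketch LazySession and fake_issue are hypothetical, and the "session_N" id format is made up; a real client would obtain the id through an API request to the server.

```python
class LazySession:
    """Hypothetical sketch: a session id issued on first access, kept until close()."""

    def __init__(self, issue_session_id):
        self._issue = issue_session_id  # assumed callable that asks the server for an id
        self._session_id = None

    @property
    def session_id(self):
        if self._session_id is None:
            # Only now is the server asked to create (and store) a session.
            self._session_id = self._issue()
        return self._session_id

    def close(self):
        self._session_id = None  # the id is only valid for this connection


calls = []


def fake_issue():
    calls.append(1)
    return "session_%d" % len(calls)  # made-up id format


conn = LazySession(fake_issue)
print(len(calls))       # 0: nothing requested yet
print(conn.session_id)  # session_1
print(conn.session_id)  # session_1 (same id, no new request)
print(len(calls))       # 1
```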

base_url

Base URL of the server, without trailing "/". For example: "https://example.com:54321".

proxy

URL of the proxy server used for the connection (or None if there is no proxy).

requests_count

Total number of requests made since the connection was opened (used for debugging purposes).

timeout_interval

Timeout length for each request, in seconds.

shutdown_server(prompt)

Shut down the specified server.

This method checks if H2O is running at the specified IP address and port, and if it is, shuts down that H2O instance. All data will be lost.

Parameters: prompt – A boolean value indicating whether to prompt the user before shutting down the H2O server.

cluster_is_up()

Determine if an H2O cluster is running or not.

Returns: True if the cluster is up; False otherwise.

start_logging(dest=None)

Start logging all API requests to the provided destination.

Parameters: dest – Where to write the log: either a filename (str) or an open file handle (file). If not given, a new temporary file will be created.

stop_logging()

Stop logging API requests.
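
A rough sketch of the start_logging()/stop_logging() contract, assuming a hypothetical RequestLogger that the connection would call once per request: dest may be a filename, an open file handle, or omitted, in which case a temporary file is created with tempfile.NamedTemporaryFile. The one-line-per-request log format and the file prefix are assumptions.

```python
import tempfile


class RequestLogger:
    """Hypothetical sketch of start_logging(dest=None) / stop_logging()."""

    def __init__(self):
        self._handle = None
        self._owns_handle = False

    def start_logging(self, dest=None):
        self.stop_logging()  # restart cleanly if already logging
        if dest is None:
            # No destination given: create a new temporary file.
            self._handle = tempfile.NamedTemporaryFile(
                "w", prefix="h2o_requests_", suffix=".log", delete=False)
            self._owns_handle = True
        elif isinstance(dest, str):
            self._handle = open(dest, "w")  # a filename
            self._owns_handle = True
        else:
            self._handle = dest  # an already open file handle
            self._owns_handle = False
        return getattr(self._handle, "name", None)

    def log_request(self, method, url):
        if self._handle is not None:
            self._handle.write("%s %s\n" % (method, url))

    def stop_logging(self):
        if self._handle is not None and self._owns_handle:
            self._handle.close()  # only close files this logger opened itself
        self._handle = None
        self._owns_handle = False
```

Only files that the logger opened itself are closed by stop_logging(); a handle passed in by the caller stays open.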

class h2o.backend.H2OLocalServer

Handle to an H2O server launched locally.

Public interface:

hs = H2OLocalServer.start(...)  # launch a new local H2O server
hs.is_running()                 # check if the server is running
hs.shutdown()                   # shut down the server
hs.scheme                       # either "http" or "https"
hs.ip                           # ip address of the server, typically "127.0.0.1"
hs.port                         # port on which the server is listening

Once started, the server will run until the script terminates, or until you call .shutdown() on it. Moreover, if the script terminates with an exception, the server will not stop and will keep running even after the Python process exits. Such a runaway process may end up in a bad state (e.g. frozen), in which case the only way to terminate it is to kill the java process from the terminal.

Alternatively, it is possible to start the server as a context manager, in which case it will be automatically shut down even if an exception occurs in Python (but not if the Python process is killed):

import h2o
from h2o.backend import H2OLocalServer

with H2OLocalServer.start() as hs:
    # Do something with the server -- typically connect to it:
    h2o.connect(server=hs)

static start(jar_path=None, nthreads=-1, enable_assertions=True, max_mem_size=None, min_mem_size=None, ice_root=None, port=u'54321+', verbose=True)

Start a new H2O server on the local machine.

Parameters:
  • jar_path – Path to the h2o.jar executable. If not given, then we will search for h2o.jar in the locations returned by ._jar_paths().
  • nthreads – Number of threads in the thread pool. This should be related to the number of CPUs used. -1 means use all CPUs on the host. A positive integer specifies the number of CPUs directly.
  • enable_assertions – If True, pass -ea option to the JVM.
  • max_mem_size – Maximum heap size (jvm option Xmx), in bytes.
  • min_mem_size – Minimum heap size (jvm option Xms), in bytes.
  • ice_root – A directory where H2O stores its temporary files. Default location is determined by tempfile.mkdtemp().
  • port – Port on which to start the new server. This can be either an integer or a string of the form “DDDDD+”, indicating that the server should look for an open port starting from DDDDD and going up.
  • verbose – If True, then connection info will be printed to the stdout.
Returns:

a new H2OLocalServer instance
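Two of the parameters above lend themselves to a short sketch: resolving a port specification such as "54321+" to the first port that can be bound, and mapping the memory, assertion, and thread settings onto a java command line. Both helpers are hypothetical; the h2o.jar flags -port, -nthreads and -ice_root are assumed here, and the actual launcher may build its command differently and check additional ports.

```python
import socket


def parse_port_spec(port):
    """Split 54321 or "54321+" into (base_port, search_upward)."""
    text = str(port).strip()
    if text.endswith("+"):
        return int(text[:-1]), True
    return int(text), False


def find_open_port(port="54321+", host="127.0.0.1", max_tries=100):
    """Hypothetical helper: pick the first port at or above the base that can be bound."""
    base, search_upward = parse_port_spec(port)
    candidates = range(base, base + max_tries) if search_upward else [base]
    for candidate in candidates:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            try:
                sock.bind((host, candidate))
            except OSError:
                continue  # port already in use, try the next one
        return candidate  # bind succeeded; the probe socket is closed again
    raise RuntimeError("no open port found starting from %d" % base)


def build_command(jar_path, nthreads=-1, enable_assertions=True,
                  max_mem_size=None, min_mem_size=None, ice_root=None, port=54321):
    """Hypothetical mapping of start() parameters onto a java command line."""
    cmd = ["java"]
    if enable_assertions:
        cmd.append("-ea")
    if max_mem_size is not None:
        cmd.append("-Xmx%dk" % (max_mem_size // 1024))  # bytes -> KiB
    if min_mem_size is not None:
        cmd.append("-Xms%dk" % (min_mem_size // 1024))
    cmd += ["-jar", jar_path, "-port", str(port)]
    if nthreads > 0:
        cmd += ["-nthreads", str(nthreads)]  # assumed: omitting it means all CPUs
    if ice_root is not None:
        cmd += ["-ice_root", ice_root]
    return cmd
```

Probing with bind() only shows that a port was free at that moment; another process could take it before the server starts.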

is_running()

Return True if the server process is still running, False otherwise.

shutdown()

Shut down the server by trying to terminate/kill its process.

First we attempt to terminate the server process gracefully (by sending a SIGTERM signal). If the process has not shut down after _TIME_TO_KILL seconds, we forcefully kill it with a SIGKILL signal.
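The terminate-then-kill strategy can be sketched with the standard subprocess module. TIME_TO_KILL is a stand-in value rather than the actual _TIME_TO_KILL setting, and start_dummy_server is a hypothetical helper that launches a sleeping Python child purely so the sketch runs without Java or h2o.jar.

```python
import subprocess
import sys

TIME_TO_KILL = 3.0  # seconds; stand-in for the _TIME_TO_KILL constant mentioned above


def shutdown_process(proc, time_to_kill=TIME_TO_KILL):
    """Stop a child process: SIGTERM first, SIGKILL if it outlives the grace period."""
    if proc.poll() is not None:
        return proc.returncode  # already stopped
    proc.terminate()  # SIGTERM on POSIX
    try:
        return proc.wait(timeout=time_to_kill)
    except subprocess.TimeoutExpired:
        proc.kill()  # SIGKILL on POSIX
        return proc.wait()


def start_dummy_server():
    """Stand-in for a launched java process: a Python child that just sleeps."""
    return subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
```

On Windows, terminate() and kill() both end the process forcefully, so the grace period only matters on POSIX systems.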

scheme

Connection scheme, ‘http’ or ‘https’.

ip

IP address of the server.

port

Port that the server is listening on.