REST Interface

Classes for communication with backend H2O servers.

H2OConnection

Connect to an existing H2O server and send requests to it.

H2OLocalServer

Start an H2O server on your local machine.

H2OCluster

Handle to the remote H2O cluster – used mainly to retrieve information about it.

The h2o module has convenience functions for accessing these classes, and those are the ones that are recommended for everyday use. The following are the common use cases:

  1. Connect to an existing remote H2O server:

    h2o.connect(url="...")
    
  2. Connect to a local server, starting one first if none is running:

    h2o.init()
    
  3. Start multiple H2O servers locally (forming a cluster), and then connect to one of them:

    import h2o
    from h2o.backend import H2OLocalServer
    for _ in range(5):
        hs = H2OLocalServer.start()
    h2o.connect(server=hs)
    

The functions h2o.connect() and h2o.init() take many parameters that allow you to fine-tune the connection settings. When called, they create a new H2OConnection object and store it in a global variable; this connection is then used by all subsequent calls to functions in the h2o module. At the moment there is no effective way to keep connections to several separate H2O servers open at the same time. Such a facility may be added in the future.

class h2o.backend.H2OConnection(*args, **kwargs)[source]

Connection handle to an H2O cluster.

In a typical scenario you don’t need to access this class directly. Instead use h2o.connect() to establish a connection, and h2o.api() to make requests to the backend H2O server. However if your use-case is not typical, then read on.

Instances of this class may only be created through the static method open():

hc = H2OConnection.open(...)

Once opened, the connection remains active until the script exits (or until you explicitly close() it). If the script exits with an exception, the connection will not be closed, and the backend server will keep all the temporary frames and the open session.

Alternatively you can use this class as a context manager, which will ensure that the connection gets closed at the end of the with ... block even if an exception occurs:

with H2OConnection.open() as hc:
    hc.info().pprint()
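
The guarantee described above comes from Python's context-manager protocol. As a minimal sketch of the mechanism (the DemoConnection class below is hypothetical, is not part of h2o, and does not talk to any server), __exit__() calls close() whether or not the block raised:

```python
class DemoConnection:
    """Hypothetical stand-in for a connection object; not part of h2o."""

    def __init__(self):
        self.closed = False

    def close(self):
        # Release whatever the connection holds (here: just flip a flag).
        self.closed = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Runs on normal exit and on exceptions alike.
        self.close()
        return False  # do not swallow the exception


def run_failing_block():
    """Raise inside the with-block and report whether close() still ran."""
    conn = DemoConnection()
    try:
        with conn:
            raise RuntimeError("simulated failure inside the block")
    except RuntimeError:
        pass
    return conn.closed
```

Here run_failing_block() returns True: the exception still propagates out of the with block, but only after close() has been called.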

Once the connection is established, you can send REST API requests to the server using request().

static open(server=None, url=None, ip=None, port=None, name=None, https=None, auth=None, verify_ssl_certificates=True, cacert=None, proxy=None, cookies=None, verbose=True, _msgs=None)[source]

Establish connection to an existing H2O server.

The connection is not kept alive: this method only attempts to reach the specified server and checks that it is healthy and responds to REST API requests. If the H2O server cannot be reached, an H2OConnectionError is raised. On success the method returns a new H2OConnection object; it is the only “official” way to create instances of this class.

There are 3 ways to specify the target to connect to (these settings are mutually exclusive):

  • pass a server option,

  • pass the full url for the connection,

  • provide a triple of parameters ip, port, https.
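
The mutual-exclusivity rule above can be sketched as a small validation step. The helper below is hypothetical and written only for illustration; h2o's own checks may differ. It assumes a server object exposes scheme, ip and port attributes, and it falls back to localhost, 54321 and plain http when nothing is given:

```python
def resolve_target(server=None, url=None, ip=None, port=None, https=None):
    """Return a base URL from exactly one style of target specification.

    Hypothetical helper for illustration; h2o's own validation may differ.
    """
    triple_used = any(v is not None for v in (ip, port, https))
    styles_used = sum([server is not None, url is not None, triple_used])
    if styles_used > 1:
        raise ValueError("specify only one of: server, url, or ip/port/https")

    if server is not None:
        # Assumes the object exposes .scheme, .ip and .port attributes.
        return "%s://%s:%s" % (server.scheme, server.ip, server.port)
    if url is not None:
        # Drop any trailing slash so paths can be appended directly.
        return url.rstrip("/")
    # Assumed defaults: localhost, port 54321, plain http.
    scheme = "https" if https else "http"
    return "%s://%s:%d" % (scheme, ip or "localhost", port or 54321)
```

Mixing styles, for example passing both url and ip, raises ValueError in this sketch instead of silently preferring one of them.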

Parameters
  • server (H2OLocalServer) – connect to the specified local server instance. There is a slight difference between connecting to a local server by specifying its ip and port, and connecting through an H2OLocalServer instance: if the server becomes unresponsive, then having access to its process handle will allow us to query the server status through the OS, and potentially provide a snapshot of the server’s error log in the exception information.

  • url – full url of the server to connect to.

  • ip – target server’s IP address or hostname (default “localhost”).

  • port – H2O server’s port (default 54321).

  • name – H2O cluster name.

  • https – if True then connect using https instead of http (default False).

  • verify_ssl_certificates – if False then SSL certificate checking will be disabled (default True). Certificate checking should rarely be disabled, as doing so makes your connection vulnerable to man-in-the-middle attacks. When it is disabled, the requests library will generate a warning. Has no effect when https is False.

  • cacert – Path to a CA bundle file or a directory with certificates of trusted CAs (optional).

  • auth – authentication token for connecting to the remote server. This can be either a (username, password) tuple, or an authenticator (AuthBase) object. Please refer to the documentation in the requests.auth module.

  • proxy – url address of a proxy server. If you do not specify the proxy, then the requests module will attempt to use a proxy specified in the environment (in HTTP_PROXY / HTTPS_PROXY variables). We check for the presence of these variables and issue a warning if they are found. In order to suppress that warning and use proxy from the environment, pass proxy="(default)".

  • cookies – cookie (or list of cookies) to add to requests.

  • verbose – if True, then connection progress info will be printed to the stdout.

  • _msgs – custom messages to display during connection. This is a tuple (initial message, success message, failure message).

Returns

A new H2OConnection instance.

Raises
  • H2OConnectionError – if the server cannot be reached.

  • H2OServerError – if the server is in an unhealthy state (although this might be a recoverable error, the client itself should decide whether it wants to retry or not).
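
Because the decision to retry is left to the client, one option is a small retry wrapper. This is a sketch under assumptions, not part of h2o: open_with_retry, its parameters and its fixed delay are all hypothetical, and the connect argument stands in for whatever call opens the connection.

```python
import time


def open_with_retry(connect, retry_on, attempts=3, delay=1.0):
    """Call connect() until it succeeds or the attempts run out.

    connect  -- zero-argument callable that returns a connection
    retry_on -- exception class (or tuple of classes) treated as recoverable
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return connect()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            time.sleep(delay)
```

With h2o this might be called with a lambda wrapping H2OConnection.open(...) and with H2OServerError as retry_on, so that only the unhealthy-server case is retried while H2OConnectionError propagates immediately.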

request(endpoint, data=None, json=None, filename=None, save_to=None)[source]

Perform a REST API request to the backend H2O server.

Parameters
  • endpoint – (str) The endpoint’s URL, for example “GET /4/schemas/KeyV4”

  • data – data payload for POST (and sometimes GET) requests. This should be a dictionary of simple key/value pairs (values can also be arrays), which will be sent in x-www-form-urlencoded format.

  • json – also data payload, but it will be sent as a JSON body. Cannot be used together with data.

  • filename – file to upload to the server. Cannot be used with data or json.

  • save_to – if provided, will write the response to that file (additionally, the response will be streamed, so large files can be downloaded seamlessly). This parameter can be either a file name, or a folder name. If the folder doesn’t exist, it will be created automatically.

Returns

an H2OResponse object representing the server’s response (unless save_to parameter is provided, in which case the output file’s name will be returned).

Raises
  • H2OConnectionError – if the H2O server cannot be reached (or connection is not initialized)

  • H2OServerError – if there was a server error (http 500), or server returned malformed JSON

  • H2OResponseError – if the server returned an H2OErrorV3 response (e.g. if the parameters were invalid)
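
The endpoint format and the payload restrictions described above can be sketched as two small checks. Both helpers are hypothetical and only illustrate the rules; the list of accepted HTTP methods is an assumption and is not taken from h2o.

```python
# Assumed set of HTTP verbs; h2o may accept a different set.
KNOWN_METHODS = ("GET", "POST", "PUT", "DELETE", "HEAD")


def split_endpoint(endpoint):
    """Split 'GET /4/schemas/KeyV4' into ('GET', '/4/schemas/KeyV4')."""
    method, _, path = endpoint.partition(" ")
    if method not in KNOWN_METHODS or not path.startswith("/"):
        raise ValueError("endpoint should look like 'GET /path', got %r" % endpoint)
    return method, path


def check_payload(data=None, json=None, filename=None):
    """Return the name of the single payload argument in use, or None."""
    candidates = (("data", data), ("json", json), ("filename", filename))
    used = [name for name, value in candidates if value is not None]
    if len(used) > 1:
        raise ValueError("cannot combine: " + ", ".join(used))
    return used[0] if used else None
```

In this sketch, passing both data and json is rejected before any request would be sent.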

close()[source]

Close an existing connection; once closed it cannot be used again.

Strictly speaking, it is not necessary to close every connection that you open: several mechanisms (the __del__(), __exit__() and atexit() handlers) will do so automatically. However, there is also no good reason to make this method private.

property session_id

Return the session id of the current connection.

The session id is issued (through an API request) the first time it is requested, but no sooner. This is because generating a session id puts it into the DKV on the server, which effectively locks the cluster. Once issued, the session id will stay the same until the connection is closed.
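
The issue-on-first-access behaviour can be sketched with a cached property. The LazySession class and the issue_session_id callable are hypothetical; the callable stands in for the API request that the real connection would make.

```python
class LazySession:
    """Hypothetical holder that requests a session id only on first access."""

    def __init__(self, issue_session_id):
        # issue_session_id: zero-argument callable standing in for the API call.
        self._issue_session_id = issue_session_id
        self._session_id = None

    @property
    def session_id(self):
        # Issue the id on first access, then keep returning the cached value.
        if self._session_id is None:
            self._session_id = self._issue_session_id()
        return self._session_id
```

Creating a LazySession does not call issue_session_id at all; the first read of session_id calls it once, and later reads reuse the stored value.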

property cluster

H2OCluster object describing the underlying cluster.

property base_url

Base URL of the server, without trailing "/". For example: "https://example.com:54321".

property proxy

URL of the proxy server used for the connection (or None if there is no proxy).

property local_server

Handle to the H2OLocalServer instance (if connected to one).

property requests_count

Total number of requests made since the connection was opened (used for debugging purposes).

property timeout_interval

Timeout length for each request, in seconds.

start_logging(dest=None)[source]

Start logging all API requests to the provided destination.

Parameters

dest – Where to write the log: either a filename (str), or an open file handle (file). If not given, then a new temporary file will be created.

stop_logging()[source]

Stop logging API requests.

class h2o.backend.H2OLocalServer[source]

Handle to an H2O server launched locally.

Public interface:

hs = H2OLocalServer.start(...)  # launch a new local H2O server
hs.is_running()                 # check if the server is running
hs.shutdown()                   # shut down the server
hs.scheme                       # either "http" or "https"
hs.ip                           # ip address of the server, typically "127.0.0.1"
hs.port                         # port on which the server is listening

Once started, the server will run until the script terminates, or until you call .shutdown() on it. Moreover, if the script terminates with an exception, then the server will not stop and will continue to run even after the Python process exits. This runaway process may end up in a bad state (e.g. frozen), in which case the only way to terminate it is to kill the java process from the terminal.

Alternatively, it is possible to start the server as a context manager, in which case it will be automatically shut down even if an exception occurs in Python (but not if the Python process is killed):

with H2OLocalServer.start() as hs:
    # do something with the server -- probably connect to it
    h2o.connect(server=hs)

static start(jar_path=None, nthreads=-1, enable_assertions=True, max_mem_size=None, min_mem_size=None, ice_root=None, log_dir=None, log_level=None, max_log_file_size=None, port='54321+', name=None, extra_classpath=None, verbose=True, jvm_custom_args=None, bind_to_localhost=True)[source]

Start new H2O server on the local machine.

Parameters
  • jar_path – Path to the h2o.jar executable. If not given, then we will search for h2o.jar in the locations returned by ._jar_paths().

  • nthreads – Number of threads in the thread pool. This should be related to the number of CPUs used. -1 means use all CPUs on the host. A positive integer specifies the number of CPUs directly.

  • enable_assertions – If True, pass -ea option to the JVM.

  • max_mem_size – Maximum heap size (jvm option Xmx), in bytes.

  • min_mem_size – Minimum heap size (jvm option Xms), in bytes.

  • log_dir – Directory for H2O logs to be stored if a new instance is started. Default directory is determined by H2O internally.

  • log_level – The logger level for H2O if a new instance is started.

  • max_log_file_size – Maximum size of INFO and DEBUG log files. The file is rolled over after a specified size has been reached. (The default is 3MB. Minimum is 1MB and maximum is 99999MB)

  • ice_root – A directory where H2O stores its temporary files. Default location is determined by tempfile.mkdtemp().

  • port – Port on which to start the new server. This can be either an integer, or a string of the form “DDDDD+”, indicating that the server should look for an open port starting from DDDDD and going up.

  • name – name of the h2o cluster to be started

  • extra_classpath – List of paths to libraries that should be included on the Java classpath.

  • verbose – If True, then connection info will be printed to the stdout.

  • jvm_custom_args – Custom, user-defined arguments for the JVM in which H2O is instantiated.

  • bind_to_localhost – A flag indicating whether access to the H2O instance should be restricted to the local machine (default) or if it can be reached from other computers on the network. Only applicable when H2O is started from the Python client.

Returns

a new H2OLocalServer instance
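
The two accepted forms of the port parameter can be read as a sequence of ports to try. The generator below is a hypothetical sketch of that interpretation, not h2o's own parsing code; the upper bound of 65535 is simply the largest valid TCP port.

```python
import itertools


def candidate_ports(port):
    """Yield the ports to try for a spec such as 54321 or "54321+".

    An integer (or plain digit string) yields exactly one port; a trailing
    "+" yields that port and every higher one up to 65535.
    """
    if isinstance(port, int):
        yield port
        return
    text = str(port)
    if text.endswith("+"):
        yield from range(int(text[:-1]), 65536)
    else:
        yield int(text)


# Usage: peek at the first few candidates for the default spec.
first_three = list(itertools.islice(candidate_ports("54321+"), 3))
print(first_three)  # [54321, 54322, 54323]
```

A caller would try each candidate in turn and stop at the first port that is free; that probing step is left out of this sketch.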

is_running()[source]

Return True if the server process is still running, False otherwise.

shutdown()[source]

Shut down the server by trying to terminate/kill its process.

First we attempt to terminate the server process gracefully (by sending a SIGTERM signal). If the process has not shut down after _TIME_TO_KILL seconds, we forcefully kill it with a SIGKILL signal.
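
The terminate-then-kill sequence can be sketched with the standard subprocess module. This is an illustration of the pattern rather than h2o's implementation: stop_process, spawn_sleeper and the grace_seconds parameter are hypothetical names, and the child process here is a throwaway Python sleeper rather than a java process.

```python
import subprocess
import sys


def stop_process(proc, grace_seconds=5.0):
    """Ask a child process to exit, then force it if it does not comply.

    Returns the process's exit code.
    """
    if proc.poll() is not None:
        return proc.returncode  # already finished
    proc.terminate()  # SIGTERM on POSIX
    try:
        return proc.wait(timeout=grace_seconds)
    except subprocess.TimeoutExpired:
        proc.kill()  # SIGKILL on POSIX
        return proc.wait()


def spawn_sleeper():
    """Start a throwaway child process that would otherwise sleep for a minute."""
    return subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
```

Calling stop_process(spawn_sleeper()) returns once the child has exited, without waiting for the full minute.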

property scheme

Connection scheme, ‘http’ or ‘https’.

property ip

IP address of the server.

property port

Port that the server is listening on.

property name

H2O cluster name.