FileVec (h2o-core version 3.10.0.3 API)

java.lang.Object
- water.Iced<T>
  - water.Keyed<Vec>
    - water.fvec.Vec
      - water.fvec.ByteVec
        - water.fvec.FileVec

All Implemented Interfaces:

java.io.Externalizable, java.io.Serializable, java.lang.Cloneable, Freezable<Vec>

Direct Known Subclasses:

HDFSFileVec, NFSFileVec, S3FileVec, UploadFileVec
```
public abstract class FileVec
extends ByteVec
```
See Also:
Serialized Form

Nested Class Summary
- Nested classes/interfaces inherited from class water.fvec.Vec
  Vec.ESPC, Vec.Reader, Vec.VectorGroup, Vec.Writer

Field Summary

Fields
Modifier and Type	Field and Description
`int`	`_chunkSize`
`static int`	`DFLT_CHUNK_SIZE` Default Chunk size in bytes, useful when breaking up large arrays into "bite-sized" chunks.
`static int`	`DFLT_LOG2_CHUNK_SIZE` Log-2 of Chunk size.

Fields inherited from class water.fvec.Vec
_rowLayout, DO_HISTOGRAMS, KEY_PREFIX_LEN, PERCENTILES, T_BAD, T_CAT, T_NUM, T_STR, T_TIME, T_UUID, TYPE_STR

Fields inherited from class water.Keyed
_key

Constructor Summary

Constructors
Modifier	Constructor and Description
`protected`	`FileVec(Key key, long len, byte be)`

Method Summary

Methods
Modifier and Type	Method and Description
`long`	`byteSize()` Size of vector data.
`static int`	`calcOptimalChunkSize(long totalSize, int numCols, long maxLineLength, int cores, int cloudsize, boolean oldHeuristic, boolean verbose)` Calculates safe and hopefully optimal chunk sizes.
`Value`	`chunkIdx(int cidx)` Get a Chunk's Value by index.
`static long`	`chunkOffset(Key ckey)` Convert a chunk-key to a file offset.
`int`	`elem2ChunkIdx(long i)` Convert a row# to a chunk#.
`long`	`length()` Number of elements in the vector; returned as a `long` instead of an `int` because Vecs support more than 2^32 elements.
`int`	`nChunks()` Number of chunks, returned as an `int` - Chunk count is limited by the max size of a Java `long[]`.
`int`	`setChunkSize(Frame fr, int chunkSize)`
`int`	`setChunkSize(int chunkSize)` Chunk size must be positive, 1G or less, and a power of two.
`boolean`	`writable()` Default read/write behavior for Vecs.

Methods inherited from class water.fvec.ByteVec
chunkForChunkIdx, getFirstBytes, getPreviewChunkBytes, isInt, naCnt, openStream

Methods inherited from class water.fvec.Vec
adaptTo, align, at, at16h, at16l, at8, atStr, base, bins, cardinality, checksum_impl, chunkForRow, chunkKey, chunkKey, copyMeta, doCopy, domain, equals, espc, factor, get_type_str, get_type, getVecKey, group, hashCode, isBad, isBinary, isCategorical, isConst, isNA, isNumeric, isString, isTime, isUUID, lazy_bins, makeCon, makeCon, makeCon, makeCon, makeCon, makeCon, makeCon, makeCon, makeCons, makeCons, makeCopy, makeCopy, makeCopy, makeDoubles, makeRand, makeRepSeq, makeSeq, makeSeq, makeSeq, makeVec, makeVec, makeVec, makeZero, makeZero, makeZero, makeZero, makeZeros, makeZeros, max, maxs, mean, min, mins, mode, newKey, ninfs, nzCnt, open, pctiles, pinfs, postWrite, preWriting, readAll_impl, remove_impl, rollupStatsKey, set, set, set, set, setBad, setDomain, sigma, sparseRatio, startRollupStats, startRollupStats, stride, toCategoricalVec, toNumericVec, toString, toStringVec, writeAll_impl

Methods inherited from class water.Keyed
checksum, makeSchema, readAll, remove, remove, remove, remove, writeAll

Methods inherited from class water.Iced
asBytes, clone, copyOver, frozenType, read, readExternal, readJSON, reloadFromBytes, toJsonString, write, writeExternal, writeJSON

Methods inherited from class java.lang.Object
finalize, getClass, notify, notifyAll, wait, wait, wait

- Field Detail
  - DFLT_LOG2_CHUNK_SIZE
```
public static final int DFLT_LOG2_CHUNK_SIZE
```
    Log-2 of Chunk size.
    
    See Also:
    Constant Field Values
  - DFLT_CHUNK_SIZE
```
public static final int DFLT_CHUNK_SIZE
```
    Default Chunk size in bytes, useful when breaking up large arrays into "bite-sized" chunks. Larger chunks increase batch sizes and lower overhead costs; smaller chunks increase fine-grained parallelism.
    
    See Also:
    Constant Field Values
  - _chunkSize
```
public int _chunkSize
```
- Constructor Detail
  - FileVec
```
protected FileVec(Key key,
       long len,
       byte be)
```
- Method Detail
  - setChunkSize
```
public int setChunkSize(int chunkSize)
```
    Chunk size must be positive, 1G or less, and a power of two. A value that is not a power of two is reduced to the nearest power of two below the provided chunkSize.
    Because the optimal chunk size is not known when a FileVec is instantiated, a setter is required both to set it and to keep it in sync with _log2ChkSize.
    
    Parameters:
    chunkSize - requested chunk size to be used when parsing
    
    Returns:
    actual _chunkSize setting
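
    The rounding rule described above can be illustrated with a standalone sketch. It does not use h2o-core; the class `ChunkSizeSketch`, its fields, and the choice to throw on out-of-range input are assumptions made for illustration, not the library's actual behavior.

```java
// Standalone illustration; NOT the h2o-core implementation.
final class ChunkSizeSketch {
    static final int MAX_CHUNK_SIZE = 1 << 30; // "1G or less"

    int chunkSize;     // plays the role of _chunkSize
    int log2ChunkSize; // plays the role of _log2ChkSize

    /** Rounds the request down to a power of two and updates both fields together. */
    int setChunkSize(int requested) {
        // Assumption: invalid requests are rejected with an exception.
        if (requested <= 0 || requested > MAX_CHUNK_SIZE) {
            throw new IllegalArgumentException("chunk size must be in (0, 2^30]: " + requested);
        }
        // highestOneBit keeps only the top set bit: the largest power of two <= requested.
        chunkSize = Integer.highestOneBit(requested);
        // For a power of two, the count of trailing zero bits equals its base-2 logarithm.
        log2ChunkSize = Integer.numberOfTrailingZeros(chunkSize);
        return chunkSize;
    }

    public static void main(String[] args) {
        ChunkSizeSketch s = new ChunkSizeSketch();
        System.out.println(s.setChunkSize(5_000_000)); // prints 4194304
        System.out.println(s.log2ChunkSize);           // prints 22
    }
}
```

    Updating both fields in one method is what keeps the size and its logarithm from drifting apart.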
  - setChunkSize
```
public int setChunkSize(Frame fr,
               int chunkSize)
```
  - length
```
public long length()
```
    Description copied from class: Vec
    
    Number of elements in the vector; returned as a long instead of an int because Vecs support more than 2^32 elements. Overridden by subclasses that compute length in an alternative way, such as file-backed Vecs.
    
    Overrides:
    
    length in class Vec
    
    Returns:
    Number of elements in the vector
  - nChunks
```
public int nChunks()
```
    Description copied from class: Vec
    
    Number of chunks, returned as an int - Chunk count is limited by the max size of a Java long[]. Overridden by subclasses that compute chunks in an alternative way, such as file-backed Vecs.
    
    Overrides:
    
    nChunks in class Vec
    
    Returns:
    Number of chunks
  - writable
```
public boolean writable()
```
    Description copied from class: Vec
    
    Default read/write behavior for Vecs. AppendableVecs are write-only.
  - byteSize
```
public long byteSize()
```
    Size of vector data.
    
    Overrides:
    
    byteSize in class Vec
  - elem2ChunkIdx
```
public int elem2ChunkIdx(long i)
```
    Description copied from class: Vec
    
    Convert a row# to a chunk#. For constant-sized chunks this is a little shift-and-add math. For variable-sized chunks this is a binary search, with a sane API (JDK has an insane API). Overridden by subclasses that compute chunks in an alternative way, such as file-backed Vecs.
    
    Overrides:
    
    elem2ChunkIdx in class Vec
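
    The two lookup strategies mentioned above can be sketched without h2o-core. The class `RowToChunkSketch` and the `espc` layout used here (start row of each chunk, followed by the total row count, strictly increasing) are assumptions for illustration; the sketch also ignores any special handling a real Vec may apply to its final chunk.

```java
import java.util.Arrays;

// Standalone illustration; NOT the h2o-core implementation.
final class RowToChunkSketch {
    /** Constant-sized chunks of 2^log2ChunkSize rows: a single shift. */
    static int fixedChunkIdx(long row, int log2ChunkSize) {
        return (int) (row >>> log2ChunkSize);
    }

    /**
     * Variable-sized chunks. Assumed layout: espc[i] is the first row of chunk i,
     * and the last entry is the total number of rows.
     */
    static int variableChunkIdx(long row, long[] espc) {
        // Search only the chunk start rows (exclude the trailing total).
        int pos = Arrays.binarySearch(espc, 0, espc.length - 1, row);
        // On a miss the JDK returns -(insertionPoint) - 1; the owning chunk
        // is the one just before the insertion point.
        return pos >= 0 ? pos : -pos - 2;
    }

    public static void main(String[] args) {
        long[] espc = {0, 10, 25, 40}; // three chunks: [0,10), [10,25), [25,40)
        System.out.println(fixedChunkIdx(3L * (1 << 22) + 5, 22)); // prints 3
        System.out.println(variableChunkIdx(12, espc));            // prints 1
    }
}
```

    The negative return encoding of `Arrays.binarySearch` is the part that a wrapper with a friendlier API would hide.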
  - chunkOffset
```
public static long chunkOffset(Key ckey)
```
    Convert a chunk-key to a file offset. Rows are 1 byte each, so this is a direct conversion.
    
    Returns:
    The file offset corresponding to this Chunk index
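
    With 1-byte rows, the conversion amounts to multiplying a chunk index by the chunk size. The sketch below is a simplification: it takes the chunk index and chunk size directly rather than a `Key`, and the class `ChunkOffsetSketch` is hypothetical.

```java
// Standalone illustration; NOT the h2o-core implementation.
final class ChunkOffsetSketch {
    /** Byte offset of the first row of the given chunk, assuming fixed-size chunks. */
    static long chunkOffset(int chunkIdx, int chunkSize) {
        // Widen to long BEFORE multiplying; an int product overflows past 2 GB.
        return (long) chunkIdx * chunkSize;
    }

    public static void main(String[] args) {
        System.out.println(chunkOffset(3, 4194304));    // prints 12582912
        System.out.println(chunkOffset(1024, 4194304)); // prints 4294967296
    }
}
```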
  - chunkIdx
```
public Value chunkIdx(int cidx)
```
    Description copied from class: Vec
    
    Get a Chunk's Value by index. Basically the index-to-key map, plus the DKV.get(). Warning: this pulls the data locally; using this call on every Chunk index on the same node will probably trigger an OOM!
    
    Overrides:
    
    chunkIdx in class Vec
  - calcOptimalChunkSize
```
public static int calcOptimalChunkSize(long totalSize,
                       int numCols,
                       long maxLineLength,
                       int cores,
                       int cloudsize,
                       boolean oldHeuristic,
                       boolean verbose)
```
    Calculates safe and hopefully optimal chunk sizes. Four cases exist:
    - very small data (< 64K per core): uses the default chunk size, and all data will be in one chunk
    - small data: data is partitioned so that there are at least 4 chunks per core, to help keep all cores loaded
    - default: chunks are 4194304 bytes
    - large data: if the data would create more than 2M keys per node, chunk sizes larger than DFLT_CHUNK_SIZE are issued
    
    Too many keys can create enough overhead to blow out memory in large data parsing. # keys = (parseSize / chunkSize) * numCols. The key limit of 2M is a guessed "reasonable" number.
    
    Parameters:
    totalSize - parse size in bytes (across all files to be parsed)
    numCols - number of columns expected in dataset
    cores - number of processing cores per node
    cloudsize - number of compute nodes
    verbose - print the parse heuristics
    
    Returns:
    optimal chunk size in bytes (always a power of 2).
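
    The key-count formula for the large-data case can be worked through in a standalone sketch. This is not the library's heuristic: the class `KeyCountSketch`, the reading of "2M" as 2,000,000, the division of keys evenly across nodes, and the doubling loop are all assumptions chosen to illustrate how # keys = (parseSize / chunkSize) * numCols drives the chunk size upward.

```java
// Standalone illustration; NOT the h2o-core heuristic.
final class KeyCountSketch {
    static final long KEY_LIMIT = 2_000_000L; // assumption: "2M" read as two million

    /** # keys = (parseSize / chunkSize) * numCols, as stated in the description. */
    static long estimatedKeys(long parseSize, int chunkSize, int numCols) {
        return (parseSize / chunkSize) * numCols;
    }

    /** Doubles the chunk size (staying a power of two) until keys per node fit the limit. */
    static int growUntilUnderLimit(long parseSize, int numCols, int cloudSize, int startChunkSize) {
        int chunkSize = startChunkSize;
        // Assumption: keys are spread evenly over the nodes; 2^30 is the upper bound.
        while (chunkSize < (1 << 30)
                && estimatedKeys(parseSize, chunkSize, numCols) / cloudSize > KEY_LIMIT) {
            chunkSize <<= 1;
        }
        return chunkSize;
    }

    public static void main(String[] args) {
        long oneTiB = 1L << 40;
        // 1 TiB, 1000 columns, 4 MiB chunks: 262144 chunks per column.
        System.out.println(estimatedKeys(oneTiB, 1 << 22, 1000));           // prints 262144000
        System.out.println(growUntilUnderLimit(oneTiB, 1000, 10, 1 << 22)); // prints 67108864
    }
}
```

    In this example, 4 MiB chunks would put about 26 million keys on each of 10 nodes, so the chunk size is doubled four times to 64 MiB before the per-node count drops below the limit.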
