public class Vec extends Iced
A distributed vector has a count of elements, an element-to-chunk mapping, a Java type (mostly determines rounding on store and display), and functions to directly load elements without further indirections. The data is compressed, backed by disk, or both. *Writing* to elements may throw if the backing data is read-only (file backed).
Vec Key format is: Key.VEC - byte, 0 - byte, 0 - int, normal Key bytes. DVec Key format is: Key.DVEC - byte, 0 - byte, chunk# - int, normal Key bytes.
The main API is at, set, and isNA:
double at(long row); // Returns the value expressed as a double. NaN if missing.
long at8(long row); // Returns the value expressed as a long. Throws if missing.
boolean isNA(long row); // True if the value is missing.
set(long row, double d); // Stores a double; NaN will be treated as missing.
set(long row, long l); // Stores a long; throws if l exceeds what fits in a double and any floats are ever set.
setNA(long row); // Sets the value as missing.
Note this dangerous scenario: loading a missing value as a double, and setting it as a long:
set(row,(long)at(row)); // Danger!
The cast from a Double.NaN to a long produces a zero! This code will replace a missing value with a zero.
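The hazard can be reproduced without a Vec at all, because it comes from Java's narrowing conversion rather than from the storage layer. The sketch below is a minimal, self-contained illustration; the class and method names are hypothetical stand-ins, and a double parameter plays the role of the value returned by at(row).

```java
final class MissingValueCast {
    /** Mirrors the unsafe pattern: cast whatever at(row) returned straight to long. */
    static long unsafeToLong(double loaded) {
        return (long) loaded; // Double.NaN narrows to 0L
    }

    /** Hypothetical guard: check for missing before choosing between setNA and set. */
    static boolean shouldStoreAsMissing(double loaded) {
        return Double.isNaN(loaded);
    }

    public static void main(String[] args) {
        double missing = Double.NaN; // what at(row) yields for a missing element
        System.out.println(unsafeToLong(missing));         // prints 0
        System.out.println(shouldStoreAsMissing(missing)); // prints true
        System.out.println(shouldStoreAsMissing(42.0));    // prints false
    }
}
```

In Vec terms, the safer shape is presumably to test isNA(row) first and call setNA(row) for missing elements, casting only when the value is present.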
Modifier and Type | Class and Description |
---|---|
static class | Vec.CollectDomain: Collect numeric domain of given vector. |
static class | Vec.VectorGroup: Class representing the group of vectors. |
static class | Vec.Writer: More efficient way to write randomly to a Vec; still slow, but much faster than Vec.set(). Usage: Vec.Writer vw = vec.open(); vw.set(0, 3.32); vw.set(1, 4.32); vw.set(2, 5.32); vw.close(); |
Modifier and Type | Field and Description |
---|---|
java.lang.String[] | _domain: Enum/factor/categorical names. |
long[] | _espc: Element-start per chunk. |
Key | _key: Key mapping a Value which holds this Vec. |
byte | _time: Time parse, index into Utils.TIME_PARSE, or -1 for not-a-time. |
static int | KEY_PREFIX_LEN |
static int | LOG_CHK: Log-2 of Chunk size. |
static int | MAX_ENUM_SIZE: Maximal size of enum domain. |
Modifier | Constructor and Description |
---|---|
| Vec(Key key, long[] espc): Main default constructor; requires that the caller already understand the Chunk layout, along with the count of missing elements. |
| Vec(Key key, long[] espc, java.lang.String[] domain) |
protected | Vec(Key key, Vec v) |
Modifier and Type | Method and Description |
---|---|
Vec | align(Vec vec): Always makes a copy of the given vector which shares the same group. |
double | at(long i): Fetch element the slow way, as a double. |
long | at8(long i): Fetch element the slow way, as a long. |
long | byteSize(): Size of compressed vector data. |
int | cardinality(): Returns cardinality for enum domain or -1 for other types. |
protected boolean | checkMissing(int cidx, Value val) |
long | chunk2StartElem(int cidx): Convert a chunk-index into a starting row #. |
Chunk | chunkForChunkIdx(int cidx): The Chunk for a chunk#. |
Chunk | chunkForRow(long i): The Chunk for a row#. |
Value | chunkIdx(int cidx): Get a Chunk's Value by index. |
Key | chunkKey(int cidx): Get a Chunk Key from a chunk-index. |
int | chunkLen(int cidx): Number of rows in chunk. |
java.lang.String[] | domain(): Return an array of domains. |
java.lang.String | domain(long i): Map the integer value for an enum/factor/categorical to its String. |
boolean | equals(java.lang.Object o) |
static Key | getVecKey(Key key): Get a Vec Key from Chunk Key, without loading the Chunk. |
Vec.VectorGroup | group(): Get the group this vector belongs to. |
Key | groupKey(): Make a Vector-group key. |
byte[] | hash() |
int | hashCode() |
boolean | isBad(): Is the column bad? |
boolean | isByteVec() |
boolean | isConst(): Is the column constant? |
boolean | isEnum(): Is the column a factor/categorical/enum? Note: all "isEnum()" columns are also "isInt()" but not vice-versa. |
boolean | isFloat(): Does the column contain float values? |
boolean | isInt(): Are all values integers? |
boolean | isNA(long row): Fetch the missing-status the slow way. |
boolean | isTime(): Whether or not this column parsed as a time, and if so what pattern was used. |
long | length(): Number of elements in the vector. |
static Vec | make1Elem(double d): Create a new 1-element vector in the shared vector group for 1-element vectors. |
static Vec | make1Elem(Key key, double d): Create a new 1-element vector representing a scalar value. |
Vec | makeCon(double d) |
Vec | makeCon(long l): Make a new vector with the same size and data layout as the old one, and initialized to a constant. |
Vec | makeCon(long l, java.lang.String[] domain) |
Vec[] | makeCons(int n, long l, java.lang.String[][] domain) |
static Vec | makeConSeq(double x, int len) |
static Vec[] | makeNewCons(long rows, int cols, long val, java.lang.String[][] domain): Create an array of Vecs from scratch. |
static Vec | makeSeq(int len) |
Vec | makeTransf(int[][] map, java.lang.String[] finalDomain): Create a vector transforming values according to the given domain map. |
Vec | makeZero(): Make a new vector with the same size and data layout as the old one, and initialized to zero. |
Vec | makeZero(java.lang.String[] domain) |
Vec[] | makeZeros(int n) |
Vec[] | makeZeros(int n, java.lang.String[][] domain) |
Vec | masterVec(): This Vec has no dependent hidden Vec that it uses. |
double | max(): Return column max - lazily computed as needed. |
double | mean(): Return column mean - lazily computed as needed. |
double | min(): Return column min - lazily computed as needed. |
long | naCnt(): Return column missing-element-count - lazily computed as needed. |
int | nChunks(): Number of chunks. |
static Key | newKey(): Make a new random Key that fits the requirements for a Vec key. |
Vec.Writer | open() |
void | postWrite(): Stop writing into this Vec. |
protected boolean | readable(): Default read/write behavior for Vecs. |
void | remove(Futures fs) |
Vec | rollupStats(): Compute the roll-up stats as-needed, and copy into the Vec object. |
Vec | rollupStats(Futures fs) |
double | set(long i, double d): Write element the VERY slow way, as a double. |
float | set(long i, float f): Write element the VERY slow way, as a float. |
long | set(long i, long l): Write element the VERY slow way, as a long. |
boolean | setNA(long i): Set the element as missing the VERY slow way. |
double | sigma(): Return column standard deviation - lazily computed as needed. |
int | timeMode() |
java.lang.String | timeParse() |
Vec | toEnum(): Transform this vector to enum. |
java.lang.String | toString(): Pretty print the Vec: [#elems, min/mean/max]{chunks,...} |
protected boolean | writable(): Default read/write behavior for Vecs. |
Methods inherited from class Iced: clone, frozenType, init, newInstance, read, toDocField, write, writeJSON, writeJSONFields
public static final int LOG_CHK
public final Key _key
public final long[] _espc
public java.lang.String[] _domain
public byte _time
public static final int MAX_ENUM_SIZE
public static final int KEY_PREFIX_LEN
public Vec(Key key, long[] espc)
public Vec(Key key, long[] espc, java.lang.String[] domain)
public Vec[] makeZeros(int n)
public Vec[] makeZeros(int n, java.lang.String[][] domain)
public Vec[] makeCons(int n, long l, java.lang.String[][] domain)
public static Vec[] makeNewCons(long rows, int cols, long val, java.lang.String[][] domain)
rows - Length of each vec
cols - Number of vecs
val - Constant value (long)
domain - Factor levels (for factor columns)
public Vec makeZero()
public Vec makeZero(java.lang.String[] domain)
public Vec makeCon(long l)
public Vec makeCon(long l, java.lang.String[] domain)
public Vec makeCon(double d)
public static Vec makeSeq(int len)
public static Vec makeConSeq(double x, int len)
public static Vec make1Elem(double d)
public static Vec make1Elem(Key key, double d)
public Vec makeTransf(int[][] map, java.lang.String[] finalDomain)
See also: makeTransf(int[], int[], String[])
public Vec masterVec()
Returns: null
public long length()
public int nChunks()
public final boolean isTime()
public final int timeMode()
public final java.lang.String timeParse()
public java.lang.String domain(long i)
public java.lang.String[] domain()
public int cardinality()
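The relationship between domain(long), domain(), and cardinality() can be sketched with a plain String array standing in for _domain. This is an illustration only: the class name is hypothetical, and it assumes that a non-categorical column simply has no domain array (null), which this page does not state explicitly.

```java
final class DomainSketch {
    // Level names, indexed by the integer code stored in the column.
    // Assumption: null means the column is not an enum/factor/categorical.
    private final String[] domain;

    DomainSketch(String[] domain) {
        this.domain = domain;
    }

    /** Level name for an integer code, analogous to domain(long i). */
    String domain(long i) {
        return domain[(int) i];
    }

    /** Number of levels, or -1 for other column types, analogous to cardinality(). */
    int cardinality() {
        return domain == null ? -1 : domain.length;
    }
}
```

With levels {"red", "green", "blue"}, code 1 maps to "green" and the cardinality is 3; a sketch built with a null domain reports -1.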
public Vec toEnum()
TransfVec which provides a mapping between values.
protected boolean readable()
protected boolean writable()
public double min()
public double max()
public double mean()
public double sigma()
public long naCnt()
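The summary describes min(), max(), mean(), sigma(), and naCnt() as lazily computed. A minimal sketch of that idea over a plain double array follows. It is not the library's rollup code: the class name and the rollupRuns counter are hypothetical, NaN stands for a missing element, and sigma is computed as the sample standard deviation (dividing by n - 1), which is an assumption.

```java
final class LazyRollups {
    private final double[] data;
    private boolean computed;   // false until a statistic is first requested
    private double min, max, mean, sigma;
    private long naCnt;
    int rollupRuns;             // demo counter: how many times the rollup actually ran

    LazyRollups(double[] data) {
        this.data = data;
    }

    double min()   { rollup(); return min; }
    double max()   { rollup(); return max; }
    double mean()  { rollup(); return mean; }
    double sigma() { rollup(); return sigma; }
    long naCnt()   { rollup(); return naCnt; }

    private void rollup() {
        if (computed) return;   // reuse cached results on later calls
        rollupRuns++;
        double lo = Double.POSITIVE_INFINITY, hi = Double.NEGATIVE_INFINITY, sum = 0;
        long n = 0, nas = 0;
        for (double v : data) {
            if (Double.isNaN(v)) { nas++; continue; } // missing values are only counted
            lo = Math.min(lo, v);
            hi = Math.max(hi, v);
            sum += v;
            n++;
        }
        double m = n > 0 ? sum / n : Double.NaN;
        double ss = 0;          // sum of squared deviations from the mean
        for (double v : data) {
            if (!Double.isNaN(v)) ss += (v - m) * (v - m);
        }
        min = n > 0 ? lo : Double.NaN;
        max = n > 0 ? hi : Double.NaN;
        mean = m;
        sigma = n > 1 ? Math.sqrt(ss / (n - 1)) : Double.NaN;
        naCnt = nas;
        computed = true;
    }
}
```

Calling several accessors in a row scans the data only once; the first call pays for the rollup and the rest read cached fields.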
public boolean isInt()
public long byteSize()
public byte[] hash()
public final boolean isEnum()
public final boolean isConst()
Returns true if the column contains only constant values and it is not full of NAs.
public final boolean isBad()
Returns true if the column is full of NAs.
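The two descriptions above can be read as simple predicates over a column's values. The sketch below uses a double array with NaN as the missing marker; the class name is hypothetical, and it assumes that missing values are ignored when deciding whether the remaining values are constant, which the text does not spell out.

```java
final class ColumnFlags {
    /** True when every element is missing (vacuously true for an empty column). */
    static boolean isBad(double[] col) {
        for (double v : col) {
            if (!Double.isNaN(v)) return false;
        }
        return true;
    }

    /** True when at least one value is present and all present values are equal. */
    static boolean isConst(double[] col) {
        boolean seen = false;
        double first = Double.NaN;
        for (double v : col) {
            if (Double.isNaN(v)) continue;      // assumption: NAs do not break constancy
            if (!seen) { first = v; seen = true; }
            else if (v != first) return false;
        }
        return seen;                             // all-NA columns are bad, not constant
    }
}
```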
public final boolean isFloat()
public final boolean isByteVec()
public Vec rollupStats()
public void postWrite()
public long chunk2StartElem(int cidx)
public int chunkLen(int cidx)
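Several of these methods (nChunks, length, chunk2StartElem, chunkLen, chunkForRow) can be understood as lookups into the element-start-per-chunk array _espc. The sketch below is a hedged reconstruction, not the library code: the class and the chunkIdxForRow helper are hypothetical, and it assumes the array holds one entry per chunk plus a final entry equal to the total row count.

```java
final class EspcSketch {
    // espc[c] is the first row of chunk c.
    // Assumption: a trailing entry holds the total number of rows.
    private final long[] espc;

    EspcSketch(long[] espc) {
        this.espc = espc;
    }

    int nChunks() { return espc.length - 1; }

    long length() { return espc[espc.length - 1]; }

    long chunk2StartElem(int cidx) { return espc[cidx]; }

    int chunkLen(int cidx) { return (int) (espc[cidx + 1] - espc[cidx]); }

    /** Index of the chunk that holds row i, found by binary search. */
    int chunkIdxForRow(long i) {
        if (i < 0 || i >= length()) {
            throw new IndexOutOfBoundsException("row " + i + " of " + length());
        }
        int lo = 0, hi = nChunks();             // invariant: espc[lo] <= i < espc[hi]
        while (hi - lo > 1) {
            int mid = (lo + hi) >>> 1;
            if (i < espc[mid]) hi = mid; else lo = mid;
        }
        return lo;
    }
}
```

For starts {0, 100, 250, 300} there are three chunks of 100, 150, and 50 rows, and row 100 is the first row of chunk 1.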
public static Key getVecKey(Key key)
public Key chunkKey(int cidx)
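The Vec and DVec key formats given in the class description (type byte, zero byte, int, then the normal Key bytes) suggest how getVecKey and chunkKey can convert between the two without touching any data. The sketch below works on raw byte arrays. The type tag values, the prefix length constant, and big-endian int encoding are all assumptions made for illustration; they are not the real values of Key.VEC, Key.DVEC, or KEY_PREFIX_LEN.

```java
import java.nio.ByteBuffer;

final class VecKeySketch {
    // Hypothetical type tags; the actual Key.VEC and Key.DVEC values are not given here.
    static final byte VEC = 1, DVEC = 2;
    // Prefix implied by the described layout: type byte + zero byte + int.
    static final int PREFIX_LEN = 1 + 1 + 4;

    static byte[] vecKey(byte[] normalKeyBytes) {
        return build(VEC, 0, normalKeyBytes);
    }

    static byte[] chunkKey(byte[] normalKeyBytes, int cidx) {
        return build(DVEC, cidx, normalKeyBytes);
    }

    private static byte[] build(byte type, int cidx, byte[] rest) {
        return ByteBuffer.allocate(PREFIX_LEN + rest.length)
                .put(type).put((byte) 0).putInt(cidx).put(rest).array();
    }

    /** Derive the Vec key from a chunk key by rewriting only the prefix. */
    static byte[] getVecKey(byte[] chunkKey) {
        byte[] k = chunkKey.clone();
        ByteBuffer.wrap(k).put(VEC).put((byte) 0).putInt(0); // writes through to k
        return k;
    }

    /** Read the chunk number back out of a chunk key. */
    static int chunkIdx(byte[] chunkKey) {
        return ByteBuffer.wrap(chunkKey).getInt(2);
    }
}
```

Because the chunk number lives in a fixed position in the prefix, both directions are pure byte manipulation, which is consistent with getVecKey being described as working without loading the Chunk.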
public Value chunkIdx(int cidx)
Fetches the Value via DKV.get(). Warning: this pulls the data locally; using this call on every Chunk index on the same node will probably trigger an OOM!
protected boolean checkMissing(int cidx, Value val)
public static Key newKey()
public Key groupKey()
public final Vec.VectorGroup group()
public Chunk chunkForChunkIdx(int cidx)
public final Chunk chunkForRow(long i)
public final long at8(long i)
public final double at(long i)
public final boolean isNA(long row)
public final long set(long i, long l)
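The class description says that storing a long throws if the value exceeds what fits in a double and any floats are ever set. One way to make "fits in a double" concrete is an exact round-trip check, sketched below. This is an illustration of the numeric constraint only, with a hypothetical class and method name; it is not the check the library performs.

```java
import java.math.BigDecimal;

final class LongFitsInDouble {
    /** True when l is exactly representable as a double. */
    static boolean fitsInDouble(long l) {
        // Compare exactly. A plain (long) (double) l round trip saturates at
        // Long.MAX_VALUE and would wrongly report that it fits.
        return new BigDecimal((double) l).compareTo(BigDecimal.valueOf(l)) == 0;
    }

    public static void main(String[] args) {
        System.out.println(fitsInDouble(1L << 53));       // prints true
        System.out.println(fitsInDouble((1L << 53) + 1)); // prints false
        System.out.println(fitsInDouble(Long.MAX_VALUE)); // prints false
    }
}
```

Every long with magnitude up to 2^53 passes; beyond that only values whose low bits are zero survive the conversion.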
public final double set(long i, double d)
public final float set(long i, float f)
public final boolean setNA(long i)
public final Vec.Writer open()
public java.lang.String toString()
Overrides: toString in class java.lang.Object
public void remove(Futures fs)
public boolean equals(java.lang.Object o)
Overrides: equals in class java.lang.Object
public int hashCode()
Overrides: hashCode in class java.lang.Object
public Vec align(Vec vec)
vec - vector which is intended to be copied
Returns: a copy of vec that shares the same Vec.VectorGroup with this vector