public class VecUtils
extends java.lang.Object
Modifier and Type | Class and Description |
---|---|
static class | VecUtils.CollectDomainFast |
static class | VecUtils.CollectDomainWeights |
static class | VecUtils.CollectDoubleDomain |
static class | VecUtils.CollectIntegerDomain |
static class | VecUtils.DomainDedupe |
static class | VecUtils.DotProduct: Dot product of two Vecs of the same length. |
static class | VecUtils.MeanResponsePerLevelTask: Compute the mean (weighted) response per categorical level. NA values are skipped (they are already a separate bucket in the tree-building histograms, for which this is designed). |
static class | VecUtils.MinMaxTask |
static class | VecUtils.ReorderTask: Reorder an integer (such as Enum storage) Vec using an int -> int mapping. |
static class | VecUtils.SequenceProduct |
static class | VecUtils.ShuffleVecTask: Randomly shuffle a Vec using the Fisher-Yates shuffle (https://en.wikipedia.org/wiki/Fisher%E2%80%93Yates_shuffle). |
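The Fisher-Yates shuffle named for ShuffleVecTask can be illustrated on a plain array. This is a minimal sketch of the textbook algorithm only: the class `FisherYatesSketch` and its `shuffle` method are hypothetical names, `java.util.Random` is an assumed random source, and nothing here reflects how H2O distributes the work over a Vec.

```java
import java.util.Random;

/** Hypothetical helper for illustration; not part of H2O. */
final class FisherYatesSketch {
    /** Returns a shuffled copy of values; the input array is left untouched. */
    static double[] shuffle(double[] values, long seed) {
        double[] out = values.clone();
        Random rng = new Random(seed); // assumed generator; a fixed seed gives a repeatable order
        // Walk from the last slot down, swapping each slot with a random slot at or below it.
        for (int i = out.length - 1; i > 0; i--) {
            int j = rng.nextInt(i + 1); // uniform in [0, i]
            double tmp = out[i];
            out[i] = out[j];
            out[j] = tmp;
        }
        return out;
    }
}
```

Calling `FisherYatesSketch.shuffle(data, 42L)` twice with the same seed yields the same permutation, which mirrors the role of the `seed` argument of shuffleVec.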
Constructor and Description |
---|
VecUtils() |
Modifier and Type | Method and Description |
---|---|
static Vec | categoricalDomainsToNumeric(Vec src) |
static Vec | categoricalToInt(Vec src) |
static Vec | categoricalToStringVec(Vec src) |
static java.lang.String[] | collectDomainFast(Vec vec): Collects current domain of a categorical vector in an optimized way. |
static double[] | collectDomainWeights(Vec vec, Vec weights): Collect the frequencies of each level in a categorical Vec. |
static void | deleteVecs(Vec[] vs, int cnt) |
static VecUtils.MinMaxTask | findMinMax(Vec numVec, Vec weightVec) |
static int[] | getLocalChunkIds(Vec v) |
static Vec | numericToCategorical(Vec src) |
static Vec | numericToStringVec(Vec src) |
static Vec | remapDomain(java.lang.String[] newDomainValues, Vec originalVec): Remaps vec's current domain levels to a new set of values. |
static Vec | shuffleVec(Vec origVec, long seed): Randomly shuffle a Vec. |
static Vec | stringToCategorical(Vec vec) |
static Vec | stringToNumeric(Vec src) |
static Vec | toCategoricalVec(Vec src) |
static Vec | toNumericVec(Vec src) |
static Vec | toStringVec(Vec src) |
static Vec | UUIDToStringVec(Vec src) |
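The index-wise mapping that remapDomain describes (one new value per old domain level, in the same position) can be sketched on plain arrays. Everything below is illustrative: `RemapDomainSketch` is a hypothetical class, rows are assumed to hold valid non-negative level codes with no NA, and the new domain is assumed to list distinct values in first-seen order, which the documentation does not specify.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration; not part of H2O. */
final class RemapDomainSketch {
    final String[] domain; // new domain: distinct new values, first-seen order (an assumption)
    final int[] codes;     // per-row level codes expressed against the new domain

    private RemapDomainSketch(String[] domain, int[] codes) {
        this.domain = domain;
        this.codes = codes;
    }

    static RemapDomainSketch remap(String[] oldDomain, int[] oldCodes, String[] newDomainValues) {
        if (newDomainValues == null || newDomainValues.length != oldDomain.length) {
            throw new IllegalArgumentException(
                "newDomainValues must be non-null and as long as the current domain");
        }
        List<String> newDomain = new ArrayList<>();
        int[] oldToNew = new int[oldDomain.length];
        // newDomainValues[i] is the replacement for old level i; duplicates merge levels.
        for (int i = 0; i < newDomainValues.length; i++) {
            int idx = newDomain.indexOf(newDomainValues[i]);
            if (idx < 0) {
                idx = newDomain.size();
                newDomain.add(newDomainValues[i]);
            }
            oldToNew[i] = idx;
        }
        int[] newCodes = new int[oldCodes.length];
        for (int r = 0; r < oldCodes.length; r++) {
            newCodes[r] = oldToNew[oldCodes[r]];
        }
        return new RemapDomainSketch(newDomain.toArray(new String[0]), newCodes);
    }
}
```

For an old domain `{"a", "b", "c"}` and `newDomainValues = {"x", "y", "x"}`, levels "a" and "c" collapse into "x", so the result has the same number of rows but a smaller domain.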
public static Vec toCategoricalVec(Vec src)

Creates a new Vec of categorical values from an existing Vec.

This method accepts all Vec types as input. The original Vec is not mutated.

If src is a categorical Vec, a copy is returned.
If src is a numeric Vec, the values are converted to strings used as domain values.
For all other types, an exception is currently thrown. These need to be replaced with appropriate conversions.

Throws H2OIllegalArgumentException if the resulting domain exceeds Categorical.MAX_CATEGORICAL_COUNT.

public static Vec toNumericVec(Vec src)

Creates a new Vec of numeric values from an existing Vec.

This method accepts all Vec types as input. The original Vec is not mutated.

If src is a categorical Vec, a copy is returned.
If src is a string Vec, every value that can be parsed becomes a real or an integer, and all others become NA. See stringToNumeric for parsing details.
If src is a numeric Vec, a copy is made.
If src is a time Vec, the milliseconds since the epoch are used to populate the new Vec.
If src is a UUID Vec, the existing numeric storage is used to populate the new Vec.

Throws H2OIllegalArgumentException if the resulting domain exceeds Categorical.MAX_CATEGORICAL_COUNT.

public static Vec categoricalToInt(Vec src)

Creates a new Vec of numeric values from a categorical Vec.

If the first value in the domain of the src Vec is a stringified int, then the domain values are used as the ints. Otherwise, the raw enumeration level mapping is used.

If the domain is stringified ints, then every domain value must be parseable as an int. If one is not, a NumberFormatException is caught and rethrown as an H2OIllegalArgumentException that names the illegal domain value. Otherwise, src is copied to a new Vec whose domain is null.

The magic of this method should be eliminated. It should just use enumeration level maps. If the user wants domains to be used, call categoricalDomainsToNumeric().
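The branching described for categoricalToInt can be sketched on plain arrays. This is an illustration under stated assumptions, not H2O's code: `CategoricalToIntSketch` is a hypothetical class, `codes` stands in for the per-row level indices of a categorical Vec with no NA rows, and `IllegalArgumentException` stands in for H2OIllegalArgumentException.

```java
/** Hypothetical helper for illustration; not part of H2O. */
final class CategoricalToIntSketch {
    static int[] toInts(String[] domain, int[] codes) {
        // Only the first domain value decides which branch is taken.
        boolean useDomain = domain.length > 0 && isInt(domain[0]);
        int[] levelValue = new int[domain.length];
        for (int i = 0; i < domain.length; i++) {
            if (!useDomain) {
                levelValue[i] = i; // raw enumeration level mapping
                continue;
            }
            try {
                levelValue[i] = Integer.parseInt(domain[i]);
            } catch (NumberFormatException e) {
                // Stand-in for H2OIllegalArgumentException; reports the offending value.
                throw new IllegalArgumentException("Illegal domain value: " + domain[i], e);
            }
        }
        int[] out = new int[codes.length];
        for (int r = 0; r < codes.length; r++) {
            out[r] = levelValue[codes[r]];
        }
        return out;
    }

    private static boolean isInt(String s) {
        try {
            Integer.parseInt(s);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }
}
```

With the domain `{"10", "20", "30"}` the rows take the values 10, 20, 30; with `{"red", "green"}` they take the level indices 0 and 1; with `{"1", "x"}` the call fails because the first value selected the int branch but "x" cannot be parsed.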
PUBDEV-2209

public static Vec toStringVec(Vec src)

Creates a new Vec of string values from an existing Vec.

This method accepts all Vec types as input. The original Vec is not mutated.

If src is a string Vec, a copy of the Vec is made.
If src is a categorical Vec, levels are dropped, and the Vec only records the string.
For all numeric Vecs, the number is converted to a string.
For all UUID Vecs, the hex representation is stored as a string.

public static java.lang.String[] collectDomainFast(Vec vec) throws java.lang.IllegalArgumentException

Parameters:
vec - A categorical vector to collect domain of.
Throws:
java.lang.IllegalArgumentException - If the given vector is not categorical

public static double[] collectDomainWeights(Vec vec, Vec weights)

Parameters:
vec - categorical Vec
weights - optional weight Vec

public static void deleteVecs(Vec[] vs, int cnt)
public static int[] getLocalChunkIds(Vec v)
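For collectDomainWeights above, the idea of per-level frequencies with an optional weight column can be sketched as follows. This is an illustration only: `DomainWeightsSketch` is a hypothetical class, a negative code is assumed to mark an NA row that is skipped, a null `weights` array is assumed to mean weight 1 per row, and the result is a raw weighted sum per level. How H2O treats NA rows or whether it normalizes the result is not stated in this page.

```java
/** Hypothetical helper for illustration; not part of H2O. */
final class DomainWeightsSketch {
    /**
     * codes: per-row level index (negative = NA, an assumption of this sketch).
     * cardinality: number of levels in the domain.
     * weights: per-row weight, or null for an unweighted count.
     */
    static double[] levelWeights(int[] codes, int cardinality, double[] weights) {
        double[] sums = new double[cardinality];
        for (int r = 0; r < codes.length; r++) {
            if (codes[r] < 0) {
                continue; // skip NA rows
            }
            sums[codes[r]] += (weights == null) ? 1.0 : weights[r];
        }
        return sums;
    }
}
```

Passing `null` for `weights` reduces the result to plain level counts.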
public static Vec remapDomain(java.lang.String[] newDomainValues, Vec originalVec) throws java.lang.UnsupportedOperationException, java.lang.IllegalArgumentException

Changes are made to this very vector, no copying is done. If you need the original vector to remain unmodified, please make sure to copy it first.

Parameters:
newDomainValues - An array of new domain values. For each old domain value, there must be a new value in this array. The value at each index of the newDomainValues array represents the new mapping for this very index. May not be null.
originalVec - Vector with values corresponding to the original domain to be remapped. Remains unmodified.
Returns:
Vec with exactly the same length as the original vector supplied. Its domain values are re-mapped.
Throws:
java.lang.UnsupportedOperationException - When invoked on a non-categorical vector
java.lang.IllegalArgumentException - Length of newDomainValues must be equal to length of current domain values of this vector

public static Vec shuffleVec(Vec origVec, long seed)

Parameters:
origVec - original Vec
seed - seed for random generator

public static VecUtils.MinMaxTask findMinMax(Vec numVec, Vec weightVec)