Scala for H2O: Shalala¶
Overview¶
Shalala is a Scala library providing access to H2O API via a dedicated DSL and also a REPL integrated into H2O.
Currently the library supports following expressions abstracting H2O API.
R-like commands
help
ncol <frame>
nrow <frame>
head <frame>
tail <frame>
f(2)           - returns 2. column
f("year")      - returns column "year"
f(*,2)         - returns 2. column
f(*, 2 to 5)   - returns 2., 3., 4., 5. columns
f(*,2)+2       - scalar operation - 2.column + 2
f(2)*3         - scalar operation - 2.column * 3
f-1            - scalar operation - all columns - 1
f < 10         - transform the frame into boolean frame respecting the condition
H2O commands
keys              - shows all available keys i KV store
parse("iris.csv") - parse given file and return a frame
put("a.hex", f)   - put a frame into KV store
get("b.hex")      - return a frame from KV store
jobs              - shows a list of executed jobs
shutdown          - shutdown H2O cloud
M/R commands
f map (Add(3))   - call of map function of all columns in frame
                    - function is (Double=>Double) and has to extend Iced
f map (Less(10)) - call of map function on all columns
                    - function is (Double=>Boolean)
Build Scalala¶
To build Shalala sbt is required. You can get sbt from http://www.scala-sbt.org/release/docs/Getting-Started/Setup.
To compile Shalala please type:
sbt compile
Launch REPL¶
Shalala provides an integrated Scala REPL exposing H2O DSL. You can start REPL via sbt:
sbt run
Key points of implementation¶
- Using primitive types specialization (to allow for generation code using primitive types)
- All objects passed around cloud has to inherits from water.Iced
Examples¶
val f = parse("smalldata/cars.csv")
f(2)           // number of cylinders
f("year")      // year of production
f(*, 0::2::7::Nil)  // year,number of cylinders and year
f(7) map Sub(1000) // Subtract 1000 from year column
f("cylinders") map (new BOp {
    var sum:scala.Double = 0
    def apply(rhs:scala.Double) = { sum += rhs; rhs*rhs / sum; }
  })
FAQs¶
- How to generate Eclipse project and import it into Eclipse? - Launch sbt shell 
- In sbt use the command eclipse to create Eclipse project files - > eclipse 
- In Eclipse use the Import Wizard to import the project into workspace 
 
- How to run REPL from Eclipse? - Import h2o-scala project into Eclipse
- Launch water.api.dsl.ShalalaRepl as a Scala application
 
- How to generate Idea project and import it? - Launch sbt 
- In sbt use the command gen-idea to create Idea project files - > gen-idea 
- In Idea open the project located in h2o-scala directory