Change Log
v2.0.23 (2018-02-14)
Download at: http://h2o-release.s3.amazonaws.com/sparkling-water/rel-2.0/23/index.html
- Technical task
- SW-652 - Deliver SW documentation in HTML output
- Bug
- SW-685 - Fix Typo in documentation
- SW-695 - Make printHadoopDistributions gradle task available again for testing
- SW-701 - Kill the client when one of the h2o nodes went OOM in external mode
- SW-706 - Fix pysparkling.ml import for non-interactive sessions
- SW-707 - parquet import fails on HDP with Spark 2.0 (azure hdi cluster)
- SW-708 - Make sure H2OMojoModel does not require H2OContext to be initialized
- SW-709 - Fix mojo predictions tests
- SW-710 - In PySparkling pipelines, ensure that if users pass integer to double type we handle that correctly for all possible double values
- SW-713 - Write a simple test for parquet import in Sparkling Water
- SW-714 - Add option to H2OModel pipeline step allowing us to convert unknown categoricals to NAs
- SW-715 - Fix driverif configuration on the external backend
- Improvement
- SW-606 - Verify & Document run of RSparkling on top of Databricks Azure cluster
- SW-678 - Document how to change log location
- SW-683 - H2OContext can't be initialized on Databricks cloud
- SW-686 - Fix typo in documentation
- SW-687 - Upgrade Gradle to 4.5
- SW-688 - Update docs - SparklyR supports Spark 2.2.1 in the latest release
- SW-690 - Log Sparkling Water version during startup of Sparkling Water
- SW-693 - Allow to set driverIf on external H2O backend
- SW-694 - Fix creation of Extended JAR in gradle task
- SW-700 - Report Yarn App ID of spark application in H2OContext
- SW-703 - Upload generated sphinx documentation to S3
- SW-704 - Update links on the download page to point to the new docs
- SW-705 - Increase memory for JUNIT tests
- SW-718 - Upgrade to Gradle 4.5.1
- SW-719 - Upgrade to H2O 3.18.0.1
- SW-720 - Fix parquet import test on external backend
- Docs
v2.0.22 (2018-01-18)
Download at: http://h2o-release.s3.amazonaws.com/sparkling-water/rel-2.0/22/index.html
- Bug
- SW-551 - Remove hotfix introduced by [SW-541] and implement proper fix
- SW-661 - Use always correct Spark version on the R download page
- SW-662 - Remove extra files that got into the repo
- SW-666 - Kill the cluster when a new executor joins in the internal backend
- SW-668 - Generate download link as part of the release notes
- SW-669 - Remove mentions of local-cluster in public docs
- SW-670 - Deprecated call in H2OContextInitDemo
- SW-671 - Fix jenkinsfile for builds against specific h2o branches
- Improvement
v2.0.21 (2018-01-03)
- Bug
- SW-627 - [PySparkling] calling as_spark_frame for the second time results in exception
- SW-630 - Fix ham or spam flow to reflect latest changes in pipelines
- SW-631 - Ensure that we do not access RDDs in pipelines (to unblock the deployment)
- SW-646 - Fix inconsistencies in ham or spam examples between scala and python
- SW-648 - Fix ham or spam pipeline tests
- SW-649 - Fix ham or spam tests for deeplearning pipeline
- Improvement
- SW-608 - Measure time of conversions to H2OFrame in debug mode
- SW-612 - Port all arguments available to Scala ML to PySparkling ML
- SW-617 - Support for exporting mojo to hdfs
- SW-632 - Dump full spark configuration during H2OContext.getOrCreate into DEBUG
- SW-635 - Fix wrong instruction at PySparkling download page
- SW-637 - Create new DataFrame with new schema when it actually contains any dot in names
- SW-638 - Port release script into the sw repo
- SW-639 - Use persist layer for exportPOJOModel
- SW-640 - Export H2OMOJOModel.createFromMOJO to pysparkling
- SW-642 - Create test for mojo predictions in PySparkling
- SW-643 - Add tests for H2ODeeplearning in Scala and Python and Fix potential problems
- SW-644 - Log spark configuration to INFO level
- SW-650 - Upgrade Gradle to 4.4.1
- SW-656 - Upgrade ShadowJar to 2.0.2
v2.0.20 (2017-12-11)
- Bug
- SW-615 - pysparkling.__version__ returns incorrectly ‘SUBST_PROJECT_VERSION’
- SW-616 - PySparkling fails on python 3.6 because long type does not exist in python 3.6
- SW-621 - PySparkling failing on Python3.6
- SW-624 - Python build does not support H2O_PYTHON_WHEEL when building against h2o older than 3.16.0.1
- SW-628 - PySparkling fails when installed from pypi
- Improvement
- SW-626 - Upgrade Gradle to 4.4
v2.0.19 (2017-12-01)
v2.0.18 (2017-11-25)
- Bug
- SW-320 - H2OConfTest Python test blocks test run
- SW-499 - BinaryType handling is not implemented in SparkDataFrameConverter
- SW-535 - asH2OFrame gives error if column names have DOT in it
- SW-547 - Don’t use md5skip in external mode
- SW-569 - pysparkling: h2o on exit does not shut down cleanly
- SW-572 - Additional fix for [SW-571]
- SW-573 - Minor Gradle build improvements and fixes
- SW-575 - Incorrect comment in hamOrSpamMojo pipeline
- SW-576 - Cleanup pysparkling test infrastructure
- SW-577 - Fix conditions in jenkins file
- SW-580 - Fix composite build in Jenkins
- SW-581 - Fix H2OConf test on external cluster
- SW-582 - Opening Chicago Crime Demo Notebook errors on the first opening
- SW-584 - Create extended directory automatically
- SW-588 - Fix links in README
- SW-589 - Wrap stages in try finally in jenkins file
- SW-592 - Properly pass all parameters to algorithm
- SW-593 - H2OConf cannot be initialized on windows
- SW-594 - Gradle ml submodule reports success even though tests fail
- SW-595 - Fix ML tests
- New Feature
- SW-519 - Introduce SW Models into Spark python pipelines
- Task
- SW-609 - Upgrade H2O dependency to 3.16.0.1
- Improvement
- SW-318 - Keep H2O version inside sparkling-water-core.jar and provide utility to query it
- SW-420 - Shell scripts misleading error message
- SW-504 - Provides Sparkling Water Spark Uber package which can be used in --packages
- SW-570 - Stop previous jobs in jenkins in case of PR
- SW-571 - In PySparkling, getOrCreate(spark) still incorrectly complains that we should use spark session
- SW-583 - Upgrade to Gradle 4.3
- SW-585 - Add the custom commit status for internal and external pipelines
- SW-586 - [ML] Remove some duplicities, enable mojo for deep learning
- SW-590 - Replace deprecated method call in ChicagoCrime python example
- SW-591 - Repl doesn’t require H2O dependencies to compile
- SW-596 - Minor build improvements
- SW-603 - Upgrade Gradle to 4.3.1
- SW-605 - addFiles doesn’t accept sparkSession
- SW-610 - Change default client mode to INFO, let user to change it at runtime
v2.0.17 (2017-10-23)
- Bug
- SW-555 - Fix documentation issue in PySparkling
- SW-558 - Increase default value for client connection retry timeout in
- SW-560 - SW documentation for nthreads is inconsistent with code
- SW-561 - Fix reporting artefacts in Jenkins and remove use of h2o-3-shared-lib
- SW-564 - Clean test workspace in jenkins
- SW-565 - Fix creation of extended jar in jenkins
- SW-567 - Fix failing tests on external backend
- SW-568 - Remove obsolete and failing idea configuration
- SW-559 - GLM fails to build model when weights are specified
- Improvement
- SW-557 - Create 2 jenkins files ( for internal and external backend ) backed by configurable pipeline
- SW-562 - Disable web on external H2O nodes in external cluster mode
- SW-563 - In external cluster mode, print also YARN job ID of the external cluster once context is available
- SW-566 - Upgrade H2O to 3.14.0.7
- SW-553 - Improve handling of sparse vectors in internal cluster
v2.0.16 (2017-10-10)
- Bug
- SW-423 - Tests of External Cluster mode fail
- SW-516 - External cluster improperly converts RDD[ml.linalg.Vector]
- SW-525 - Don’t use GPU nodes for sparkling water testing in Jenkins
- SW-526 - Add missing when clause to scripts test stage in Jenkinsfile
- SW-527 - Use dX cluster for Jenkins testing
- SW-529 - Code defect in Scala example
- SW-531 - Use code which is compatible between Scala 2.10 and 2.11
- SW-532 - Make auto mode in external cluster default for tests in jenkins
- SW-534 - Ensure that all tests run on both, internal and external backends
- SW-536 - Allow to test sparkling water against specific h2o branch
- SW-537 - Update Gradle to 4.2RC2
- SW-538 - Fix problem in Jenkinsfile where H2O_HOME has higher priority than H2O_PYTHON_WHEEL
- SW-539 - Fix PySparkling issue when running multiple times on the same node
- SW-541 - Model training hangs in SW
- SW-542 - sw does not support parquet import
- SW-552 - Fix documentation bug
- New Feature
- Improvement
v2.0.14 (2017-08-02)
- Bug
- Improvement
- SW-355 - Include H2O R client distribution in Sparkling Water binary
- SW-506 - Documentation for the backends should mention get-extended-h2o.sh instead of manual jar extending
- SW-507 - Upgrade to Gradle 4.0.2
- SW-508 - More robust get-extended-h2o.sh
- SW-509 - Add back DEVEL.md and CHANGELOG.md and redirect to new versions
v2.0.13 (2017-07-17)
- Improvement
- SW-490 - Upgrade Gradle to 4.0.1
- SW-491 - Increase default value for Write and Read confirmation timeout
- SW-492 - Remove dead code and deprecation warning in tests
- SW-493 - Enforce Scala Style rules
- SW-494 - Remove hard dependency on RequestServer by using RestApiContext
- SW-496 - Remove ignored empty “H2OFrame[Time] to DataFrame[TimeStamp]” test
- SW-498 - Upgrade H2O to 3.10.5.4
v2.0.12 (2017-07-12)
v2.0.11 (2017-06-29)
- Bug
- SW-469 - Remove accidentally added kerb.conf file
- SW-470 - Allow to pass sparkSession to Security.enableSSL and deprecate sparkContext
- SW-474 - Use deprecated HTTPClient as some CDH versions do not have the new method
- SW-475 - Handle duke library in case it’s loaded using --packages
- SW-479 - Fix CHANGELOG location in make-dist.sh
- Improvement
- SW-457 - Clean up windows scripts
- SW-466 - Separate Devel.md into multiple rst files
- SW-472 - Convert to rst README in gradle dir
- SW-473 - Upgrade to gradle 4.0
- SW-477 - Upgrade H2O to 3.10.5.2
- SW-480 - Bring back publishToMavenLocal task
- SW-482 - Updates to change log location
- SW-484 - Make rel-2.0 changelog consistent and also rst
v2.0.10 (2017-06-15)
- Technical task
- SW-211 - In PySparkling for spark 2.0 document how to build the package
- Bug
- SW-448 - Add missing jar into the assembly
- SW-450 - Fix instructions on the download site
- SW-453 - Use size method to get attr num
- SW-454 - Replace sparkSession with spark in backends documentation
- SW-456 - Make shell scripts safe
- SW-459 - Update PySparkling run-time dependencies
- SW-461 - Fix wrong getters and setters in pysparkling
- SW-467 - Fix typo in the FAQ documentation
- SW-468 - Fix make-dist
- New Feature
- SW-455 - Replace the remaining references to egg files
- Improvement
- SW-24 - Append tab on Sparkling Water download page - how to use Sparkling Water package
- SW-111 - Update FAQ with information about hive metastore location
- SW-112 - Sparkling Water Tuning doc: add heartbeat documentation
- SW-311 - Please report Application Type to Yarn Resource Manager
- SW-340 - Improve structure of SW README
- SW-426 - Allow to download sparkling water logs from the spark UI
- SW-444 - Remove references to Spark 1.5, 1.4 ( as it’s old ) in README.rst and other docs
- SW-447 - Upgrade H2O to 3.10.5.1
- SW-452 - Add missing spaces after “,” in H2OContextImplicits
- SW-460 - Allow to configure flow dir location in SW
- SW-463 - Extract sparkling water configuration to extra doc in rst format
- SW-465 - Mark tensorflow demo as experimental
v2.0.9 (2017-05-25)
- Bug
- SW-263 - Cannot run build in parallel because of Python module
- SW-336 - Wrong documentation of PyPi h2o_pysparkling_2.0 package
- SW-421 - External cluster: Job is reporting exit status as FAILED even all mappers return 0
- SW-429 - Different cluster name between client and h2o nodes in case of external cluster
- SW-430 - pysparkling: adding a column to a data frame does not work when parse the original frame in spark
- SW-431 - Allow to pass additional arguments to run-python-script.sh
- SW-436 - Fix getting of sparkling water jar in pysparkling
- SW-437 - Don’t call atexit in case of pysparkling in cluster deploy mode
- SW-438 - Store h2o logs into unique directories
- SW-439 - handle interrupted exception in H2ORuntimeInfoUIThread
- SW-335 - Cannot install pysparkling from PyPi
- Improvement
- SW-445 - Remove information from README.rst that pip cannot be used
- SW-341 - Support Python 3 distribution
- SW-380 - Define Jenkins pipeline via Jenkinsfile
- SW-422 - Upgrade H2O dependency to 3.10.4.6
- SW-424 - Add SW tab in Spark History Server
- SW-427 - Upgrade H2O dependency to 3.10.4.7
- SW-433 - Add change logs link to the sw download page
- SW-435 - Upgrade shadow jar plugin to 2.0.0
- SW-440 - Sparkling Water cluster name should contain spark app id instead of random number
- SW-441 - Replace deprecated DefaultHTTPClient in AnnouncementService
- SW-442 - Get array size from metadata in case of ml.linalg.VectorUDT
- SW-443 - Upgrade H2O version to 3.10.4.8
v2.0.8 (2017-04-07)
- Bug
- SW-365 - Proper exit status handling of external cluster
- SW-398 - Use timeout for read/write confirmation in external cluster mode
- SW-400 - Fix stopping of H2OContext in case of running standalone application
- SW-401 - Add configuration property to external backend allowing to specify the maximal timeout the cloud will wait for watchdog client to connect
- SW-405 - Use correct quote in backend documentation
- SW-408 - Use kwargs for h2o.connect in pysparkling
- SW-409 - Fix stopping of python tests
- SW-410 - Honor --core Spark settings in H2O executors
- SW-419 - Fix Slf4JLoggerFactory creating on Spark 2.0
- Improvement
- SW-231 - Sparkling Water download page is missing PySparkling/RSparkling info
- SW-404 - Upgrade H2O dependency to 3.10.4.4
- SW-406 - Download page should list available jars for external cluster.
- SW-411 - Migrate Pysparkling tests and examples to SparkSession
- SW-412 - Upgrade H2O dependency to 3.10.4.5
2.0.7 (2017-04-07)
- Bug
- SW-334 - as_factor() ‘corrupts’ dataframe if it fails
- SW-353 - Kerberos for SW not loading JAAS module
- SW-364 - Repl session not set on scala 2.11
- SW-368 - bin/pysparkling.cmd is missing
- SW-371 - Fix MarkDown syntax
- SW-372 - Run negative test for PUBDEV-3808 multiple times to observe failure
- SW-375 - Documentation fix in external cluster manual
- SW-376 - Tests for DecimalType and DataType fail on external backend
- SW-377 - Implement stopping of external H2O cluster in external backend mode
- SW-383 - Update PySparkling README with info about SW-335 and using SW from Pypi
- SW-385 - Fix residual plot R code generator
- SW-386 - SW REPL cannot be used in combination with Spark Dataset
- SW-387 - Fix typo in setClientIp method
- SW-388 - Stop h2o when running inside standalone pysparkling job
- SW-389 - Extending h2o jar from SW doesn’t work when the jar is already downloaded
- SW-392 - Python in gradle is using wrong python - it doesn’t respect the PATH variable
- SW-393 - Allow to specify timeout for h2o cloud up in external backend mode
- SW-394 - Allow to specify log level to external h2o cluster
- SW-396 - Create setter in pysparkling conf for h2o client log level
- SW-397 - Better error message covering the most often case when cluster info file doesn’t exist
- Improvement
2.0.6 (2017-03-21)
- Bug
- SW-306 - KubasCluster: Notify file fails on failure
- SW-308 - Intermittent failure in creating H2O cloud
- SW-321 - composite function fail when inner cbind()
- SW-331 - Security.enableSSL does not work
- SW-347 - Cannot start Sparkling Water at HDP Yarn cluster
- SW-349 - Sparkling Shell scripts for Windows do not work
- SW-350 - Fix command line environment for Windows
- SW-357 - PySparkling in Zeppelin environment using wrong class loader
- SW-361 - Flow is not available in Sparkling Water
- SW-362 - PySparkling does not work
- Improvement
- SW-333 - ApplicationMaster info in Yarn for external cluster
- SW-337 - Use h2o.connect in PySpark to connect to H2O cluster
- SW-338 - h2o.init in PySpark prints internal IP. We should remove it or replace it with actual IP of driver node (based on spark_DNS settings)
- SW-344 - Use Spark public DNS if available to report Flow UI
- SW-345 - Create configuration manual for External cluster
- SW-356 - Fix documentation for spark.ext.h2o.fail.on.unsupported.spark.param
- SW-359 - Upgrade H2O dependency to 3.10.4.1
- SW-360 - Upgrade H2O dependency to 3.10.4.2
- SW-363 - Use Spark public DNS if available to report Flow UI
2.0.5 (2017-02-10)
2.0.4 (2017-01-02)
- Bug
- Improvement
2.0.3 (2017-01-04)
- Bug
- SW-152 - ClassNotFound with spark-submit
- SW-266 - H2OContext shouldn’t be Serializable
- SW-276 - ClassLoading issue when running code using SparkSubmit
- SW-281 - Update sparkling water tests so they use correct frame locking
- SW-283 - Set spark.sql.warehouse.dir explicitly in tests because of SPARK-17810
- SW-284 - Fix CraigsListJobTitlesApp to use local file instead of trying to get one from hdfs
- SW-285 - Disable timeline service also in python integration tests
- SW-286 - Add missing test in pysparkling for conversion RDD[Double] -> H2OFrame
- SW-287 - Fix bug in SparkDataFrame converter where key wasn’t random if not specified
- SW-288 - Improve performance of Dataset tests and call super.afterAll
- SW-289 - Fix PySparkling numeric handling during conversions
- SW-290 - Fixes and improvements of task used to extend h2o jars by sparkling-water classes
- SW-292 - Fix ScalaCodeHandlerTestSuite
- New Feature
- SW-178 - Allow external h2o cluster to act as h2o backend in Sparkling Water
- Improvement
2.0.2 (2016-12-09)
2.0.1 (2016-12-04)
- Bug
- SW-196 - Fix wrong output of str on H2OContext
- SW-212 - Fix deprecation warning regarding the compiler in scala.gradle
- SW-221 - SVM: the model is not unlocked after building
- SW-226 - SVM: binomial model - AUC curves are missing
- SW-227 - java.lang.ClassCastException: com.sun.proxy.$Proxy19 cannot be cast to water.api.API
- SW-242 - Fix Python build process
- SW-248 - Fix TensorFlow notebook to support Python 3
- SW-264 - PySparkling is not using existing SQLContext
- SW-268 - Databricks cloud: Jetty class loading problem.
- New Feature
- SW-267 - Add assembly-h2o module which will extend h2o/h2odriver jar by additional classes
- Improvement
- SW-129 - Add support for transformation from H2OFrame -> RDD in PySparkling
- SW-169 - Remove deprecated calls
- SW-193 - Append scala version to pysparkling package name
- SW-200 - Add flows from presentation in Budapest and Paris to flows dir
- SW-208 - Generate all PySparkling artefacts into build directory
- SW-209 - RSparkling: improve handling of Sparkling Water package dependencies
- SW-215 - Improve internal type handling
- SW-219 - RSparkling: as_h2o_frame should properly name the frame
- SW-230 - Fix sparkling-shell windows script
- SW-235 - Discover py4j package version automatically from SPARK_HOME
- SW-243 - Remove all references to local-cluster[…] in our doc
- SW-245 - Upgrade of H2O dependency to the latest turing release (3.10.0.10)
2.0.0 (2016-09-26)
- Bugs
- SW-57 - Produce artifacts for Scala 2.11
- SW-71 - Expose method H2OContext#setLogLevel to setup log level of H2O
- SW-128 - Publish flows pack in GitHub repo and embed them in distributed JAR
- SW-168 - Explore slow-down for fat-dataset with many categorical columns
- SW-172 - NodeDesc should be interned or use H2OKey instead of NodeDesc
- SW-176 - H2O context is failing on CDH-5.7.1 with Spark Version 1.6.0-CDH.5.7.1
- SW-185 - Methods on frame can’t be called in compute method on external cluster
- SW-186 - Hide checks whether incoming data is NA into convertorCtx
- SW-191 - Better exception message in case dataframe with the desired key already exists when saving using datasource api
- SW-192 - Add org.apache.spark.sql._ to packages imported by default in REPL
- SW-197 - Fix all mentions of H2OContext(sc) to H2OContext.getOrCreate(sc) in PySparkling
- SW-201 - Methods in water.support classes should use [T <: Frame] instead of H2OFrame
- SW-202 - Pipeline scripts are not tested!
- SW-205 - PySparkling tests launcher does not report error correctly
- SW-210 - Change log level of arguments used to start client to Info
- New Features
- Improvements
- SW-158 - Support Spark DataSet in the same way as RDD and DataFrame
- SW-163 - Upgrade H2O dependency to the latest Turing release
- SW-164 - Replace usage of SQLContext by SparkSession
- SW-165 - Change default schema for Scala code to black one.
- SW-170 - Unify H2OFrame datasource and asDataFrame API
- SW-171 - Internal API refactoring to allow multiple backends
- SW-174 - Remove unused fields from H2ORDD
- SW-177 - Refactor and simplify REPL
- SW-204 - Distribute tests log4j logs to corresponding build directories
- Breaking API changes
- The enum hex.Distribution.Family is now hex.genmodel.utils.DistributionFamily
- The deprecated methods (e.g., H2OContext#asSchemaRDD) were removed
v1.6.x (2016-03-15)
- Sparkling Water 1.6 brings support of Spark 1.6.
- For detailed changelog, please read rel-1.6/CHANGELOG.
v1.5.x (2015-09-28)
- Sparkling Water 1.5 brings support of Spark 1.5.
- For detailed changelog, please read rel-1.5/CHANGELOG.
v1.4.x (2015-07-06)
- Sparkling Water 1.4 brings support of Spark 1.4.
- For detailed changelog, please read rel-1.4/CHANGELOG.
v1.3.x (2015-05-25)
- Sparkling Water 1.3 brings support of Spark 1.3.
- For detailed changelog, please read rel-1.3/CHANGELOG.
v1.2.x (2015-05-18) and older
- Sparkling Water 1.2 brings support of Spark 1.2.
- For detailed changelog, please read rel-1.2/CHANGELOG.