Change Log

v3.42.0.1-1 (2023-06-28)

Downloads:

  • Bug

    • #5642 - Upgrade H2O to 3.42.0.1

    • #5643 - Fix release process ignoring Nexus upload errors

  • Improvement

    • #5644 - New issue template

    • #5631 - Extend Spark 2.3 support, remove Spark 2.1-2.2 leftovers

    • #3028 - Upgrade version to 3.40.1.1-1

  • New Feature

    • #5618 - Spark 3.4 support

  • Task

    • #5628 - Adjust to h2o-3 changes: loglikelihood metric, Python build env

    • #5622 - Remove Python 2.7 support

v3.40.0.1-1 (2023-02-24)

Downloads:

  • Bug

    • #2887 - Bug in writing CV mojos (loop index not used)

    • #2960 - Integration test suite sometimes fails

  • New Feature

    • #2888 - Add “proxy only” authentication mode

  • Improvement

    • #3027 - Upgrade to H2O 3.40.0.1

    • #3026 - Add Support for Python 3.9

    • #2900 - Remove namedMojoOutputColumns from API

    • #2894 - Update Spark in Docker Images to 3.2.3

    • #3021 - Update Spark in Docker Images to 3.3.2

  • Engineering Story

    • #2899 - Deprecate Support for Apache Spark 2.3

    • #2898 - Fix DBC tests

    • #3017 - Upgrade Sparkling Water Snapshot Version to 3.40.0.1-1-SNAPSHOT

v3.38.0.3-1 (2022-11-24)

Downloads:

  • Engineering Story

    • #2892 - Remove Outdated Roadmap

  • Improvement

    • #3025 - Upgrade to H2O 3.38.0.3

    • #3023 - Adding prediction interval option to MOJOs in H2OMOJOSettings when using pysparkling

  • New Feature

    • #3024 - Improvement in overall scoring performance for DAI MOJOs

    • #3019 - Add PAM Authentication

v3.38.0.1-1 (2022-09-22)

Downloads:

  • Improvement

    • #3015 - Remove Deprecated Parameters on H2ODeepLearning

    • #3016 - Upgrade to H2O 3.38.0.1

    • #2980 - Improve IPv6 handling

    • #3722 - Remove code related to H2OClient from SW codebase

  • New Feature

    • #3013 - Add Ability to Calculate Contributions for Transformed Features On H2OMOJOPipelineModel

    • #3001 - Add Extended Isolation Forest to SW API

  • Engineering Story

    • #3012 - Fix GBM MOJO Test in Python

    • #3009 - Remove Support for Spark 2.2

    • #3003 - Update Number of Parameters in GBM MOJO Test

    • #2939 - Upgrade Sparkling Water Snapshot Version to 3.38.0.1-1-SNAPSHOT

  • Bug

    • #3000 - Fix Time Conversion Tests in Python API

v3.36.1.5-1 (2022-09-16)

Downloads:

  • Improvement

    • #3014 - Upgrade to H2O 3.36.1.5

  • Engineering Story

    • #3004 - Use Dedicated Credentials for Accessing S3

    • #2940 - Refactor ChicagoCrimeApp Example

  • Bug

    • #3002 - Frame Metadata Retrieval Downloads Unnecessary Information

  • Docs

    • #2975 - Change DRF tutorial to be on par with h2o-3

    • #2976 - Change KMeans tutorial to be on par with h2o-3

    • #2973 - Change DeepLearning tutorial to be on par with h2o-3

v3.36.1.4-1 (2022-08-04)

Downloads:

  • Improvement

    • #2999 - Upgrade to H2O 3.36.1.4

  • Docs

    • #2995 - Dedicated Jenkins Worker Profile for K8s tests

    • #2996 - Remove False Statement from Sparkling Water Documentation

    • #2994 - Improve Tutorial for Working with Binary Models

    • #2991 - Document Saving MOJO Models to Local File System

  • Engineering Story

    • #2993 - Update Spark in Docker Images to 3.2.2

    • #2992 - Migrate Jenkins CI under Account Dedicated to OSS

v3.36.1.3-1 (2022-07-11)

Downloads:

  • Improvement

    • #2989 - Upgrade to H2O 3.36.1.3

    • #2978 - Increase the number of builds kept in Jenkins build history

    • #3273 - Rewrite and Improve K8s Tests

  • Engineering Story

    • #2990 - Fix Docker Image Publishing to DockerHub

    • #2983 - Fix Building of RSparkling Docker Images

  • Bug

    • #2988 - Fix Failing Test on External Backend

  • New Feature

    • #2985 - Add Support for Spark 3.3

  • Docs

    • #2986 - Invalid Python Code Examples

v3.36.1.2-1 (2022-05-30)

Downloads:

  • Bug

    • #2984 - Pysparkling with DAI mojo producing same contributions for all rows

    • #2971 - H2OPipelineMOJOModel Reports Deprecation Warning for Every Line of Code

    • #2972 - Code Generation of R and Python Configuration Classes Should Consider Overloaded Methods

  • Improvement

    • #2981 - Upgrade H2O to 3.36.1.2

    • #2982 - Upgrade MOJO runtime to 2.7.8

  • Engineering Story

    • #2979 - Remove PySpark Integration Test

    • #2977 - Fix Failing R Tests

v3.36.1.1-1 (2022-04-20)

Downloads:

  • Improvement

    • #2969 - Upgrade H2O to 3.36.1.1

    • #2948 - Performance improvement: do constant check & row count in one iteration

  • Docs

    • #2970 - Change GAM tutorial to be on par with h2o-3

    • #2967 - Add GLM tutorial and expose coefficients

    • #2917 - Document Ability to Override Mojo Runtime lib in SW

  • Engineering Story

    • #2968 - Upgrade Scala on Builds for Spark 3.0 and 3.1 to 2.12.15

    • #2966 - Add spline_orders to Tests Covering Parameter Propagation to H2OGAMMOJOModel

    • #2963 - Fix Failing AutoML Test

    • #2964 - Update Spark in Docker Images to 3.1.3

  • New Feature

    • #2965 - Add Ability to Specify Number of Cores with Automatic External Backend on K8s

    • #4572 - Add H2O Stacked Ensembles to Algo API

v3.36.0.4-1 (2022-04-01)

Downloads:

  • Bug

    • #2961 - Fix Binary Model Cleaning in H2OAutoML

  • Engineering Story

    • #2962 - Upgrade H2O to 3.36.0.4

    • #2957 - Initialize Conda in Release Pipeline

    • #2956 - Give More Memory to Integration Tests

    • #2955 - Remove Sparkling Water Py4J Gateway

  • Improvement

    • #2921 - Introduce a warning during the serialization of MOJO model

v3.36.0.3-1 (2022-02-18)

Downloads:

  • Improvement

    • #2953 - Upgrade to H2O 3.36.0.3

    • #2952 - Deprecate namedMojoOutputColumns flag

    • #2950 - Make io.fabric8.kubernetes-client just a compileOnly dependency to minimize size of uber jar

    • #2946 - Expose predict_contributions (SHAP values) for H2OMOJOPipelineModel

    • #3285 - Add Support for Spark 3.2

    • #3275 - Support Java Serialization of NullableDataFrameParams on H2OMOJOModel

  • Engineering Story

    • #2954 - Update Repository Key In Spark R Docker Files

v3.36.0.2-1 (2022-01-27)

Downloads:

  • Improvement

    • #2949 - Upgrade to H2O 3.36.0.2

    • #2941 - Make unwrapMojoModel() Independent on Spark Runtime

    • #2902 - Display Model After Its training Phase on stdout

  • Docs

    • #2945 - Add Comment to Documentation about Contributions Support only in Binomial and Regression Models

  • New Feature

    • #2928 - Expose “cv_scoring_history”, “reproducibility_information_table” on H2OMOJOModel

v3.36.0.1-1 (2022-01-06)

Downloads:

  • Improvement

    • #2942 - Change Domain Levels to “True” and “False” for Columns Originating in BooleanType

    • #2938 - Upgrade to H2O 3.36.0.1

    • #2904 - Log Progress about Trained Models to stdout

    • #2901 - Display Warnings Coming from ModelBuilders on stdout

    • #3274 - Rewrite H2OWord2Vec to Inherit from H2OFeatureEstimator

    • #3267 - Upgrade Sparkling Water Snapshot Version to 3.36.0.1-1-SNAPSHOT

  • Bug

    • #2937 - ChicagoCrimeApp example not working

    • #2912 - Target column (boolean) is treated as numeric, makes classification become regression

  • Engineering Story

    • #2936 - Fix Databricks Smoke Tests

    • #2919 - Snyk Security Vulnerability Scanning Integration

    • #3183 - Remove Deprecated Parameter withDetailedPredictionCol from MOJOSettings

  • Docs

    • #2932 - Migrate H2ORuleFit tutorial from H2O documentation to SW

  • New Feature

    • #2929 - Expose “start_time”, “end_time”, “run_time”, “default_threshold” on H2OMOJOModel

    • #2906 - Expose Fields of Model Output on H2OMOJOModel Classes as Getters

v3.34.0.7-1 (2021-12-22)

Downloads:

  • Engineering Story

    • #2933 - Move Removal of Items from Namespace org.apache.spark.h2o to 3.38

    • #2934 - Move Removal of Certain Deep Learning Parameters from 3.36 to 3.38

    • #2930 - Remove pypandoc Version Fix

    • #2925 - Increase Timeout for SW CI Pipelines to 10h

  • Improvement

    • #2931 - Upgrade to H2O 3.34.0.7

v3.34.0.6-1 (2021-12-17)

Downloads:

  • Improvement

    • #2926 - Upgrade to H2O 3.34.0.6

    • #2922 - Upgrade to H2O 3.34.0.5

    • #2924 - Add instance of structured streaming into sparkling water examples

  • Engineering Story

    • #2927 - Use pypandoc 1.16.4 during Execution of Tests

    • #2916 - Add Roadmap for Q4-2021/Q1-2022 to README.rst

  • Docs

    • #2918 - Remove Invalid Parameters from DAI MOJO Documentation

    • #3227 - Fix and Update Tutorial for GCP Dataproc

v3.34.0.4-1 (2021-11-19)

Downloads:

  • Improvement

    • #2914 - Upgrade to H2O 3.34.0.4

    • #2911 - Deprecate Apache Spark 2.2

    • #2909 - Add Missing Scala Setters for ‘spark.ext.h2o.extra.cluster.nodes’ Property

    • #2908 - Upgrade MOJO runtime to 2.7.5

    • #2907 - InternalBackend Should Set IP Address Explicitly to H2O Node

    • #2905 - Improve Exception when AutoML Does Not Return Any Model after Its Training Phase

  • Bug

    • #2903 - Sparkling water compiled with Scala 2.12.10 doesn’t work running on Scala 2.12.13+

    • #3286 - Make H2OMOJOModel.load Independent on Scala Version

  • Engineering Story

    • #3276 - Activate the MOJOModel offset tests (and maybe improve those?)

    • #3277 - Add More Benchmarks for conversion from Dataframe to H2OFrame

    • #3278 - Remove anaconda Package from Testing Image

v3.34.0.3-1 (2021-10-08)

Downloads:

  • Engineering Story

    • #3279 - Upgrade SW Version to 3.34.0.3-1-SNAPSHOT

    • #3282 - Fix Tests to Consider More Stacked Ensemble Models in AutoML Leaderboard

    • #3263 - Enable Publishing of api-generation Project

    • #3264 - Change K8s Base Image for Spark 3.0, 3.1 to openjdk:11-jre-slim-buster

    • #3247 - Migrate SW Automated Tests to CDH 6.3

  • Improvement

    • #3280 - Upgrade to H2O 3.34.0.3

    • #3261 - Deprecate autoencoder Parameter on H2ODeepLearning

  • Bug

    • #3283 - Improve Zip Archive Check in Pysparkling Initializer

  • New Feature

    • #3284 - Make Maximum Size of Requests and Responses on Flow UI Proxy Configurable

    • #3262 - Add Support for Python 3.7, 3.8

v3.34.0.1-1 (2021-09-16)

Downloads:

  • Engineering Story

    • #3265 - Fix Deletion of K8s Images in Release Pipeline

    • #3266 - Change K8s Base Image for Spark 2.4 to openjdk:8-jdk-slim-buster

    • #3237 - Remove Python Dependency on Colorama

    • #3760 - Remove deprecated setClientExtraProperties, setNodeExtraProperties, clientExtraProperties, nodeExtraProperties and related spark options

    • #3762 - Remove deprecated setClientBasePort, setNodeBasePort, clientBasePort, nodeBasePort and related Spark configuration

    • #3764 - Remove Deprecated spark.ext.h2o.client.flow.dir Option

    • #3767 - Remove deprecated setH2OClientLogDir, setH2ONodeLogDir, h2oClientLogDir, h2oNodeLogDir and related spark options

  • Improvement

    • #3268 - Upgrade to H2O 3.34.0.1

    • #3269 - Update AutoML Tests to Consider 3 StackEnsemble Models in Leaderboard

    • #3270 - Remove Support for Spark 2.1

    • #3252 - Remove Cross-validation-related Parameters from AutoEncoder

    • #3225 - Delete Binary Models after MOJO Download

    • #3114 - Remove Deprecated Parameter distribution on H2OGLM

    • #3109 - Remove Deprecated Parameter weightCol on H2OKmeans

    • #3878 - Remove deprecated mapperXmx getter and setter in favor of externalH2OMemory

    • #3773 - Remove deprecated setH2OClientLogLevel, setH2ONodeLogLevel, h2oClientLogLevel, h2oNodeLogLevel and related spark configurations

    • #3734 - Remove deprecated setClientIcedDir, setNodeIcedDir, clientIcedDir and nodeIcedDir and related spark option

  • New Feature

    • #3241 - Expose Cross Validation MOJO Models on H2OMOJOModel in Python

    • #3242 - Expose Cross Validation MOJO Models on H2OMOJOModel in Scala

    • #3243 - Expose Model Metrics as Objects on H2OMOJOModel in R API

    • #3244 - Expose Model Metrics as Objects on H2OMOJOModel in Python API

    • #3246 - Expose Model Metrics as Objects on H2OMOJOModel in Scala API

    • #3249 - Expose “cross_validation_metrics_summary” on H2OMOJOModel

    • #3229 - Expose AutoEncoder as SW Estimator

    • #3124 - Add H2O RuleFit to Algo API

    • #3545 - Expose PCA as SW Feature Estimator

    • #3546 - Add H2O GLRM to Algo API

  • Docs

    • #3245 - Generate Documentation for All Possible Metrics Classes on H2OMOJOModel

v3.32.1.6-1 (2021-08-20)

Downloads:

  • Improvement

    • #3254 - Upgrade to H2O 3.32.1.6

  • Bug

    • #3256 - Fix Version Check in sparkling-env.sh Script

    • #3257 - Algorithms Supporting Cross-validation Must Remove Fold Column from the List of Features

  • Engineering Story

    • #3253 - Fix booklet build for Spark 2.4

  • New Feature

    • #3258 - Add RMSLE and MAE to model metric maps

v3.32.1.5-1 (2021-08-06)

Downloads:

  • New Feature

    • #3259 - Add ‘mean_per_class_error’ to model trainings map

    • #3230 - Expose H2O-3 Mojo Model on H2OMOJOModel in Scala

  • Improvement

    • #3260 - Upgrade to H2O 3.32.1.5

  • Bug

    • #3231 - SW K8s External Backend Won’t Start If Number of Nodes is Greater than 2

    • #3232 - Conversion Method asH2OFrame Throws Exception When an Input Contains a Column Named “na” or “null”

    • #3238 - Fix interactionConstraints on H2OXGBoostMOJOModel in Python API

    • #3239 - Fix getMonotoneConstraints() on H2OGBM and H2OXGBoost MOJO model.

  • Docs

    • #3233 - Fix Link in Overview of Examples

v3.32.1.4-1 (2021-07-15)

Downloads:

  • Bug

    • #3234 - Fix Building of RSparkling Docker Images

  • Engineering Story

    • #3235 - Upgrade to H2O 3.32.1.4

    • #3236 - Upgrade Spark in Testing Docker Image to 3.0.3

    • #3220 - Get AutoML Python Tests Aligned with PUBDEV-8175

    • #3223 - Upgrade Spark in Testing Docker Image to 3.1.2 and 2.4.8

  • Docs

    • #3222 - Add example of spark.ext.h2o.flow.extra.http.headers

    • #3224 - Fix CoxPH example for Scala and Python

v3.32.1.3-1 (2021-05-27)

Downloads:

  • Improvement

    • #3226 - Upgrade to H2O 3.32.1.3

  • Engineering Story

    • #3210 - Fix Deployment of Testing Infrastructure for K8s Tests

  • New Feature

    • #3211 - Expose all H2OMOJOModels from AutoML Leaderboard

    • #3212 - Expose Scoring History and Variable Importances on H2OMOJOModel

v3.32.1.2-1 (2021-05-04)

Downloads:

  • Engineering Story

    • #3213 - Upgrade dbplyr in SW Testing Docker Image

    • #3215 - Upgrade “setuptools” during the build of testing docker image

  • Improvement

    • #3216 - Upgrade to H2O 3.32.1.2

    • #3218 - FinalizeFrame should log information about Frame

  • New Feature

    • #3217 - Expose Blending Frame on H2OAutoML

    • #3200 - Introduce Configuration Property for Setting CA Certificates in Pysparkling

    • #3201 - Add ability to use old method for number of instances recognized during launch (for IBM SC)

    • #3206 - Expose Leaderboard Frame (setLeaderboardDataFrame()) for AutoML

    • #3193 - Add Support for Spark 3.1

  • Bug

    • #3219 - Delete Train and Validation Frame after MOJO Model is Downloaded inside H2OAutoML.fit()

    • #3202 - Fix Memory Leak of Frames in H2OAutoml

    • #3205 - Target Encoder Throws Exception on Empty List of Input Columns

  • Docs

    • #3203 - Fix Tutorial for H2OGAM

    • #3204 - Add Tutorial for H2ODeepLearning

v3.32.1.1-1 (2021-03-30)

Downloads:

  • Improvement

    • #3207 - Upgrade to H2O 3.32.1.1

    • #3209 - Reflect Changes on GAM According to PUBDEV-7860

    • #3131 - Extend H2O Client Deprecation to 3.36

    • #3108 - Remove Deprecation of getTrainingParams on H2OMOJOModel

  • Engineering Story

    • #3208 - Extend Deprecation of withDetailedPredictionCol to 3.36

  • New Feature

    • #3198 - Extend Target Encoder to Multinomial Problems

    • #3185 - Expose Interactions on Target Encoder

    • #3179 - H2OMOJOPipelineMOJOModel can Produce Predictions of Various Types

    • #3169 - Scoring Package for Scala

    • #3170 - Python Scoring Package

    • #3128 - Extend Target Encoder for Regression Problems

    • #3115 - Add H2O CoxPH to Algo API

  • Bug

    • #3182 - GLM Model Trained via AutoML Throws Exception when Contributions Enabled

  • Docs

    • #3188 - Add licensing information to docs

    • #3164 - Mention Scoring Packages in Sparkling Water Documentation

v3.32.0.5-1 (2021-03-18)

Downloads:

  • Improvement

    • #3190 - Upgrade to H2O 3.32.0.5

  • New Feature

    • #3191 - Disable SSL Certificate Verification in Python Client and Spark Instances Separately

  • Bug

    • #3192 - The getGridModelsMetrics() and getGridModelsParams() Methods Do Not Name Columns Correctly

    • #3195 - Fix Handling of Flow UI SSL Configuration

  • Engineering Story

    • #3194 - Update Spark in Docker Images to 3.0.2

    • #3180 - Enable Beta Constraints Tests On H2OGAM and H2OGLM

  • Docs

    • #3196 - Update docs to reflect correct ice dir call for 3.30

    • #3181 - Fix Imports in Documentation Sample for Pipeline MOJO

v3.32.0.4-1 (2021-02-02)

Downloads:

  • Docs

    • #3186 - Upgrade Links in readme.md to Documentation for Spark 3.0

    • #3187 - Remove Documentation Badge From Readme.md

    • #3161 - Document Properties for running SW on EMR 5.32

  • Improvement

    • #3189 - Upgrade to H2O 3.32.0.4

  • Engineering Story

    • #3172 - Fix Flaky Test in AnomalyPredictionTestSuite

    • #3178 - Stop Publishing 32bit Artifacts to Conda Repository

    • #3165 - Increase Limit of K8s Tests for Automatic External Backend

  • Bug

    • #3174 - Fix TargetEncoder MOJO for Distributed Environment

    • #3176 - Fix TargetEncoder for Usage in Python Pipeline

    • #3177 - Delete Train and Validation H2O Frame after Training a Model

v3.32.0.3-1 (2020-12-30)

Downloads:

  • Improvement

    • #3166 - Upgrade to H2O 3.32.0.3

  • Engineering Story

    • #3167 - Set Seed in AnomalyPredictionTestSuite

    • #3168 - Fix Python Isolation Forest Test after H2O Changes

    • #3160 - Temporarily Disable Beta Constraints Tests

    • #3151 - K8s Tests Should Transform Datasets in a Distributed Way

    • #3154 - Add Branch Name to Nightly Artefact on DockerHub

    • #3155 - Proper Removal of Sparkling Water Images from Local Docker Registry

    • #3146 - Rename Driver Pods to Fix K8s Tests in Client Mode

    • #3148 - Enable Generation of Dependency License Report as CSV

    • #3118 - Fix Deployment of Kubernetes Tests on Jenkins

  • Bug

    • #3157 - Tried using port 54321 for Flow proxy, but port was already occupied

    • #3158 - Fix Propagation of randomLink and randomFamily to MOJOModel Properties

    • #3159 - ClassSamplingFactors Parameter Throws Exception During Deserialization

    • #3162 - Try to Lock Cloud Multiple Times

    • #3152 - Fix the Flow link for DBC (Azure (latest Runtime))

    • #3145 - Loading of Pipeline Containing SW Stage Throws NPE

  • Docs

    • #3153 - Fix K8s Examples in Sparkling Water Documentation

  • New Feature

    • #3149 - Keep node-specific API open despite K8S API shutdown

v3.32.0.2-1 (2020-11-19)

Downloads:

  • Improvement

    • #3150 - Upgrade MOJO runtime to 2.5.3

    • #3139 - Deprecate Apache Spark 2.1

    • #3140 - Upgrade to H2O 3.32.0.2

    • #3130 - Remove xmxMapper from Examples in Documentation

    • #3125 - Proper Locking of H2O Frames during Conversion from Spark Data Frames

  • New Feature

    • #3142 - Expose Interaction Constraints on H2OXGBoost

  • Engineering Story

    • #3144 - Fix Publishing of Nightly Build Images to DockerHub

    • #3133 - Show Stack Trace of Exceptions in Failed Tests

    • #3134 - Run Databricks Automated Tests on ML Runtime Versions

    • #3135 - Replace IcedHashMapWrapper with New guessType Method On PreviewParseWriter

    • #3126 - Enable to Run Python Tests with SW Runtime Individually

  • Bug

    • #3136 - asH2OFrame Could Fail on ArrayIndexOutOfBoundsException

    • #3138 - Fix Monotone Constraints on GBM and XGBoost MOJO Model

    • #3132 - Fails to Convert Categorical Columns on Big Dataset and Identity Column

    • #3127 - Fix Publishing of SW Booklet

    • #3120 - Fix HamOrSpam Python Integration Test

    • #3099 - Make Sparkling Water Runnable on Databricks ML Distributions

  • Docs

    • #3129 - Update Sparkling Water MOJO Deployment Documentation

v3.32.0.1-2 (2020-10-15)

Downloads:

  • Improvement

    • #3122 - Move Ping Messages to Debug Logging Level

    • #3116 - Upgrade to H2O 3.32.0.1

    • #3103 - Remove “max_hit_ratio_k” from the List of Deprecated Parameters

    • #3092 - Deprecate ‘trainingParameters’ Method on H2OMOJOModel

    • #3094 - Deprecate ‘weightCol’ Parameter on H2OKmeans

    • #3095 - Deprecate ‘distribution’ Parameter on H2OGLM

    • #3088 - Limit Generated Parameters Only to Parameters in xxxParametersV3.fields

    • #3089 - Upgrade to a Docker Image with Spark 2.4.7 and 3.0.1

    • #3085 - Remove Irrelevant Parameters from Kmeans API

    • #3082 - Clean up w2v tokenizer and expose minTokenLength and pattern to provide same features as H2O tokenize method

    • #3068 - Automatically generate LaTeX configuration from Scala code

    • #3061 - Automatically generate R configuration from Scala code

    • #3059 - Automatically generate the configuration table in documentation

    • #3040 - Update booklet to the state so it is valid on our master branch

    • #3873 - Delete H2OFrames Produced by Algorithm Parameters

  • Bug

    • #3123 - org.apache.spark.h2o.H2OConf Shouldn’t Override Settings from Command Line

    • #3112 - Newly Introduced Parameter ‘preprocessing’ Broke SW API Generation

    • #3105 - Add missing ‘ in the migration guide

    • #3106 - asH2OFrame Method Could Fail on a String Column Having More Than 10 Million Distinct Values

    • #3107 - The Method getAlgo() on H2OGridSearch Supports only a Subset of Algorithms

    • #3100 - Add logic of FrameUtils.guessParserSetup to Sparkling Water

    • #3097 - Missing Import of H2OBinaryModel in Python Classification and Regression Classes

    • #3093 - Use family Parameter on H2OGLM, H2OGAM for Determining a Need to Convert the Label Column to Categoricals

    • #3084 - Fix parameter generation in doc

    • #3083 - Kubernetes tests should clean up the environment in case of an error

    • #3079 - NullableDataFrameParam Should Be Persistable

    • #3064 - Missing mappings for ‘negativebinomial’ and ‘fractionalbinomial’ in ProblemType.distributionToProblemType

    • #3882 - Throw explicit exception in case hyper parameter does not exist

  • New Feature

    • #3117 - Expose Feature Types on H2OPipelineMOJOModel and H2OMOJOModel

    • #3096 - Generate Algorithm-specific Python MOJO Classes

    • #3076 - Generate Algorithm-specific MOJO Scala Classes

    • #3072 - Add Isolation Forest to GridSearch

    • #3075 - Add H2O Isolation Forest to Algo API

    • #3069 - Add renameCol method to asH2OFrame Scala API

    • #3052 - Add H2O GAM to Algo API

    • #3030 - Expose Reconstructed Columns on DimReduction Predictions

    • #3029 - Expose Reconstruction Mean Squared Error on AutoEncoder Prediction

    • #3880 - Expose Stage Results/Probabilities on MOJO Detailed Prediction

    • #3865 - Expose H2OBinaryModel in Sparkling Water (without methods so far)

    • #3879 - Expose beta_constraints on GLM

    • #3841 - Expose calibration_frame on GBM, DRF, XGBoost

    • #3844 - Expose ‘random_columns’ on GLM

    • #3847 - Expose interaction_pairs on GLM

  • Docs

    • #3113 - Add Comments to the Documentation Indicating Whether a Parameter is Exposed on MOJO or Not

    • #3101 - Update Documentation with Usage of Algorithm-specific MOJO Classes

  • Engineering Story

    • #3876 - Fix intermittent error during stopping kubernetes tests

v3.30.1.2-1 (2020-09-08)

Downloads:

  • Bug

    • #3091 - H2OMOJOModel.load Method Throws Exception

    • #3080 - Fix Propagation of Monotonous Constraints

  • Improvement

    • #3086 - Upgrade to H2O 3.30.1.2

    • #3078 - Upgrade MOJO runtime to 2.4.8

  • Docs

    • #3087 - Improve Documentation for XGBoost Memory Requirements

    • #3077 - Make Documentation More Descriptive about Extraction of pipeline.mojo from mojo.zip

v3.30.1.1-1 (2020-08-12)

Downloads:

  • Improvement

    • #3070 - Enable ‘detailed_prediction’ Column for MOJO Predictions by Default

    • #3046 - Upgrade H2O to 3.30.1.1

  • Bug

    • #3060 - Put sparkVersion into resulting gradle.properties

    • #3057 - Doc: Multinode XGBoost is no longer experimental in AutoML

    • #3058 - Fix typos in documentation

    • #3050 - Doc: spark.ext.h2o.cloud.representative needs to point to leader node

    • #3045 - Deprecate removed XGBoost options

  • Epic

    • #3037 - Update booklet to the state so it is valid on our release branch

v3.30.0.7-1 (2020-07-24)

Downloads:

  • Bug

    • #3862 - Fix timeout on long running Rest API commands through Proxy

    • #3861 - Some tests in ml package are not being run

    • #3883 - R tests do not fail in gradle when there is failed test

    • #3036 - Missing getWithLeafNodeAssignments() Methods on MOJOModelBase In Python & R

    • #3041 - Double Usage of Parenthesis in H2OMOJOBase.py

    • #3043 - Store the scalaBaseVersion into resulting gradle.properties file

  • New Feature

    • #3885 - Update H2O to 3.30.0.7

  • Improvement

    • #3710 - DBC Smoke tests

    • #3869 - Enable leaf node assignment for H2OMOJOModel

    • #3881 - Documentation should mention how to run examples with Spark/Sparkling Shell

    • #3034 - Copy Sparkling Water booklet from H2O-3 repo to Sparkling Water

    • #3038 - Expose option used for waiting before the clouding starts in internal backend

  • Engineering Story

    • #3863 - Fix build after hive changes on the H2O side

  • Docs

    • #3864 - Flip Sparkling Water site when released

v3.30.0.6-1 (2020-07-03)

Downloads:

  • Bug

    • #3980 - Flow UI Scala Repl: use paste mode to interpret commands passed through Flow UI

    • #5578 - Fix misleading error message from incompatible Java version

    • #3822 - Intermittent failure of ai.h2o.sparkling.backend.exceptions.RestApiCommunicationException at ScalaInterpreterServletTestSuite.scala:28

    • #3875 - Fix MOJO Model Predictions on Dataframes with ArrayType or Vector

    • #3874 - Fix mojo test -> H2O added a new parameter and therefore number of parameters does not match now

    • #3870 - Fix Terraform issue with matching multiple VPCs

  • New Feature

    • #3834 - Create Sparkling Py4j Gateway

  • Improvement

    • #3872 - Upgrade H2O to 3.30.0.6

  • Engineering Story

    • #3877 - Fix intermittent HamOrSpam failure on AutoML

v3.30.0.5-1 (2020-06-22)

Downloads:

  • Bug

    • #3835 - Fix org.apache.hadoop.fs.FsUrlConnection cannot be cast to java.net.HttpURLConnection

    • #3831 - H2O Flow Proxy is not stopped as part of hc.stop() call

    • #3828 - Ensure that endpoints on Spark driver respect authentication options

    • #3826 - JsonSyntaxException when using setNfolds() on algorithm

    • #3824 - Flow proxy is broken when https is used

    • #3851 - Ensure we do not skip available ports

    • #3850 - Improve check for version (do not call external stop in case of internal backend)

    • #3849 - Unit tests fail on Spark 3.0 -> randomSplit gives different result on Spark 3.0 and Spark 2.4 and lower

    • #3848 - HashingTF uses different hashing function since Spark 3. Use the old one in tests

  • Improvement

    • #3809 - Ensure that all requests to backend cluster go via leader node

    • #3840 - Exclude Content of site/.doctrees from SW Distribution Archive

    • #3839 - Update mojo pipeline doc

    • #3838 - Move ml related tutorials to ML sub-page in our doc

    • #3837 - Document output of DAI mojo better

    • #3832 - Ensure that call /3/Shutdown handles H2OContext stop in case of Sparkling Water (via Flow Proxy)

    • #3823 - Failed H2O Job should Report Exception and StackTrace

    • #3821 - Use leader node from the beginning of Rest API communication

    • #3860 - Replace "External H2O Node" with just "H2O Node" as the code is now used in both backends

    • #3856 - Upgrade shadowjar plugin to 6.0.0 (fixes deprecation warnings)

    • #3853 - Add support for Spark 3.0

    • #3852 - Upgrade H2O to 3.30.0.5

  • Engineering Story

    • #3748 - ScalaInt, DataFrames, H2oFrames and RDDS endpoints must be handled on Spark driver side as we require both Spark and h2o features

    • #3803 - Enable to Run Benchmarks from Local PC by Passing VPC and Subnet

    • #3830 - Upgrade to Spark 2.4.6

    • #3829 - Upgrade Gradle to 6.5

    • #3846 - Update spotless to 4.4.0

    • #3845 - Update release plugin to 2.8.1

v3.30.0.4-1 (2020-06-04)

Downloads:

  • Bug

    • #3777 - Missing LogUtil class on external h2o backend

    • #3817 - The .getAlgo() Method of Pysparkling H2OGridSearch Throws Exception

    • #3816 - Update GridSearch Documentation

    • #3804 - java.lang.IllegalArgumentException: requirement failed: The auto-closable resource can't be null!

  • Improvement

    • #5428 - Structure Contributions in the 'detailed_prediction' Column as MapType

    • #3784 - Warn user of upcoming change in grid search in 3.32

    • #3781 - Document hive support in non-kerberized environments

    • #3780 - Rename (Deprecate) setHiveSupportEnabled to setKerberizedHadoopEnabled or similar equivalent

    • #3778 - Deprecate GridSearch Parameters which Are Exposed also on Algorithms

    • #3812 - Treat sphinx warnings as errors as they usually mean doc is broken

    • #3811 - Upgrade H2O to 3.30.0.4

    • #3810 - Switch right join implementation to use H2O

v3.30.0.3-1 (2020-05-14)

Downloads:

  • Bug

    • #4024 - SW Runtime is complaining about missing SPARK_HOME during version check

    • #5521 - Sparkling water fails to detect newer version of colorama

    • #3508 - java.lang.Long cannot be cast to org.apache.spark.sql.Row from PySparkling

    • #3528 - asH2OFrame does not work on dataset with primitive values

    • #3720 - Nightly builds fail on SW version check

    • #3740 - Fix doc warnings on hive site

    • #3738 - Fix documentation for download logs from DBC cluster

    • #3736 - Update Plan Contains HostNames instead of IP Addresses

  • Improvement

    • #3527 - Deprecate JavaH2OContext

    • #3726 - Document withDetailedPrediction on mojo deployment page

    • #3721 - Ensure H2OContext can be created in PySparkling without numpy installed

    • #3754 - Deprecate spark.read.h2o and spark.write.h2o to be consistent with Python API

    • #3739 - Upgrade to H2O 3.30.0.3

  • Engineering Story

    • #3742 - Remove Usages of Deprecated MojoPipelineReaderBackendFactory

v3.30.0.2-1 (2020-05-04)

Downloads:

  • Bug

    • #3508 - java.lang.Long cannot be cast to org.apache.spark.sql.Row from PySparkling

    • #3675 - Fix class not found org.spark_project.jetty.util.thread.ThreadPool error

    • #3711 - Fix link to jira in README

    • #3708 - The AWS java sdk s3 in SW throws the exception: java.lang.IllegalStateException: Socket not created by this factory. Have tried with spark 2.4 version and sparkling water versions -3.28.1.2-1-2.4 and -3.30.0.1-1-2.4.

    • #3707 - Add numpy to Python Kubernetes Image

    • #3704 - Shadow scala-compat

    • #3702 - Don't do version check in case user is using databricks-connect

    • #3701 - getFeaturesCols in python returns scala obj

    • #3699 - Context Path is Erased From Rest Calls

    • #3695 - Context Path Must be Also Considered on H2O Worker Nodes

  • New Feature

    • #3700 - Expose spark.ext.h2o.allow_insecure_xgboost parameter

  • Improvement

    • #3579 - Introduce method asSparkFrame on H2OContext.scala and deprecate asDataFrame

    • #3706 - Update description of spark.ext.h2o.external.cluster.size in SW Documentation

    • #3697 - Deprecate setH2OClientLogLevel and setH2ONodeLogLevel methods

    • #3729 - Distribute Mojos via SparkFiles to Avoid Maximum Array Size Limit

    • #3728 - Upgrade to H2O 3.30.0.2

    • #3726 - Document withDetailedPrediction on mojo deployment page

  • Engineering Story

    • #3690 - Replace Expected Types with Enumeration

    • #3679 - Switch test infra to aws

    • #3694 - Document How to Import Hive Data in Kerberized Environment

v3.30.0.1-1 (2020-04-06)

Downloads:
  • Bug

    • #3553 - Start H2OContext on python side if the user didn't explicitly ask for it

    • #3595 - java.lang.UnsupportedOperationException: JsonObject on testGetLeaderboardWithVariableArgumens(hc, dataset)

    • #3590 - InternalH2OBackend Shouldn't Call setH2OCluster

    • #3637 - Update getOrCreate method after enabling rest api in py/r as well

    • #3663 - Fix compile of micro benchmarks

  • New Feature

    • #5381 - Switch PySparkling in external backend to client-less approach by default

    • #3645 - Introduce Update Plan Reflecting the Final Layout of H2O Chunks

  • Improvement

    • #4368 - Switch to scala formatter

    • #3313 - Change Spark DataFrame to H2OFrame Conversion to Implicitly Convert String Columns to Categoricals

    • #3349 - Switch PySparkling & RSparkling in internal backend to client-less approach by default

    • #3348 - Remove deprecated r2stopping parameter on GBM and DRF

    • #3355 - Switch RSparkling in external backend to client-less approach by default

    • #3377 - Remove deprecated methods in RSparkling, from release 3.30 only instance methods should be used

    • #3393 - In case of rest api, train via rest API, not on the driver

    • #3408 - Remove deprecated nEstimators field and related methods on H2OXGBoost

    • #3410 - Remove deprecated methods in ExternalBackendConf.py

    • #3427 - Remove deprecated methods in InternalBackendConf.py

    • #3425 - Remove deprecated methods in SharedBackendConf.py

    • #3423 - Remove deprecated method setSparkVersionCheckEnable in SharedBackendConf.scala

    • #3530 - Remove kwargs argument from H2oContext.getOrCreate in python and deprecated verify_ssl_certificates arg handling

    • #3525 - Remove deprecated leaderboard method

    • #3517 - Remove deprecated as_h2o_frame and as_spark_frame on H2OContext.py

    • #3536 - Remove deprecated download_h2o_logs method on H2OContext.py

    • #3534 - Remove deprecated get_conf method on H2OContext.py

    • #3533 - Simplify asH2OFrame in PySparkling

    • #3547 - Cleanup: Remove extra repl classes for different spark versions

    • #3568 - Remove exactLambdas param from H2OGLM

    • #3563 - Remove deprecated setClusterConfigFile from H2OConf

    • #3562 - Remove deprecated setClientPortBase from H2OConf

    • #3558 - Remove deprecated get_grid_models, get_grid_models_params and get_grid_models_metrics params from H2OGridSearch

    • #3577 - Remove deprecated initial_score_interval on H2OXGBoost

    • #3575 - Remove deprecated h2oNodeWebEnabled and associated setters

    • #3571 - Remove deprecated _score_interval on H2OXGBoost

    • #3588 - Remove deprecated learn_rate_annealing on H2OXGBoost

    • #3587 - Remove deprecated option to disable web on client node

    • #3582 - Improve and test getCurrentMetrics on H2OModel

    • #3593 - Remove REST API & client h2ocontext, make rest api the default one

    • #3592 - Set H2O Cluster Time Zone only via REST

    • #3600 - Move api classes to ai.h2o.sparkling package

    • #3611 - Move classes from org.apache.spark.h2o.utils to ai.h2o.sparkling

    • #3610 - Deprecate MetricsSupport and make it possible to obtain training metrics on H2OMojoModel

    • #3608 - Expose only H2OFrame, hide remaining internal API

    • #3622 - Switch H2OFrameSupport methods to use rest api

    • #3621 - Remove unused classes & move a few remaining classes to ai.h2o.sparkling

    • #3620 - Remove NetworkBridge as the method isInetAddressOnNetwork is now public in H2O

    • #3619 - Remove and replace remaining reference in examples tests

    • #3618 - Reformat code up to the current standard everywhere except core to avoid formatting issues when cherry-picking

    • #3617 - Fix formatting in core (the whole scala/java codebase now has consistent formatting)

    • #3625 - Remove standalone tests from codebase as we do not test against standalone cluster

    • #3643 - Remove and replace remaining references to H2O algos in SW doc

    • #3647 - Ignore warning report in pytest about converting bit number to string as it is on purpose

    • #3646 - Upgrade gradle python plugin to avoid gradle 6 deprecation warnings

    • #3655 - Deprecate JoinSupport in water.support package and make it part of ai.h2o.sparkling.H2OFrame

    • #3665 - Cleanup in tests -> move to right ai.h2o.sparkling packages

    • #3664 - Add spotless check for ending new lines for most of the other files (the other files do not have better formatters now)

    • #3689 - Remove misleading comment in R's namespace file.

    • #3688 - Remove subproject apps-streaming

    • #3684 - Upgrade H2O to 3.30.0.1

  • Engineering Story

    • #3641 - Add unzipped all headers csv to repo

    • #3640 - Ensure AirlinesDemo runs without the client

    • #3639 - Remove the zeppelin notebook from examples dir

    • #3638 - Move interpreter tests to ScalaCode handler where they belong

    • #3636 - Make HamOrSpam example use SW API

    • #3635 - Make Prostate example use SW API

    • #3634 - Make DeepLearning example use SW API

    • #3633 - Make ChicagoCrimeApp use SW Api

    • #3652 - Deprecate and Disable allStringColumnsToCategorical Option

    • #3644 - Fix deprecation warning about duplicate paths caused by overwriting scala-editor.css file in Sparkling Water

    • #3660 - Remove extra hdpVersion from the codebase

    • #3659 - Proposal: Speed up of integration tests by sharing the context

    • #3658 - Upgrade docker image version to 26

    • #3673 - Apply spotless formatting to gradle files

    • #3672 - Deprecate ModelSerializationSupport

    • #3670 - Remove test classes from package.scala in org.apache.spark.h2o._

    • #3683 - Upgrade gradle + sphinx plugin to latest versions

    • #3682 - Update python plugin to version 2.2.0 (avoid gradle deprecation warnings)

v3.28.1.3-1 (2020-04-06)

Downloads:
  • Bug

    • #3668 - bin/build-kubernetes-images.sh should get spark version from $SPARK_HOME directory

    • #3667 - Get rid of numpy and pyspark dependency

    • #3666 - Fix initialization of Sparkling Water JAR in case we call import pysparkling and sc is not yet running

  • New Feature

    • #3650 - Enable H2O to Connect to Hive

  • Improvement

    • #3615 - Deprecate hex.ModelUtils.classify in favour of SW Algo API

    • #3613 - Deprecate DeepLearningSupport and GBMSupport in favor of SW Algo API

    • #3651 - Make SW Compatible with Older Versions of Steam

    • #3687 - Upgrade to H2O 3.28.1.3

  • Engineering Story

    • #3599 - Update Comments in ChunkServlet

  • Docs

    • #3648 - In Using the MOJO Scoring Pipeline section, clarify that MOJO Scoring Pipelines are from Driverless AI license

v3.28.1.2-1 (2020-03-19)

Downloads:
  • Bug

    • #3573 - Fix Timezone Handling via Conversions to UTC

    • #3598 - Fix release path on s3

    • #3597 - asDataFrame() Conversion Function Throws Exception on Wide Datasets

    • #3596 - Fix Deployment of Artifacts to Maven Central

    • #3591 - Time shift occurring between spark and h2o frame

    • #3607 - ExternalBackend Converts DateType to Numeric

    • #3603 - H2O Transformers do not sanitize feature columns

  • New Feature

    • #3594 - Propagate Timezone Settings from Spark to H2O

    • #3601 - Upgrade to H2O 3.28.1.2

  • Improvement

    • #3606 - Deprecate Implicit Switch to External Backend when H2OConf.setH2OCluster is Called

    • #3604 - Deprecate ignorePublicDNS option as in 3.30 it is no longer required

  • Engineering Story

    • #3585 - Add More Test Cases into Data Conversion Benchmarks

v3.28.1.1-1 (2020-03-06)

Downloads:
  • Epic

    • #5369 - Replace External H2O writer & reader by rest api

  • Bug

    • #4605 - Loophole in H2O authentication with Sparkling water

    • #3420 - In predictions which do classification, be more explicit about relations between classes and probabilities

    • #3449 - Copy extension jar to jars folder in distribution archive

    • #3445 - Ensure credentials are passed to connection before we actually connect

    • #3464 - Improve ChicagoCrime test so it does not block in external backend

    • #3484 - Ensure citibike demo does not use TimeSplit so it does not block external backend

    • #3483 - Fix intermittent failures of "splitFrameToTrainAndValidationFrames with ratio lower than 1.0"

    • #3482 - Ignore failing "splitFrameToTrainAndValidationFrames with ratio lower than 1.0"

    • #3519 - Fix distribution artifact name

    • #3544 - Two notebooks in Databricks can't both connect to same H2O cluster

    • #3543 - Fix getter for autoFlowSsl in H2OConf.R

    • #3540 - Fix compile after removing code for external backend from H2O

    • #3538 - Fix wrong statement in migration guide -> internal_security_conf is not enabled by default

    • #3555 - Fix typo in migration guide

    • #3554 - Benchmarks fail on OperationAborted: A conflicting conditional operation is currently in progress against this resource

    • #3552 - Avoid repeated log messages

    • #3566 - Fix nightly build upload path

  • New Feature

    • #3368 - Read Chunks via REST API

    • #3381 - Be able to compile Sparkling Water with Scala 2.12

    • #3367 - Expose Individual Chunks via REST API

    • #3402 - Replace H2O_EXTENDED_JAR with H2O_DRIVER_JAR in all cases

    • #3415 - Write Individual Chunks via REST API

    • #3434 - Be able to params in rest api methods

    • #3433 - Remove extended jar from the codebase

    • #3498 - Property for Passing Extra Jars to External Backend

    • #3549 - Expose H2OConf Getters and Setters in R

  • Improvement

    • #5501 - Remove Sparkling Water SVM in favor of H2O one

    • #3373 - Replace Apache Http Client with a Client Supporting Request Streaming

    • #3372 - Separate Http Communication Logic and REST API Methods

    • #3396 - Remove deprecated option externalWriteConfirmationTimeout

    • #3405 - Ensure method prepareDatasetForFitting works in Rest API based mode

    • #3404 - Ensure we don't call DKV.put in Fit method on H2OAlgorithm in case of rest api

    • #3403 - Ensure method preprocessBeforeFitting works in RestApi Mode

    • #3417 - Fix typo in RestCommunication

    • #3451 - Update remaining documentation with the new way to start external backend

    • #3448 - Put sparkling water assembly jar into jars folder in the distribution archive instead into assembly/build/libs

    • #3444 - Make benchmarks up-to-date with removal of extended h2o jar

    • #3439 - Move stacktrace collector extension to extensions submodule

    • #3463 - Remove extra plugin import in :sparkling-water-extensions

    • #3462 - Use enum for H2OColumn (Rest API)

    • #3457 - Move RestAPIUtils to ai.h2o.sparkling package

    • #3481 - Remove unnecessary ExternalH2OBackend.verifyH2OClientCloudUp(conf, nodes) check

    • #3480 - Remove isRestApiBased defined on ExternalH2OBackend as it is already defined on RestApiUtils

    • #3479 - Remove unused H2OSQLContextUtils

    • #3478 - Move H2OFrame to Sparkling Water

    • #3477 - Move RestCommunication to ai.h2o.sparkling package

    • #3476 - No need to check cluster size in manual cluster mode anymore

    • #3474 - Remove deprecated block size configuration

    • #3472 - Move classes in repl to a new package ai.h2o.sparkling

    • #3471 - Move examples to ai.h2o.sparkling package

    • #3470 - Refactor converters

    • #3494 - Remove materialization via .toList.toIterator in H2ORDD and H2ODataFrame on external H2O backend

    • #3490 - Remove H2OFrameUtils bridge as not required anymore

    • #3487 - Deprecate sparkSession and sparkContext argument of H2OContext.getOrCreate()

    • #3511 - Create parameterless methods getOrCreate as we don't need to pass spark anymore

    • #3509 - Remove deprecated enableSSL methods and hide the method

    • #3507 - Remove H2OSecurityBridge

    • #3505 - Publish extensions to maven central

    • #3501 - Remove verbose H2O arg from H2OContext.getOrCreate in python

    • #3524 - Enable compression method for rest api conversions

    • #3516 - Remove rest api client from experimental page in our documentation

    • #3542 - Update experimental doc

    • #3541 - Call print(hc) during first creation of H2OContext.r as in H2OContext.py

    • #3539 - No need to search for client ip in rest api mode

    • #3537 - Deprecate download_h2o_logs on H2OContext.py

    • #3535 - Deprecate get_conf on H2OContext.py

    • #3532 - Expose getDomainValues on H2OMojoModel

    • #3557 - Don't use allow_client flag in rest api mode

    • #3551 - Mention in migration guide that explicit creation of H2OContext is required to run algo

    • #3550 - Extend REST Errors with Details from Server

    • #3548 - Cleanup, Use H2OContext.ensure in right places

    • #3567 - Deprecate exactLambdas parameter in H2OGLM

    • #3565 - Rename the 'setClientPortBase' on H2OConf to 'setClientBasePort'

    • #3564 - Rename 'setClusterConfigFile' on H2OConf to 'setClusterInfoFile'

    • #3560 - Fix doc warning: Could not find any member to link for "org.apache.spark.internal.Logging"

    • #3559 - Fix warning: Could not find any member to link for "IllegalArgumentException"

    • #3580 - Deprecate get_grid_models, get_grid_models_params and get_grid_models_metrics params from H2OGridSearch

    • #3578 - Deprecate initial_score_interval on H2OXGBoost as it's only H2O's internal argument

    • #3576 - Deprecate options for disabling or enabling REST api on H2O worker nodes. It needs to be on because of REST client

    • #3574 - Upgrade to Gradle 6.2.2

    • #3572 - Deprecate _score_interval argument as it's only H2O's internal argument

    • #3570 - Ensure that web on client is always enabled

    • #3589 - Deprecate learn_rate_annealing as it is not yet supported on H2OXGBoost

    • #3586 - Update SW version on rel branch to 3.28.1.1

    • #3584 - Upgrade to H2O 3.28.1.1

  • Engineering Story

    • #3385 - Create H2O Extensions Assembly Jar

    • #3429 - Ignore SVM Tests

    • #3496 - Final move - move external backend classes to a new package

    • #3512 - Tests Covering Spark/H2O Frame Conversions

    • #3504 - Hide internal identifyClientIp method

    • #3523 - Benchmarks are failing on exception "VpcLimitExceeded"

    • #3521 - Add Setter Methods of H2OConf to Documentation of Configuration Properties

    • #3561 - Upgrade to Gradle 6.2.1

v3.28.0.4-1 (2020-02-25)

Downloads:
  • Bug

    • #3421 - Fix kubernetes documentation

    • #3436 - Sequential grid search should be default

    • #3435 - Remove SPARK_LOG_DIR, SPARK_WORKER_DIR and SPARK_LOCAL_DIRS and use default Spark values

    • #3440 - Stacktrace extension needs to be daemon thread

    • #3467 - Fix path to docker-image-tool.sh in build-kubernetes-images.sh

    • #3466 - Improve kubernetes documentation

    • #3465 - Fix python kubernetes image

    • #3461 - Missing super.preProcessBeforeFit call in child class

    • #3456 - Fix nightly publishing after switching to mavenLocal build in our pipelines

    • #3455 - Test conversion in case that H2O is running only on subset of executors

    • #3475 - Syntax warning due to comparison of literals using is

    • #3495 - Get transform and transformSchema method of mojo models aligned

    • #3493 - Fix execution of tests using local-cluster

    • #3491 - Fix intermittent error coming from SparklyR in our tests

    • #3488 - Respect SparkSession of a current environment

    • #3503 - Reference mojo as prostate_mojo.zip instead of prostate.mojo in doc

    • #3529 - Use contextPath instead of context_path

  • New Feature

    • #3392 - Expose H2O's Configuration Parameter '-hdfs_config' in Sparkling Water

    • #3438 - Fix intermittent bug in rest api client tests

    • #3459 - AutoML API: expose the new get_leaderboard function available in other clients (Py+R)

  • Task

    • #5345 - Add test for preemption during as_h2o_frame on high concurrency Databricks cluster like scenario

  • Improvement

    • #3389 - Update Mojo to latest version in Sparkling Water

    • #3447 - Make RSparkling examples up-to-date

    • #3446 - Mention in migration doc that assembly jar location in the distribution archive has changed

    • #3443 - Mention in migration doc that H2OSVM is removed from 3.28.1.1

    • #3442 - Move pipeline prediction test to package ai.h2o.sparkling.ml

    • #3441 - Expose quantile alpha on H2OGBM and H2ODeepLearning

    • #3469 - Warn user that the detailed prediction col format will change starting from the next major release for Binomial, Ordinal & Multinomial prediction

    • #3473 - Deprecate block size configuration

    • #3489 - Upgrade to Gradle 6.2

    • #3485 - Re-enable tests using local-cluster

    • #3515 - Upgrade to new docker image 25

    • #3510 - Deprecate setEnableSSL

    • #3502 - Upgrade to H2O 3.28.0.4

    • #3500 - Deprecate passing arguments via kwargs method in getOrCreate in PySparkling

  • Engineering Story

    • #3453 - Upgrade to Spark 2.4.5

    • #3452 - Upgrade to a Docker Image with Spark 2.4.5

    • #3497 - Remove Compiler Warning in HasQuantileAlpha.scala

    • #3492 - Test PRs only on Spark 2.1 and Spark 2.4

    • #3514 - Be able to prefetch all Sparkling Water dependencies without building SW

    • #3513 - Move spark.ext.h2o.hdfs_conf Property among Properties of Internal Backend

v3.28.0.3-1 (2020-02-06)

Downloads:
  • Bug

    • #5531 - Cloud up of SW fails on EMR

    • #5445 - Sparkling Water forms H2O cluster on Azure with only one node

    • #3362 - Support h2o3 mojo prediction in rsparkling

    • #3378 - Add missing 'rel-' prefix when suggesting correct H2O package to install in R

    • #3399 - Fix Typo in Backends Documentation

    • #3397 - Add Sparkling Water UI tab only in case the UI is enabled

    • #3409 - Use local maven in our test infra instead of --includeBuild

    • #3401 - Fix R tests

    • #3416 - Fix setNthreads method on H2OConf

    • #3411 - is_internal_secure_connections_enabled method needs to be in SharedBackendConf.py

    • #3422 - Fix jenkins pipeline so it can also run PRE_RELEASE_TESTS

  • Improvement

    • #5410 - Expose offset_column in XGBoost

    • #3319 - RSparkling in cran should be dummy code to point to our rsparkling in custom repo

    • #3357 - Ensure H2OContext in RSparkling is a class so we don't have to pass sc to methods asH2OFrame and asDataFrame

    • #3379 - Cleanup package.R in RSparkling

    • #3387 - [Proposal] Rename conversion methods to be consistent with other changes

    • #3391 - Fix ArrayIndexOutOfBoundsException on internal backend

    • #3390 - Remove extra import

    • #3388 - Mention in documentation that High Concurrency clusters are not yet supported

    • #3398 - Add option to specify full path to hadoop command

    • #3407 - Use ntrees instead of deprecated nEstimators on H2OXGBoost API

    • #3406 - Keep migration guide up-to-date

    • #3414 - Deprecate externalWriteConfirmationTimeout option

    • #3413 - Upgrade to H2O 3.28.0.3

    • #3428 - Make sure getters and setters on python ExternalBackendConf are consistent with scala counterpart

    • #3426 - Make sure getters and setters on python InternalBackendConf are consistent with scala counterpart

    • #3424 - Make sure getters and setters on python SharedBackendConf are consistent with scala counterpart

  • Engineering Story

    • #3386 - Add Tests Covering Scenarios with XGBoost and Offset Column

v3.28.0.2-1 (2020-01-23)

Downloads:
  • Bug

    • #3376 - Fix Examples in LDAP and Kerberos Tutorials

    • #3374 - The Second Call of H2OContext.getOrCreate Throws an Exception

  • New Feature

    • #3336 - Introduce stoppingRounds, stoppingMetric and stoppingTolerance Parameters on GBM, DRF, XGBoost and DeepLearning

    • #3364 - Enable to Specify Number of Partitions of Virtual Datasets Used in Benchmarks

  • Improvement

    • #3328 - Deprecate the r2stopping Parameter on GBM and DRF

    • #3359 - Deprecate using username and password in RSparkling in favor of the spark options used for this

    • #3363 - Remove strict version check argument in RSparkling

    • #3361 - Test Spark to H2O Conversions on Big Data

    • #3371 - Make Execution of Individual Backends Configurable in Benchmarks

    • #3375 - Upgrade to H2O 3.28.0.2

  • Engineering Story

    • #3370 - Move Model and Algorithm Tests to 'ai.h2o.sparkling.ml' Namespace

    • #3369 - Iterate over Transformed DataFrame in H2OFrameToDataFrameConversionBenchmark

  • Docs

    • #3360 - Update copyright year in conf.py file to include 2020

v3.28.0.1-1 (2019-12-19)

Downloads:
  • Bug

    • #5596 - [Spark-2.1] Switch minimal java version to Java 1.8

    • #5354 - Run rest api client tests only in external backend mode

    • #5350 - The option "-sw_ext_backend" must be enabled when REST-based client is used

    • #5349 - Try to fix NPE on Spark 2.1, 2.2 and 2.3 related to metadata

    • #5348 - Percentiles Are Not Propagated to Metadata

    • #5347 - Fix uploading nightlies when we build them against H2O branches

    • #5343 - H2OContext.getOrCreate Should Create Only One Cluster in Automatic Mode

    • #3294 - Fix unit_test_utils.assert_h2o_frames_are_identical for Python 2

    • #3305 - Fix documentation warnings

    • #3302 - Fix bug introduced in 3.26.11 - using flatfile for client connection in external backend

    • #3307 - Proper stopping of PySparkling in case of REST API

    • #3310 - Improve exception when the user is not authenticated in rest api client

    • #3320 - Fix build with latest AutoML changes

    • #3318 - Make ProxyStarter more robust

    • #3327 - Remove extra pre_create_hook and h2o_connect_hook parameters from H2OContext on PySparkling

    • #3326 - In rest api approach, auto mode, external backend keeps running even if we weren't able to authenticate

    • #3324 - RSparkling doc is missing library(rsparkling) step

    • #3331 - Check that timeout for pinging the backend is always smaller than the timeout for killing the external cluster

    • #3330 - Fix Invalid use of BasicClientConnManager error in client-less approach

    • #3329 - Add wait timeout to ConfigurationPropertiesTestSuite.testNotifyLocalPropertyCreatesFile

    • #3335 - Sparkling Water does not stop automatically in client-less mode when hc.stop is not defined

    • #3343 - Client needs to recognize itself as client

    • #3347 - Register shutdown hook after H2O is running to avoid NPE in case app is stopped during start of H2OContext

    • #3350 - Implement retry for rest api requests

  • Epic

    • #5560 - [PySparkling] Client Separation from Spark Driver

  • Story

    • #5448 - Convert PySpark DataFrame to H2OFrame without Client

  • New Feature

    • #5592 - Expose H2O-3 DRF in Sparkling Water

    • #5502 - Add DRF to grid search

    • #5485 - Create gradle task to create pysparkling docker image for kubernetes

    • #5484 - Create gradle task to create rsparkling docker image for kubernetes

    • #5407 - Document how to use Sparkling Water with Kubernetes

    • #5406 - Create script to generate kubernetes docker images

    • #5400 - Remove announcement service as it is not used

    • #5374 - Ensure external backend (rest api) is stopped in automatic mode if the spark app is killed (avoid zombie clusters)

    • #5372 - Create UDF for Ordinal Predictions

    • #5357 - Add parallelism option to GridSearch

  • Improvement

    • #4314 - Remove Deprecated setters on algorithms which have enum as argument

    • #5593 - Expose Number of Trees in SW MOJO

    • #5571 - Remove deprecated parameter colsampleBytree and related methods from H2OXGBoost

    • #5570 - Switch to single value in the predictionCol and put all the details on the detailedPredictionCol

    • #5544 - Use argumentbuilder to build arguments for the external h2o backend

    • #5518 - Remove deprecated option spark.ext.h2o.external.cluster.num.h2o.nodes and related methods

    • #5516 - Remove algos and features in deprecated org.apache.spark.h2o.ml.algos package

    • #5515 - Remove deprecated option spark.ext.h2o.external.read.confirmation.timeout and related getters and setters

    • #5504 - Remove deprecated getLambda & getAlpha getters and related setters

    • #5503 - Remove deprecated getter and setter SelectBestModelDecreasing on H2OGridSearch

    • #5464 - Create breaking changes document in doc listing the breaking changes so far in 3.28

    • #5458 - Enable to specify outputCols on H2OTargetEncoder

    • #5435 - Deprecate multicast search for cluster in external backend in manual mode

    • #5415 - Change sw version to include also patch within one h2o version

    • #5412 - Migration guide was missing several changes which are already resolved in 3.28

    • #5392 - Expose Extra Http Headers for H2O Nodes

    • #5368 - Remove deprecated option for ipBasedFlatfile

    • #5367 - Ensure worker nodes have always open Flow in the external Backend

    • #5339 - Reuse stopped field from scala backend

    • #3296 - Hide internal fields in H2OContext

    • #3295 - Move stacktrace extension to our sparkling water package ai.h2o.sparkling

    • #3300 - Get Default Values of AutoML Parameters synchronized with H2O-3

    • #3299 - Avoid Usage of AutoML Deprecations

    • #3303 - Mention in migration guide that worker nodes in manual mode need to have rest api available

    • #3309 - Add security tests for the client-less approach

    • #3308 - [TEST] Add test for download logs when using rest api client

    • #3312 - Deprecate set_h2o_driver_if, h2o_driver_if and scala counterparts

    • #3311 - Remove deprecated methods deprecated by [#3311]

    • #3317 - Unify passing of authentication information

    • #3315 - Remove deprecated set_user_name and user_name method on H2OConf in Python

    • #3322 - Version checks can be done via rest API in all cases on External backend

    • #3321 - Lock cloud in case of rest api client, auto mode

    • #3325 - Add test for the zombie cluster in client-less approach

    • #3323 - In client-less tests, verify that stopped cluster containers are shut down correctly

    • #3332 - Move several check threads under single backend heartbeat thread

    • #3334 - Ensure internal communication does not go through proxy, no reason for it

    • #3333 - Communicate always via leader node

    • #3340 - Upgrade to H2O 3.28.0.1

    • #3339 - Use Spark for logging on client in case of External backend

    • #3338 - Add test for automatic cluster stopping

    • #3337 - Obtaining nodes can be done via rest API in all cases on External backend

    • #3344 - Watchdog can be replaced by rest API on external backend

    • #3346 - Remove support for multicast cloud up in case of external H2O backend in manual standalone (no Hadoop) mode

    • #3345 - Remove out-dated check for duke library. Sparkling Water package is now fat jar so this issue does not exist

  • Engineering Story

    • #5398 - Remove unnecessary Log level change

    • #5364 - Tests for Clientless Conversion from H2OFrames to DataFrames

    • #5363 - Configuration File for Tests against H2O Branch

    • #5356 - Adapt internal Target encoding code to latest changes in H2O

    • #5353 - Tests for Clientless Conversion from DataFrames to H2OFrames

    • #5352 - Disable version check in tests on external backend

    • #5351 - spark.ext.h2o.external.disable.version.check was false also when we needed to run rest api tests

    • #3301 - Move remaining java classes under scala dir

    • #3304 - Add Python GBM Test Running with REST-based H2OContext

    • #3342 - Small refactor of ExternalH2OBackend class

    • #3341 - Fix script and integ tests

v3.26.11 (2019-12-06)

Downloads:
  • Bug

    • #5385 - Fix propagation of internal security conf in Sparkling Water

    • #5365 - Implement shutdown hook to ensure H2O will go down on normal stop of Spark

    • #5358 - Fix target encoder multiline doc descriptions

    • #5342 - Don't need to stop worker nodes in internal backend, spark takes care of it as it shuts down the executors

    • #5341 - externalCommunicationBlockSizeAsBytes is missing on H2OConf in python

    • #5340 - Fix PipelinePredictionTest and Regenerate Reference Results

    • #3291 - Fix deadlock when user explicitly calls hc.stop()

  • New Feature

    • #3297 - Upgrade H2O to 3.26.0.11

  • Improvement

    • #5449 - Run H2O Nodes With Security Parameters

    • #5409 - Expose Offset Column in Supervised Algorithms

    • #5391 - Deprecate spark.ext.h2o.client.flow.extra.http.headers

    • #5379 - Correctness Tests for Usage of 'offsetCol' with H2OGBM

    • #5360 - Ensure the client has full flatfile in external backend

  • Engineering Story

    • #5394 - Upgrade to Gradle 5.6.4

    • #5390 - Put h2o-security package into sparkling water assembly

    • #5388 - Simplify distribution of security files

    • #5386 - Enable client mode in Sparkling Water (needs to be done explicitly)

    • #5366 - Deprecate h2oNodeWebEnabled for external backend

    • #5362 - Add job to test against rel and master branch of H2O

    • #5361 - Create nightly job where we build sparkling water against h2o branches

    • #5359 - Ensure benchmarks are not run as part of regular build

v3.26.10 (2019-11-07)

Downloads:
  • Bug

    • #5411 - Fix Propagation of Extra Properties in Internal Backend

    • #5408 - Fix docker image generation for Scala Sparkling Water in Kubernetes Environment

    • #5399 - MOJO Cache Causes That Scoring Applications Don't Finish When Everything Is Done

    • #5396 - sparkling-water-utils is not published to maven

    • #5395 - Upgrade H2O to 3.26.0.10

v3.26.9 (2019-10-31)

Downloads:
  • Bug

    • #5420 - The getGridModelsParams() Method of H2OGridSearch Returns Incorrect Values for Nested Hyper-Parameter Types

    • #5418 - Scoring package is not published to nexus

    • #5416 - Docs page is always missing last changelog

  • Improvement

    • #5434 - Retry for conda upload in release pipeline

    • #5421 - Expose Base Port for Worker Nodes in External Backend

    • #5419 - Upgrade to H2O 3.26.0.9

    • #5414 - Enable Users to Specify Extra H2O Parameters

v3.26.8 (2019-10-18)

Downloads:
  • Bug

    • #5426 - Improve Synchronization in H2OMOJOBaseCache

  • New Feature

    • #5432 - Enable Users to Specify Extra Http Headers for H2O Flow as SW Parameter

    • #5429 - Enable Users to Specify Block Size of Communication in External Backend

    • #5425 - Expose Property for Setting Lifetime of MOJOs in Cache

  • Improvement

    • #5427 - Improve Variable Names in the ExternalBackendUtils Class

    • #5424 - Remove Relocation of com.google.protobuf in Assembly Jar

    • #5422 - Upgrade to H2O 3.26.0.8

v3.26.7 (2019-10-11)

Downloads:
  • Bug

    • #5459 - Update Documentation of Deploying SW to Azure HDI

    • #5440 - Ensure that after Cloud size X under Y failure the rest of the external cluster is killed

  • Improvement

    • #5438 - Figure out better way of caching MOJO Pipelines in H2OMOJOPipelineModel transformer

    • #5430 - Improve Performance of Loading Pipeline MOJO Files

v3.26.6 (2019-10-02)

Downloads:
  • Bug

    • #5474 - HamOrSpam tests return false for both predictions in scripts tests

    • #5469 - Fix intermittent NPE in PySparkling with rollups on external backend

    • #5468 - H2OTargetEncoderMOJOModel Returns Wrong Results If Input Cols Are Not Ordered According To Training Dataset

    • #5466 - Intermittent failure during conversion to h2o frame on External backend in PySparkling

    • #5453 - Prevent sending empty partitions to external H2O backend

    • #5452 - Fix script test - ham or spam pipeline on grid search

    • #5451 - Pysparkling 2.1 fails on parsing PySpark version

    • #5450 - Revert #5450

  • Task

    • #5475 - Benchmarks: Report Failure if Execution Goes Wrong

  • Improvement

    • #5473 - [Spark2.3]Upgrade to Spark 2.3.4

    • #5461 - Automatically increase client timeout on top of Azure

    • #5456 - Make port 9009 configurable on Azure

    • #5437 - Upgrade H2O to 3.26.0.6

  • Engineering Story

    • #5489 - Enable all TargetEncoder tests

v3.26.5 (2019-09-16)

Downloads:
  • Bug

    • #5520 - Fix typo contribution -> contributions on SHAPLY documentation page

    • #5510 - importing pysparkling in Zeppelin fails on SW for 3.26.3

    • #5507 - Add missing namedMojoOutputParameter to PySparkling Algo constructors

    • #5506 - Remove extra asserts for types already covered by type converters

    • #5505 - Improve H2OGridSearch internal handling of Algo + improve API of ordering

    • #5499 - [BUILD] Use numpy compatible with python 2 and python 3

    • #5495 - Jupyter notebook is unable to start kernel for Spark 2.4

    • #5493 - Deprecate cases in H2OMojoPrediction where the prediction column does not directly contain the predicted value.

    • #5491 - Fix IOOB when using calibrated probabilities on MOJO

    • #5487 - PySparkling fails to parse pyspark version with build number, such as 2.3.1.dev0

    • #5486 - Gradle reported success even though the python test failed

    • #5481 - Fix running python tests by changing the env directly

    • #5480 - Script tests are not being executed correctly - some tests are not being executed

    • #5478 - Fix running python integ tests on external backend

  • Epic

  • New Feature

    • #5542 - Update H2OTargetEncoder according to changes in H2O-3 3.26.0.4

    • #5498 - Upgrade to Spark 2.4.4

    • #5482 - Automatically configure H2OContext in case we run on DBC in order to correctly show Flow

  • Task

    • #4462 - Benchmarks: Configuration for External Backend

    • #4461 - Benchmarks: Code Clean up

    • #5536 - Benchmarks: Jenkins Pipeline

    • #5476 - Benchmarks: Automatic Cluster Shutdown after Timeout

  • Improvement

    • #5526 - Create gradle task to build scala image for kubernetes

    • #5517 - Remove use of deprecated option spark.ext.h2o.external.cluster.num.h2o.nodes

    • #5508 - Make sure H2OGLM API is consistent with others and does not use the lambda_ hack

    • #5617 - Don't use strings to define algo name

    • #5494 - Add a note to documentation informing that terraform template is available only for Spark 2.4

    • #5479 - Upgrade H2O to 3.26.0.4

    • #5477 - Remove examples from assembly jar

  • Engineering Story

    • #5523 - Upgrade version of h2o docker image from 58 to the 64

    • #5519 - Upgrade CI docker image for 19 (build on latest hadoop docker image hd2.2:64)

    • #5514 - Retrain sms_pipeline_model for PipelinePredictionTest so it uses ai.h2o package

    • #5513 - Generic logic to test parameter passing to algo wrappers on PySparkling

    • #5497 - Upgrade to a docker image with Spark 2.4.4

    • #5492 - Target Encoder Tests Covering Various Order of Columns

    • #5488 - Install Terraform to Docker Image Running Tests

  • Docs

    • #5509 - Add Rule of Thumb for Data Conversion

    • #5496 - Document How to Generate Prediction Contributions from an Existing MOJO

v3.26.3 (2019-08-28)

Downloads:
  • Bug

    • #5611 - Fix bug with setting init on KMeans

    • #5610 - Don't need to start H2O to initialize algo on PySparkling side

    • #5602 - Remove extra argument on H2OAutoML pyspark wrapper

    • #5597 - Fix wrong statement regarding version in terraform documentation

    • #5595 - Properly apply type checks even for None values on all H2OAutoML parameters

    • #5590 - Check if Jar is already attached to the cluster in Initializer

    • #5589 - Extract zip file into temp directory owned by Spark and configured by spark.local.dir

    • #5579 - Rename colsampleBytree to colSampleByTree on H2OXGBoost in Scala & Python

    • #5576 - Fix broken link to Running Sparkling Water on supported platforms in the doc

    • #5573 - latest-stable/doc/deployment/pysparkling_pipeline.html is referring to old ratio and predictionCol parameters

    • #5572 - Update doc for RSparkling on Windows for latest RSparkling changes

    • #5569 - Fix warning 'H2OMOJOSettings' object has no attribute '_java_obj'

    • #5568 - Fix wrong link to latest spark version in main README

    • #5567 - createTempDir in SharedBackendUtils should create temp files in the Spark temp dir

    • #5566 - Fix IP based cloud-up on client side

    • #5565 - Fix getters on MOJO models

    • #5564 - Fix broken links in the documentation (pointing to old release or nonexistent locations)

    • #5563 - PySparkling cannot parse embedded version.txt file.

    • #5562 - Verify version between H2O external back-end & H2O client on Spark driver

    • #5548 - Fix get-extended-h2o script to reflect new location of sparkling water releases

    • #5546 - Expose driver if, port, port range and extra memory percent configuration for external H2O cluster

    • #5543 - Fix SpreadRDDBuilder not serializable exception

    • #5541 - Move all pysparkling source files into single src dir

    • #5538 - Fix path to external jars generated by ./gradlew extendJar

    • #5534 - Fix obtaining the version when pysparkling is installed via pip

    • #5532 - Use absolute imports in tests as the relative ones are removed in python3

    • #5525 - startWorkerNodes and startClient was returning hostname instead of ip address

  • Story

    • #5559 - Conversion to H2OFrame needs to work without running H2O client

  • New Feature

    • #5533 - GLM no longer uses MissingValuesHandling enum from DeepLearning

  • Task

    • #5601 - Update examples/README file

    • #5582 - Benchmarks: Terraform template for running benchmarks in EMR

    • #5547 - Benchmarks: Name of result file should contain backend and master

    • #5537 - Benchmarks: Gradle Task for Execution of Benchmarks

  • Improvement

    • #4391 - MOJO deployment package

    • #5613 - Expose predict_contributions for H2OMOJOModel

    • #5607 - Deprecate H2OMOJOModel, H2OMOJOPipelineModel and H2OMOJOSettings in the org.apache.spark package

    • #5605 - Deprecate algos and features in org.apache.spark package

    • #5603 - Handle sortMetric param in H2OAutoML the same way as other enums

    • #5598 - Immutable projectName on H2OAutoML

    • #5594 - Upgrade Terraform Templates to AWS Provider 2.23

    • #5587 - Fix 'ai.h2o:sparkling-water-package_2.11:2.4.13'/'h2o_pysparkling_2.4' conflict on Azure Databricks

    • #5586 - Upgrade to mojo2 library v2.1.3

    • #5585 - Avoid null on AutoML include & exclude Algos params

    • #5584 - Apply type converters to rest of the PySparkling

    • #5574 - Upgrade to H2O 3.26.0.3

    • #5561 - Upgrade Gradle to Gradle 5.6

    • #5549 - Remove unnecessary read confirmation timeout

    • #5540 - Upgrade default instances in terraform templates to M5.xlarge

    • #5539 - Remove unsupported notebook (referencing dead deepwater)

  • Engineering Story

    • #5612 - Avoid duplication between mojo params and algo params

    • #5608 - Cleanup of PySparkling package -> moving to new package ai.h2o

    • #5577 - Remove unused init_scala_int_session() from PySparkling

    • #5550 - Avoid boilerplate code when introducing new test suite in PySparkling

v3.26.2 (2019-07-30)

Downloads:
  • Bug

    • #4422 - Restarting h2o cluster makes all Spark Sessions connected to it unusable

    • #4380 - Fix IOOB exception when converting H2OFrame to DataFrame

    • #4378 - Bad quotes in documentation

    • #4377 - Remove extra quote in exception on ExternalH2OBackend

    • #4376 - Fix cloud up in external backend manual mode

    • #4375 - Fix wrong statement in rsparkling documentation

    • #4369 - Fix NPE when reading modelDetails in Mojo

    • #4366 - Use Python formatting for Python in secured_flow.rst

    • #4363 - Fix wrong exception in H2OAutoML sort metric handling

    • #4362 - Use setClusterSize instead of deprecated setter in tests

    • #4359 - Nullability tests in DataFrameConverterTest should use data frames with an explicit schema

    • #4346 - Use VectorUDT in RowConverter

    • #4341 - Lower memory requirements in tests

    • #4320 - [Prototype] Switch to using String value on the setters & getters in the ml API on distribution param

    • #4318 - PySparkling can't be started after version change using pysparkling.sh

    • #4312 - Remove missingValuesHandling param from XGBoost wrapper

    • #4305 - It is no longer possible to specify predictionCol :(

    • #4297 - convertInvalidNumbersToNa missing on PySparkling

    • #4296 - Fix setters which accept both int and float

    • #4295 - Fix nullableArrayArray param for pysparkling

    • #4291 - Use absolute imports as the relative ones are removed in python3

    • #4289 - DatasetWrapper should use withColumn instead of withColumns method

    • #4287 - Fix tests after modifying allStringsToCategorical

  • New Feature

    • #4334 - Add Target Encoding to Sparkling Water Python API

    • #4313 - Implement H2OKmeans pipeline wrapper

    • #4304 - Introduce NullableDoubleArrayParam for KMeans

    • #4303 - Documentation of Target Encoder

  • Task

    • #4463 - Benchmarks: Infrastructure for Getting Information about Execution Details

  • Improvement

    • #4549 - Add Target Encoding to Sparkling Water Scala API

    • #4415 - Unify ml package across rel branches

    • #4408 - Unify jenkins scripts & create gradle profiles

    • #4384 - Single execution path for all spark->h2o frame conversions

    • #4372 - Handle vectors in SparkDataFrameConverter more explicitly

    • #4371 - Specify spark specific source dir per project, so they can differ in subprojects

    • #4367 - Document an example of training AutoML model

    • #4365 - Modify sw_xgboost.rst to use tabs for Python and Scala code

    • #4364 - ML Code simplifications & improvements

    • #4357 - [MAJOR_RELEASE] Remove deprecated methods

    • #4347 - Integrate generic conversion logic to data frame conversion to H2O frames

    • #4342 - Improve SNAPSHOT handling

    • #4340 - Jenkins file improvements -> publish nightly only if both External & internal test pass for all Spark versions

    • #4338 - Upgrade to H2O 3.26.0.1

    • #4337 - Switch to one version of Sparkling Water

    • #4335 - Upgrade to H2O 3.26.0.2

    • #4329 - Use downloadLogs method from H2O and remove relevant methods on Sparkling Water side

    • #4323 - Fix warning in python package as SW version no longer starts with spark major version

    • #4322 - Remove duplicate spark version specifier on pysparkling

    • #4317 - Update build SW doc

    • #4316 - Ignore local-cluster failing tests

    • #4315 - Use string representations instead of enums on Pipeline API

    • #4311 - Refactor parameters into supervised & unsupervised

    • #4310 - Create Supervised & Unsupervised Algorithm

    • #4308 - Document H2OKmeans pipeline wrapper

    • #4307 - Refactor params to supervised and unsupervised on PySparkling side

    • #4306 - Put back constructor checks for Enums on PySparkling side (accidentally removed)

    • #4286 - Rename H2OTargetEncoderMojoModel to H2OTargetEncoderMOJOModel

  • Engineering Story

    • #4381 - Integration test for flattening logic

    • #4373 - Micro benchmark for conversion from a DataFrame to H2OFrame

    • #4355 - Unification of creating header page across different spark versions

    • #4302 - Test passing params to pipeline wrappers of H2O Algos

    • #4301 - No longer need to H2OContext.getOrCreate in __init__ methods of pysparkling algo wrappers

    • #4300 - Avoid duplicating MojoParams on PySparkling side

    • #4299 - Infrastructure for prediction column with a simple prediction value

    • #4298 - prepare ai.h2o.sparkling structure on PySparkling side

    • #4293 - Move logic for converting columns to categorical to prepareDatasetForFitting method

v2.1.56, v2.2.42, v2.3.31, v2.4.13 (2019-06-24)

Downloads:

  • Bug

    • #4616 - Add more logging to discover intermittent RSparkling Issue in jenkins tests

    • #4440 - add back to JavaH2OContext method asDataFrame(.., SQLContext) but deprecated

    • #4437 - Remove mention of H2O UDP from user documentation

    • #4436 - Fix wrong doc in ssl.rst -> val conf: H2OConf = // generate H2OConf file

    • #4435 - Model ID not available on our algo pipeline wrappers

    • #4421 - Follow up fixes after RSparkling change

    • #4420 - Use s3-cli instead of s3cmd because of performance reasons on nightlies

    • #4419 - Fix Sphinx warning

    • #4417 - Fix dist

    • #4416 - Fix dist structure

    • #4414 - Fix missing rsparkling in dist package

    • #4412 - Scaladoc not uploaded to S3 after porting make-dist to gradle

    • #4400 - Fix wrong links on nightly build page

    • #4399 - Explicitly send heartbeat after we have complete flatfile

    • #4398 - sparkling water package on maven should be an assembly jar

    • #4397 - gradle.properties in distribution contains wrong version

    • #4395 - Rename SVM to SparkSVM

    • #4385 - Minor documentation fixes

  • New Feature

    • #4734 - Upload RSparkling to S3 in a form of R repository

    • #4406 - Introduce logic for flattening data frames with arbitrarily nested structures

  • Improvement

    • #4158 - Include all used dependency licenses in the uber jar.

    • #4450 - Bundle Sparkling Water jar into rsparkling -> making rsparkling version dependent on specific sparkling water

    • #4441 - Unify repl across different rel branches

    • #4433 - Expose jks_alias in Sparkling Water

    • #4432 - Include SW version in more log statements

    • #4428 - Add additional log to H2O cloudup in internal backend mode

    • #4427 - Create local repo with RSparkling

    • #4426 - [RSparkling] Make installation from S3 the default recommended option

    • #4425 - Move the conversion logic from Spark Row to H2O RowData to a separate entity

    • #4424 - Store H2O models in transient lazy variables of SW Mojo models

    • #4423 - Make automl tests more deterministic by using max_models instead of max_runtime_secs

    • #4418 - Use readme as main dispatch for documentation

    • #4413 - Remove cache and unpersist call in SpreadRDDBuilder

    • #4411 - Switch to s3 cli on release pipelines

    • #4410 - Use withColumn instead of select in MOJO models

    • #4409 - Fix links to doc & scaladoc on nightly builds

    • #4407 - Upgrade H2O to 3.24.0.5

    • #4394 - Run only last build in jenkins

    • #4390 - Download page is missing one step on RSparkling tab -> library(rsparkling)

    • #4388 - Create maven repo on our s3 for each release and nightly

    • #4386 - Update DBC documentation with respect to latest RSparkling development

v2.1.55, v2.2.41, v2.3.30, v2.4.12 (2019-06-03)

Downloads:

  • Bug

    • #4497 - Unify ratio param across pipeline api

    • #4470 - Use RPC endpoints to orchestrate cloud in internal mode

    • #4467 - Fix doc

    • #4457 - Fix class-loading for Sparkling Water assembly JAR in PySparkling

    • #4447 - Add numpy as PySparkling dependency (it is required because of Spark but missing from list of dependencies)

    • #4446 - Warn that default value of convertUnknownCategoricalLevelsToNa will be changed to false on GridSearch & AutoML

    • #4442 - Fix wrong fat jar name

  • Task

    • #4465 - Benchmarks: Subproject Skeleton

  • Improvement

    • #4544 - Make sure python zip/wheel is downloadable from our release s3

    • #4483 - On download page -> list all supported minor versions

    • #4471 - Remove Param propagation of MOJOModels from Python to Java

    • #4469 - H2OCommonParams in pysparkling

    • #4468 - Move shared params to H2OCommonParams

    • #4460 - Don't use deprecated methods

    • #4459 - Warn user that default value of predictionCol on H2OMOJOModel will change in the next major release to 'prediction'

    • #4458 - Upgrade to H2O 3.24.0.4

    • #4454 - Definition of assembly jar via transitive exclusions

    • #4453 - Move ability to change behavior of MOJO models to MOJOLoader

    • #4452 - Move make-dist logic to gradle

    • #4451 - Expose binary model in spark pipeline stage

    • #4449 - Fix xgboost doc

    • #4445 - Rename the 'create_from_mojo' method of H2OMOJOModel and H2OMOJOPipelineModel to 'createFromMojo'

v2.1.54, v2.2.40, v2.3.29, v2.4.11 (2019-05-17)

Downloads:

  • Bug

    • #4500 - Fix constructor of H2OMojoModel

    • #4498 - Remove internal constructors & Deprecate implicit constructor parameters for H2O Algo Spark Estimators (to be the same as in PySparkling)

    • #4487 - Fix version check in PySparkling shell

    • #4479 - Clean workspace on the hadoop node in integ tests

    • #4478 - Fix inconsistencies between H2OAutoML, H2OGridSearch & H2OAlgorithm

    • #4476 - Fix bad representation of predictionCol on H2OMOJOModel

    • #4475 - XGBoost can't be used in H2OGridSearch pipeline wrapper

    • #4474 - Correctly return mojo model in pysparkling after fit

  • Story

    • #4486 - Remove SparkContext from H2OSchemaUtils

    • #4484 - Upgrade to H2O 3.24.0.3

  • New Feature

    • #4508 - getFeaturesCols() should not return the fold column or weight column

    • #4507 - probability calibration does not work in Sparkling Water Dataframe API

  • Improvement

    • #3964 - Override spark locality so we use only nodes on which h2o is running.

    • #4540 - Improve PySparkling README

    • #4495 - Remove binary H2O model from ML pipelines

    • #4494 - Don't require initializer call to be called during pysparkling pipelines

    • #4493 - Use default params reader in pipelines

    • #4489 - Non-named columns have long been deprecated. Switch to named columns by default

    • #4488 - Remove six as dependency from PySparkling launcher (six is no longer a dependency)

    • #4482 - Remove unnecessary constructor in helper class

    • #4477 - Add predictionCol to mojo pipeline model

v2.1.53, v2.2.39, v2.3.28, v2.4.10 (2019-04-26)

Downloads:

  • Bug

    • #4567 - Fix Sparkling Water 2.1.x compile on Scala 2.10

    • #4562 - RSparkling Can't be used on Spark 2.4

    • #4561 - Disable gradle daemon via gradle.properties

    • #4560 - Fix org.apache.spark.ml.spark.models.PipelinePredictionTest

    • #4553 - Custom metric not evaluated in internal mode of Sparkling Water

    • #4529 - Change get-extended-jar to use https instead of http

    • #4526 - Fix typo in GLM API - getRemoteCollinearColumns, setRemoteCollinearColumns

    • #4524 - Fix RUnits after upgrading to Gradle 5.3.1

    • #4522 - Deprecate asDataFrame with implicit argument

  • Story

    • #4558 - Introduce new annotation deprecating legacy methods in API

    • #4547 - Rename the 'predictionCol' model parameter to 'labelCol'

    • #4530 - Introduce mechanism for enabling backward compatibility of MOJO files when properties are renamed

  • New Feature

    • #4563 - Expose weights_column parameter

  • Improvement

    • #4568 - RSparkling: Add ability to add authentication details when calling h2o_context(sc)

    • #4566 - Improve hint description for disabling automatic usage of broadcast joins

    • #4557 - Improve memory efficiency of H2OMOJOPipelineModel

    • #4554 - Simplify Sparkling Water build

    • #4552 - Fix formatting in python tests

    • #4548 - Create pysparkling tests report file if it does not exist

    • #4546 - Add fold column to python and scala pipelines

    • #4545 - Automatically download H2O Wheel

    • #4543 - Upgrade to H2O 3.24.0.2

    • #4542 - Remove PySparkling six dependency as it was removed in H2O

    • #4541 - Automatically generate PySparkling README

    • #4539 - Automatically generate last pieces of doc subproject

    • #4537 - Remove support for testing external cluster in manual mode

    • #4535 - Remove unnecessary branch check

    • #4534 - Remove duplicate readme file (contains old info & the correct info is in doc)

    • #4533 - Remove confusing meetup dir

    • #4532 - Upgrade to Gradle 5.3.1

    • #4528 - Rename the 'ignoredColumns' parameter of H2OAutoML to 'ignoredCols'

    • #4527 - Remove dependencies to Scala 2.10

    • #4521 - Remove support for Python 2.6 on rel-2.1

    • #4520 - Reformat few python classes

    • #4518 - Parametrize EMR version in templates generation

    • #4517 - Remove old README and DEVEL doc files (not just pointer to new doc)

    • #4516 - Use minSupportedJava for source and target compatibility in build.gradle

v2.1.52, v2.2.38, v2.3.27, v2.4.9 (2019-04-03)

Downloads:

  • Bug

    • #4594 - Exception when there is a column with BOOLEAN type in dataset during H2OMOJOModel transformation

    • #4579 - In Pysparkling script, setting --driver-class-path influences the environment

    • #4578 - Upgrade to H2O 3.24.0.1

    • #4576 - Use specific metrics in grid search, in the same way as H2O Grid

    • #4575 - Document off heap memory configuration for Spark in Standalone mode/IBM conductor

    • #4574 - Fix random project name generation in H2OAutoML Spark Wrapper

  • New Feature

    • #4589 - Expose search_criteria for H2OGridSearch

    • #4582 - expose H2OGridSearch models

    • #4573 - Add includeAlgos to H2O AutoML pipeline stage & ability to ignore XGBoost

  • Improvement

    • #4592 - Add Sparkling Water to Jupyter spark/pyspark kernels in EMR terraform template

    • #4585 - Upgrade build to Gradle 5.2.1

    • #4581 - Integrate with H2O native hive support

v2.1.51, v2.2.37, v2.3.26, v2.4.8 (2019-03-15)

Downloads:

  • Bug

    • #4593 - Expose missing variables in shared TF EMR SW template

  • Improvement

    • #4611 - Start jupyter notebook with Scala & Python Spark in AWS EMR Terraform template

    • #4591 - Upgrade to H2O 3.22.1.6

v2.1.50, v2.2.36, v2.3.25, v2.4.7 (2019-03-07)

Downloads:

  • Bug

    • #4606 - hc.stop() shows 'exit' not defined error

    • #4604 - Fix RSparkling in case the jars are being fetched from maven

    • #4600 - H2OXgboost pipeline stage does not define updateH2OParams method

    • #4597 - Unique project name in automl to avoid sharing one leaderboard

    • #4595 - Fix grid search pipeline step on pyspark side

  • Improvement

    • #4703 - Document Terraform scripts for AWS

    • #4666 - Document using Google Cloud Storage In Sparkling Water

    • #4621 - Speed up conversion between sparse spark vectors and h2o frames by using sparse new chunk

    • #4615 - Improve terraform templates for AWS EMR and make them part of the release process

    • #4607 - Allow login via ssh to created cluster using terraform

    • #4603 - Add H2OGridSearch pipeline stage to PySpark

    • #4601 - Test GBM Grid Search Scala pipeline step

    • #4598 - Generalize H2OGridSearch Pipeline step to support other available algos

    • #4596 - Upgrade to H2O 3.22.1.5

v2.1.49, v2.2.35, v2.3.24, v2.4.6 (2019-02-18)

Downloads:

  • Bug

    • #4620 - Fix bug affecting loading pipeline in python when stored in scala

    • #4618 - Fix several cases in spark vector -> h2o conversion

  • Improvement

    • #4622 - Add H2OGLM Wrapper to Sparkling Water

    • #4617 - Update mojo2 to 0.3.16

    • #4613 - Fix s3 bootstrap templates for nightly builds

    • #4612 - Upgrade to H2O 3.22.1.4

v2.1.47, v2.2.33, v2.3.22, v2.4.4 (2019-01-21)

Downloads:

  • Bug

    • #4627 - Fix support for unsupervised mojo models

  • Improvement

    • #4655 - Update code to work with latest jetty changes

    • #4629 - Upgrade H2O to 3.22.1.2

v2.1.46, v2.2.32, v2.3.21, v2.4.3 (2019-01-17)

Downloads:

  • Bug

    • #4640 - Cannot serialize DAI model

  • Improvement

    • #4643 - Update to H2O 3.22.0.5

    • #4641 - Enable tabs in the documentation based on the language

    • #4636 - Prepare Terraform scripts for Sparkling Water on EMR

    • #4635 - Use getTimestamp method instead of _timestamp directly

v2.1.45, v2.2.31, v2.3.20, v2.4.2 (2019-01-08)

Downloads:

  • Bug

    • #4649 - NullPointerException at water.H2ONode.openChan(H2ONode.java:417) after upgrade to H2O 3.22.0.3

    • #4646 - Fix test suite to test PySparkling YARN integration tests on external backend as well

  • Task

    • #4647 - Docs: Change copyright year in docs to include 2019

  • Improvement

    • #4061 - Publish PySparkling as conda package

    • #4645 - Update H2O to 3.22.0.4

v2.1.44, v2.2.30, v2.3.19, v2.4.1 (2018-12-27)

Downloads:

  • Bug

    • #4671 - Documentation link does not work on the Nightly Bleeding Edge download page

    • #4656 - Fix Travis builds

    • #4654 - Fix Travis builds (test just scala unit tests)

  • Improvement

    • #4061 - Publish PySparkling as conda package

    • #4675 - Fix deprecation warning regarding automl -> AutoML

    • #4663 - Updates to streaming app

    • #4662 - Update to H2O 3.22.0.3

    • #4661 - Upgrade gradle to 4.10.3

    • #4660 - Enable GCS in Sparkling Water

    • #4659 - Properly integrate GCS with Sparkling Water, including test in PySparkling

  • Docs

    • #4672 - Add Installation and Starting instructions to the docs

v2.1.42, v2.2.28, v2.3.17 (2018-10-27)

Downloads:

  • Bug

    • #4684 - Fallback to original IP discovery in case we can't find the same network

    • #4683 - Fix handling time column for mojo pipeline

    • #4682 - Upgrade MOJO to 0.3.17

  • Improvement

    • #4710 - Upgrade H2O to 3.22.0.1

v2.1.41, v2.2.27, v2.3.16 (2018-10-17)

Downloads:

  • Bug

    • #4824 - Enable AutoML tests in Sparkling Water

    • #4690 - Fix issue with empty queue name by default

    • #4689 - In PySparkling, don't reconnect if already connected

    • #4687 - Fix warning in doc

  • Improvement

    • #4698 - Sparkling shell ignores parameters after last updates

    • #4697 - Automatic detection of client ip in external backend

    • #4696 - Pysparkling in external backend, manual mode stops the backend cluster, but the cluster should be left intact

    • #4695 - Create nightly release for 2.1, 2.2 and 2.3

    • #4694 - Upgrade to Mojo 0.3.15

    • #4693 - Don't expose mojo internal types

    • #4692 - More explicit checks for valid values of Backend mode and external backend start mode

    • #4691 - Expose run_as_user for External H2O Backend

    • #4686 - Upgrade H2O to 3.20.0.10

v2.1.40, v2.2.26, v2.3.15 (2018-10-02)

Downloads:

  • Bug

    • #4714 - Fix passing --jars to sparkling-shell

    • #4713 - More robust check for python package in PySparkling shell

    • #4707 - Add missing six dependency to setup.py for PySparkling

  • Improvement

    • #4712 - Mojo pipeline with multiple output columns (and also with dots in the names) does not work in SW

    • #4701 - Upgrade H2O dependency to 3.20.0.9

v2.1.39, v2.2.25, v2.3.14 (2018-09-24)

Downloads:

  • New Feature

    • #4735 - Expose leaderboard on H2OAutoML

    • #4733 - Display Release creation date on the download page

  • Improvement

    • #4731 - remove call to ./gradlew --help in jenkins pipeline

    • #4730 - Ensure that release does not depend on build id

    • #4729 - Automatically update master after RSparkling release with latest version

    • #4725 - [RSparkling] In case only path to SW jar file is specified, discover the version from JAR file instead of requiring it as parameter

    • #4724 - Enable installation of RSparkling using devtools from Github repo

    • #4723 - Upgrade mojo pipeline to 0.13.2

    • #4722 - Document automatic certificate creation for Flow UI

    • #4721 - PySparkling fails if we specify https argument as part of getOrCreate()

    • #4720 - Document using s3a and s3n on Sparkling Water

    • #4719 - Upgrade to H2O 3.20.0.8

    • #4717 - The shell script bin/pysparkling should print missing dependencies

    • #4716 - Upgrade Gradle to 4.10.2

  • Docs

    • #4737 - Fix link to Installing RSparkling on Windows

v2.1.38, v2.2.24, v2.3.13 (2018-09-14)

Downloads:

  • New Feature

    • #4732 - Upgrade Gradle to 4.10.1

  • Improvement

    • #4736 - Upgrade H2O to 3.20.0.7

    • #4728 - Revert Upgrade to Gradle 4.10.1 (bug in Gradle) and upgrade to Gradle 4.0

    • #4727 - Update docs and mention that ORC is supported

  • Docs

    • #4738 - Docs: Add Parquet to list of supported data formats

v2.1.37, v2.2.23, v2.3.12 (2018-08-28)

Downloads:

  • Bug

    • #5073 - Add test for RDD[TimeStamp] -> H2OFrame[Time] -> RDD[Timestamp] conversion

    • #5026 - SVMModelTest is failing

    • #4769 - Fix links on RSparkling Readme page

    • #4759 - Fix typos in documentation

    • #4758 - Fix javadoc on JavaH2OContext

    • #4755 - Setting context path in pysparkling fails to launch h2o

    • #4754 - RSparkling does not respect context path

    • #4753 - Automatically generate the keystore for H2O Flow ssl (self-signed certificates)

    • #4752 - When running in Local mode, we ignore some configuration

    • #4751 - Fix context path value checks

    • #4750 - Use correct scheme in sparkling water when ssl on flow is enabled

    • #4749 - Fix context path setting on RSparkling

    • #4740 - Add context path after value of spark.ext.h2o.client.flow.baseurl.override when specified

  • New Feature

    • #4774 - Integrate XGBoost in Sparkling Water

    • #4743 - Sparkling water External Backend Support in kerberized cluster

  • Task

    • #4767 - Add to docs that pysparkling has a new dependency pyspark

  • Improvement

    • #5168 - JavaH2OContext#asRDD implementation is missing

    • #4834 - Sparkling Water/RSparkling needs to declare additional repository

    • #4766 - Improve Scala Doc API of the support classes

    • #4764 - Update Gradle Sphinx libraries - faster documentation builds

    • #4763 - Create abstract class from creating parameters from Enum for Sparkling Water pipelines

    • #4762 - [PySparkling] Fix Wrong H2O version detection on latest bundled H2Os

    • #4761 - Add timeouts & retries for docker pull

    • #4757 - Document using PySparkling on the edge node (EMR)

    • #4748 - Upgrade H2O to 3.20.0.6

    • #4744 - Fix EMR bootstrap scripts

    • #4742 - Add option which can be used to change the flow address which is printed out after H2OContext started

    • #4741 - Document how to run Sparkling Water on kerberized cluster

v2.1.36, v2.2.22, v2.3.11 (2018-08-09)

Downloads:

  • Bug

    • #4783 - Change maintainer of RSparkling to jakub@h2o.ai

    • #4782 - Fix Content of RSparkling release table

    • #4781 - Allow passing custom jars when running ./bin/sparkling/shell

    • #4779 - Fix CRAN issues of RSparkling

    • #4773 - Fix wrong comparison of versions when detecting other h2o versions in PySparkling

    • #4772 - Set up client_disconnect_timeout correctly in context on External backend, auto mode

    • #4771 - Fix missing mojo impl artifact when running pysparkling tests in jenkins

  • Task

    • #4079 - Add to doc that 100 columns are displayed in the preview data by default

  • Improvement

    • #4000 - Update PySparkling Notebooks to work for Python 3

    • #4164 - List nodes and driver memory in Spark UI - Sparkling Water Tab

    • #4844 - Use Mojo Pipeline API in Sparkling Water

    • #4785 - Port documentation for mojo pipeline on Spark to SW repo

    • #4784 - Upgrade Mojo 2 in SW to 0.11.0

    • #4778 - Upgrade H2O to 3.20.0.5

    • #4777 - Need ability to disable Flow UI for Sparkling-Water

    • #4775 - Verify that we are running on correct Spark for PySparkling at init time

    • #4770 - Cache also test and runtime dependencies in docker image

  • Docs

    • #4808 - Add "How to" for using Sparkling Water on Google Cloud Dataproc

v2.1.35, v2.2.21, v2.3.10 (2018-08-01)

Downloads:

  • Bug

    • #4851 - Automate releases of RSparkling and create release pipeline for this release process

    • #4843 - Add missing repository to the documentation

    • #4810 - Fix Sphinx gradle plugin, the latest version does not work

    • #4809 - Stabilize releasing to Nexus Repository

    • #4801 - Do not stop external H2O backend in case of manual start mode

    • #4796 - Fix RSparkling README style issues

    • #4795 - Fix address for fetching H2O R package in nightly tests

    • #4793 - Add option to ignore SPARK_PUBLIC_DNS

    • #4792 - Add option which ensures that items in flatfile are translated to IP address

    • #4787 - Deprecate old behaviour of mojo pipeline output in SW

  • Improvement

    • #5110 - Warn if user's h2o in python env is different than the one bundled in pysparkling

    • #4833 - Move RSparkling to Sparkling Water repo

    • #4813 - Upgrade Gradle to 4.9

    • #4802 - Fix issues when stopping Sparkling Water (Scala) in yarn-cluster mode for external Backend

    • #4797 - RSparkling should run tests in both, external and internal mode

    • #4791 - Upgrade H2O to 3.20.0.4

    • #4789 - Expose port offset in Sparkling Water

    • #4786 - Remove confusing message about stopping H2OContext in PySparkling

v2.1.34, v2.2.20, v2.3.9 (2018-07-16)

Downloads:

  • Bug

    • #4852 - Upgrade Gradle to 4.8.1

    • #4850 - Upgrade Mojo2 version to 0.10.7

    • #4845 - Fix issues when stopping Sparkling Water (Scala) in yarn-cluster mode

    • #4829 - Fix missing apostrophe in documentation

    • #4825 - Disable temporarily AutoML tests in Sparkling Water

  • New Feature

    • #4926 - Implement Synchronous and Asynchronous Scala cell behaviour

  • Improvement

    • #4907 - Don't parse types again when passing data to mojo pipeline

    • #4868 - Several Scala cell improvements in H2O flow

    • #4867 - Make sure that we can use schemes unsupported by H2O in H2O Configuration

    • #4865 - Port AWS preparation scripts into SW codebase

    • #4860 - Add support for queuing of Scala cell jobs

    • #4840 - Wrong Spark version in documentation

    • #4839 - Upgrade to Spark 2.1.3

    • #4837 - Dockerize Sparkling Water release pipeline

    • #4835 - Clean gradle build with regards to mojo2

    • #4832 - Upgrade H2O to 3.20.0.3

    • #4826 - Expose AutoML max models

  • Docs

    • #4876 - Add section for using Sparkling Water with AWS

v2.1.32, v2.2.18, v2.3.7 (2018-06-18)

Downloads:

  • Bug

    • #4893 - Upgrade Gradle to 4.8 (publishing plugin)

    • #4882 - Fix reference to local-cluster on download page

    • #4874 - Update Hadoop version on download page

    • #4873 - Fix Script tests on Dockerized Jenkins infrastructure

    • #4872 - Call h2oContext.stop after ham or spam Scala example

    • #4871 - Add missing description in publish.gradle

  • Improvement

    • #4894 - Modify the hadoop launch command on download page

    • #4881 - Upgrade H2O to 3.20.0.1

    • #4880 - Update Mojo2 to 0.10.4

    • #4875 - Print output of script tests

v2.1.31, v2.2.17, v2.3.6 (2018-06-13)

Downloads:

  • Bug

    • #4903 - Expose methods to get input/output names in H2OMOJOPipelineModel

    • #4895 - Print Warning when spark-home is defined on PATH

    • #4892 - Create & fix test in PySparkling for named mojo columns

    • #4890 - Fix & more readable test

    • #4889 - Better Naming of the UDF method to obtain predictions

    • #4885 - Add repository to build required by xgboost-predictor

  • Story

    • #4898 - Upgrade Mojo2 to latest version

  • Improvement

    • #4914 - Verify that Spark time column representation can be digested by Mojo2

    • #4905 - Document Kerberos on Sparkling Water

    • #4904 - Update use from maven on sparkling water download page

    • #4902 - Make use of output types when creating Spark DataFrame out of mojo2 predicted values

    • #4901 - Create spark UDF used to extract predicted values

    • #4900 - Sparkling Water py should require pyspark dependency

    • #5615 - Upgrade MojoPipeline to 0.10.0

    • #4899 - Upgrade H2O to 3.18.0.11

v2.1.30, v2.2.16, v2.3.5 (2018-05-23)

Downloads:

  • Bug

    • #4911 - Enforce system level properties in SW

  • Improvement

    • #4908 - Upgrade H2O to 3.18.0.10

    • #4906 - Remove GA from Sparkling Water

v2.1.29, v2.2.15, v2.3.4 (2018-05-18)

Downloads:

  • Bug

    • #4917 - Add support for converting empty dataframe/RDD in Python and Scala to H2OFrame

    • #4912 - Remove withCustomCommitsState in pipelines as it's now duplicating Github

    • #4910 - Fix data obtaining for mojo pipeline

    • #4909 - Upgrade Mojo pipeline to 0.9.9

v2.1.28, v2.2.14, v2.3.3 (2018-05-15)

Downloads:

  • Bug

    • #4935 - Enable running MOJO spark pipeline without H2O init

    • #4927 - Local creation of Sparkling Water does not work anymore.

    • #4173 - Check shape of H2O frame after the conversion from Spark frame

    • #4919 - External Backend stored sparse vector values incorrectly

  • Improvement

    • #4923 - Type checking in PySparkling pipelines

    • #4921 - Small refactoring in identifiers

    • #4920 - Explicitly set source and target java versions

    • #4916 - Upgrade H2O to 3.18.0.9

    • #4915 - Upgrade Mojo pipeline dependency to 0.9.8

    • #4913 - Add test checking column names and types between spark and mojo2

v2.1.27, v2.2.13, v2.3.2 (2018-05-02)

Downloads:

  • Bug

    • #4138 - Process steam handle and use it for connection to external h2o cluster

    • #4930 - Require correct colorama version

    • #4929 - Fix Windows starting scripts

    • #4928 - Fix NPE in mojo pipeline predictions

  • New Feature

    • #4925 - Change color highlight in scala cell as it is too dark

  • Improvement

    • #4937 - Upgrade H2O to 3.18.0.8

    • #4936 - Update Mojo2 dependency to one which is compatible with Java7

    • #4934 - Spark Pipeline imports do not work in PySparkling

    • #4933 - Add ability to convert specific columns to categoricals in Sparkling Water pipelines

    • #4932 - Sparkling Water pipelines add duplicate response column to the list of features

v2.1.26, v2.2.12, v2.3.1 (2018-04-19)

Downloads:

  • Bug

    • #4248 - Enable using sparkling water maven packages in databricks cloud

    • #4963 - Documentation fixes

    • #4960 - Add missing seed argument to H2OAutoml pipeline step

    • #4956 - Point to proper web-based docs

    • #4954 - Use parquet provided by Spark

    • #4953 - Automatically update redirect table as part of release pipeline

    • #4946 - Fix exporting and importing of pipeline steps and mojo models to and from HDFS

  • Improvement

    • #4977 - Integrate & Test Mojo Pipeline with Sparkling Water

    • #4961 - Upgrade H2O to 3.18.0.7

    • #4959 - Expose context_path in Sparkling Water

    • #4957 - Create additional test verifying that the new light endpoint works as expected

    • #4952 - Additional link to documentation

    • #4950 - Remove references to Sparkling Water 2.0

    • #4948 - Reduce time of H2OAutoml step in pipeline tests to 1 minute

    • #4944 - Upgrade to Gradle 4.7

v2.1.25, v2.2.11, v2.3.0 (2018-03-29)

Downloads:

  • Bug

    • #4224 - Intermittent script test issue on external backend

    • #4194 - Mark Spark dependencies as provided on artefacts published to maven

    • #4181 - Increase timeout for conversion in pyunit test for external cluster

    • #4989 - Fix doc artefact publication

    • #4986 - Remove support for downloading H2O logs from Spark UI

    • #4983 - Fix coding style issue

    • #4980 - Fix import

    • #4973 - sparkling water from maven does not know the stacktrace_collector_interval option

    • #4972 - Handle nulls properly in H2OMojoModel

  • New Feature

    • #4198 - [PySparkling] Check for correct data type as part of as_h2o_frame

  • Improvement

    • #4188 - Parametrize pipeline scripts to be able to specify different algorithms

    • #4175 - Log chunk layout after the conversion of data to external H2O cluster

    • #4994 - Document GBM Grid Search Pipeline Step

    • #4984 - Remove test artefacts from the sparkling-water assembly

    • #4981 - Add missing import

    • #4976 - Don't use default value for output dir in external backend, it's not required

    • #4970 - Upgrade H2O to 3.18.0.5

  • Docs

    • #4974 - Fix link for documentation on DEVEL.md

v2.1.24, v2.2.10 (2018-03-08)

Downloads:

  • Bug

    • #4182 - Sparkling Water Doc artefact is still missing Scala version

    • #4179 - Fix setting up node network mask on external cluster

    • #4178 - Allow to set LDAP and different security options in external backend as well

    • #4174 - Fix bug in documentation for manual mode of external backend

    • #4992 - Fix tests after enabling the stack-trace collection

  • Improvement

    • #4177 - Document how to use Sparkling Water with LDAP in Sparkling Water docs

    • #4176 - Expose Grid search as Spark pipeline step in the Scala API

    • #5001 - Upgrade to Gradle 4.6

    • #4997 - Collect stack traces on each h2o node as part of log collecting extension

    • #4995 - Upgrade H2O to 3.18.0.3

    • #4993 - Upgrade H2O to 3.18.0.4

  • Docs

    • #4996 - Add "How to" for changing the default H2O port

v2.1.23, v2.2.9 (2018-02-26)

Downloads:

  • Bug

    • #4197 - Sparkling water doc artefact is missing scala version

    • #4193 - Improve method for downloading H2O logs

    • #4192 - Use new light endpoint introduced in 3.18.0.1

    • #4187 - Make sure we use the unique key names in split method

    • #4185 - Document how to download logs on Databricks cluster

    • #4184 - Expose downloadH2OLogs on H2OContext in PySparkling

    • #4183 - Move spark.ext.h2o.node.network.mask setter to SharedArguments

  • Improvement

    • #4218 - Create Spark Transformer for AutoML

    • #4195 - Create an equivalent of h2o.download_all_logs in Scala

    • #4190 - Upgrade H2O to 3.18.0.2

v2.1.22, v2.2.8 (2018-02-14)

Downloads:

  • Technical task

    • #4268 - Deliver SW documentation in HTML output

  • Bug

    • #4235 - Fix Typo in documentation

    • #4225 - Make printHadoopDistributions gradle task available again for testing

    • #4219 - Kill the client when one of the h2o nodes went OOM in external mode

    • #4214 - Fix pysparkling.ml import for non-interactive sessions

    • #4213 - parquet import fails on HDP with Spark 2.0 (azure hdi cluster)

    • #4212 - Make sure H2OMojoModel does not require H2OContext to be initialized

    • #4211 - Fix mojo predictions tests

    • #4210 - In PySparkling pipelines, ensure that if users pass integer to double type we handle that correctly for all possible double values

    • #4207 - Write a simple test for parquet import in Sparkling Water

    • #4206 - Add option to H2OModel pipeline step allowing us to convert unknown categoricals to NAs

    • #4205 - Fix driverif configuration on the external backend

  • Improvement

    • #4106 - Verify & Document run of RSparkling on top of Databricks Azure cluster

    • #4242 - Document how to change log location

    • #4237 - H2OContext can't be initialized on Databricks cloud

    • #4234 - Fix typo in documentation

    • #4233 - Upgrade Gradle to 4.5

    • #4232 - Update docs - SparklyR supports Spark 2.2.1 in the latest release

    • #4230 - Log Sparkling Water version during startup of Sparkling Water

    • #4227 - Allow to set driverIf on external H2O backend

    • #4226 - Fix creation of Extended JAR in gradle task

    • #4220 - Report Yarn App ID of spark application in H2OContext

    • #4217 - Upload generated sphinx documentation to S3

    • #4216 - Update links on the download page to point to the new docs

    • #4215 - Increase memory for JUNIT tests

    • #4202 - Upgrade to Gradle 4.5.1

    • #4201 - Upgrade to H2O 3.18.0.1

    • #4200 - Fix parquet import test on external backend

  • Docs

    • #4223 - Final updates for Sparkling Water html output

    • #4222 - Update "Contributing" section in Sparkling Water

v2.1.21, v2.2.7 (2018-01-18)

Downloads:

  • Bug

    • #5070 - Remove workaround introduced by #5070 for yarn/cluster mode

    • #4161 - Remove hotfix introduced by [#4161] and implement proper fix

    • #4258 - Remove extra files that got into the repo

    • #4254 - Kill the cluster when a new executor joins in the internal backend

    • #4252 - Generate download link as part of the release notes

    • #4251 - Remove mentions of local-cluster in public docs

    • #4250 - Deprecated call in H2OContextInitDemo

    • #4249 - Fix jenkinsfile for builds against specific h2o branches

  • Improvement

    • #4246 - Update H2O to 3.16.0.4

    • #4245 - Tiny clean up of the release code

    • #4241 - Cleaner release script

    • #4240 - Ensure S3 in release pipeline does depend only on credentials provided from Jenkins

    • #4239 - Separate releasing on Github and Publishing artifacts

v2.1.20, v2.2.6 (2018-01-03)

Downloads:

  • Bug

    • #4085 - [PySparkling] calling as_spark_frame for the second time results in exception

    • #4082 - Fix ham or spam flow to reflect latest changes in pipelines

    • #4081 - Ensure that we do not access RDDs in pipelines (to unblock the deployment)

    • #4274 - Fix inconsistencies in ham or spam examples between scala and python

    • #4272 - Fix ham or spam pipeline tests

    • #4271 - Fix ham or spam tests for deeplearning pipeline

    • #4259 - Use always correct Spark version on the R download page

  • Improvement

    • #4104 - Measure time of conversions to H2OFrame in debug mode

    • #4100 - Port all arguments available to Scala ML to PySparkling ML

    • #4095 - Support for exporting mojo to hdfs

    • #4080 - Dump full spark configuration during H2OContext.getOrCreate into DEBUG

    • #4077 - Fix wrong instruction at PySparkling download page

    • #4283 - Create new DataFrame with new schema only when it actually contains a dot in column names

    • #4282 - Port release script into the sw repo

    • #4281 - Use persist layer for exportPOJOModel

    • #4280 - Export H2OMOJOModel.createFromMOJO to pysparkling

    • #4278 - Create test for mojo predictions in PySparkling

    • #4277 - Add tests for H2ODeeplearning in Scala and Python and Fix potential problems

    • #4276 - Log spark configuration to INFO level

    • #4270 - Upgrade Gradle to 4.4.1

    • #4264 - Upgrade ShadowJar to 2.0.2

v2.1.19, v2.2.5 (2017-12-11)

Downloads:

  • Bug

    • #4097 - pysparkling.__version__ returns incorrectly ‘SUBST_PROJECT_VERSION’

    • #4096 - PySparkling fails on python 3.6 because long type does not exist in python 3.6

    • #4091 - PySparkling failing on Python3.6

    • #4088 - Python build does not support H2O_PYTHON_WHEEL when building against h2o older than 3.16.0.1

    • #4084 - PySparkling fails when installed from pypi

  • Improvement

    • #4086 - Upgrade Gradle to 4.4

v2.1.18, v2.2.4 (2017-12-01)

Downloads:

  • Bug

    • #4110 - conversion of sparse data DataFrame to H2OFrame is slow

    • #4092 - Fix obtaining version from bundled h2o inside pysparkling

  • Improvement

    • #4099 - Append dynamic allocation option into SW tuning documentation.

    • #4094 - Integration with H2O 3.16.0.2

v2.1.17, v2.2.3 (2017-11-25)

Downloads:

  • Bug

    • #5025 - H2OConfTest Python test blocks test run

    • #4027 - BinaryType handling is not implemented in SparkDataFrameConverter

    • #3993 - asH2OFrame gives error if column names have DOT in it

    • #4165 - Don’t use md5skip in external mode

    • #4143 - pysparkling: h2o on exit does not shut down cleanly

    • #4140 - Additional fix for [#4140]

    • #4139 - Minor Gradle build improvements and fixes

    • #4137 - Incorrect comment in hamOrSpamMojo pipeline

    • #4136 - Cleanup pysparkling test infrastructure

    • #4135 - Fix conditions in jenkins file

    • #4132 - Fix composite build in Jenkins

    • #4131 - Fix H2OConf test on external cluster

    • #4130 - Opening Chicago Crime Demo Notebook errors on the first opening

    • #4128 - Create extended directory automatically

    • #4124 - Fix links in README

    • #4123 - Wrap stages in try finally in jenkins file

    • #4120 - Properly pass all parameters to algorithm

    • #4119 - H2OConf cannot be initialized on windows

    • #4118 - Gradle ml submodule reports success even though tests fail

    • #4117 - Fix ML tests

  • New Feature

    • #4007 - Introduce SW Models into Spark python pipelines

  • Task

    • #4103 - Upgrade H2O dependency to 3.16.0.1

  • Improvement

    • #5027 - Keep H2O version inside sparkling-water-core.jar and provide utility to query it

    • #3914 - Shell scripts misleading error message

    • #4022 - Provides Sparkling Water Spark Uber package which can be used in --packages

    • #4142 - Stop previous jobs in jenkins in case of PR

    • #4141 - In PySparkling, getOrCreate(spark) still incorrectly complains that we should use spark session

    • #4129 - Upgrade to Gradle 4.3

    • #4127 - Add the custom commit status for internal and external pipelines

    • #4126 - [ML] Remove some duplicities, enable mojo for deep learning

    • #4122 - Replace deprecated method call in ChicagoCrime python example

    • #4121 - Repl doesn’t require H2O dependencies to compile

    • #4116 - Minor build improvements

    • #4109 - Upgrade Gradle to 4.3.1

    • #4107 - addFiles doesn’t accept sparkSession

    • #4102 - Change default client mode to INFO, let the user change it at runtime

v2.1.16, v2.2.2 (2017-10-23)

Downloads:

  • Bug

    • #4157 - Fix documentation issue in PySparkling

    • #4154 - Increase default value for client connection retry timeout in

    • #4152 - SW documentation for nthreads is inconsistent with code

    • #4151 - Fix reporting artefacts in Jenkins and remove use of h2o-3-shared-lib

    • #4148 - Clean test workspace in jenkins

    • #4147 - Fix creation of extended jar in jenkins

    • #4145 - Fix failing tests on external backend

    • #4144 - Remove obsolete and failing idea configuration

    • #4153 - GLM fails to build model when weights are specified

  • Improvement

    • #4155 - Create 2 jenkins files (for internal and external backend) backed by configurable pipeline

    • #4150 - Disable web on external H2O nodes in external cluster mode

    • #4149 - In external cluster mode, print also YARN job ID of the external cluster once context is available

    • #4146 - Upgrade H2O to 3.14.0.7

    • #4159 - Improve handling of sparse vectors in internal cluster

v2.1.15, v2.2.1 (2017-10-10)

Downloads:

  • Bug

    • #3911 - Tests of External Cluster mode fail

    • #4010 - External cluster improperly converts RDD[ml.linalg.Vector]

    • #4003 - Don’t use GPU nodes for sparkling water testing in Jenkins

    • #4002 - Add missing when clause to scripts test stage in Jenkinsfile

    • #4001 - Use dX cluster for Jenkins testing

    • #3999 - Code defect in Scala example

    • #3997 - Use code which is compatible between Scala 2.10 and 2.11

    • #3996 - Make auto mode in external cluster default for tests in jenkins

    • #3994 - Ensure that all tests run on both, internal and external backends

    • #3992 - Allow to test sparkling water against specific h2o branch

    • #3991 - Update Gradle to 4.2RC2

    • #3990 - Fix problem in Jenkinsfile where H2O_HOME has higher priority than H2O_PYTHON_WHEEL

    • #3989 - Fix PySparkling issue when running multiple times on the same node

    • #4171 - Model training hangs in SW

    • #4170 - sw does not support parquet import

    • #4160 - Fix documentation bug

  • New Feature

    • #4006 - Fix typo in documentation

    • #4005 - Use linux label to determine which nodes are used for Jenkins testing

    • #3995 - In external cluster, remove notification file at the end. This affects nothing, it is just cleanup.

  • Improvement

    • #4169 - Upgrade Gradle to 4.2

    • #4168 - Improve exception in ExternalH2OBackend

    • #4167 - Stop H2O in afterAll in tests

    • #4166 - Add sw version to name of h2odriver obtained using get-extended-h2o script

    • #4163 - Upgrade gradle to 4.2.1

    • #4162 - Upgrade H2O to 3.14.0.6

v2.1.14, v2.2.0 (2017-08-23)

Downloads:

  • Bug

    • #4076 - Support Sparse Data during spark-h2o conversions

    • #4016 - The link Demo Example from Git is broken on the download page

  • New Feature

    • #4044 - MOJO for Spark SVM

  • Improvement

    • #4012 - Upgrade H2O to 3.14.0.2

    • #3939 - bin/sparkling-shell should fail if assembly jar file does not exist

    • #4054 - Use mojo in pipelines if possible, remove H2OPipeline and OneTimeTransformers

    • #4014 - Make JenkinsFile up-to-date with sparkling_yarn_branch

    • #4013 - Upgrade to Gradle 4.1

v2.1.13 (2017-08-02)

Downloads:

  • Bug

    • #4025 - Security Bug when using Security.enableSSL(spark)

    • #4021 - Travis build is failing on missing OracleJdk7

  • Improvement

    • #3978 - Include H2O R client distribution in Sparkling Water binary

    • #4026 - Warehouse dir does not have to be set in tests on Spark from 2.1+

    • #4020 - Documentation for the backends should mention get-extended-h2o.sh instead of manual jar extending

    • #4019 - Upgrade to Gradle 4.0.2

    • #4018 - More robust get-extended-h2o.sh

    • #4017 - Add back DEVEL.md and CHANGELOG.md and redirect to new versions

v2.1.12 (2017-07-17)

Downloads:

  • Improvement

    • #4036 - Upgrade Gradle to 4.0.1

    • #4035 - Increase default value for Write and Read confirmation timeout

    • #4034 - Remove dead code and deprecation warning in tests

    • #4033 - Enforce Scala Style rules

    • #4032 - Remove hard dependency on RequestServer by using RestApiContext

    • #4030 - Remove ignored empty “H2OFrame[Time] to DataFrame[TimeStamp]” test

    • #4028 - Upgrade H2O to 3.10.5.4

v2.1.11 (2017-07-12)

Downloads:

  • Bug

    • #3927 - Make scala H2OConf consistent and allow to set and get all properties

  • Improvement

    • #4040 - Update instructions for a new PYPI.org

    • #4037 - Upgrade H2O to 3.10.5.3

v2.1.10 (2017-06-29)

Downloads:

  • Bug

    • #4056 - Remove accidentally added kerb.conf file

    • #4055 - Allow to pass sparkSession to Security.enableSSL and deprecate sparkContext

    • #4051 - Use deprecated HTTPClient as some CDH versions do not have the new method

    • #4050 - Handle duke library in case it’s loaded using --packages

    • #4046 - Fix CHANGELOG location in make-dist.sh

  • Improvement

    • #4068 - Clean up windows scripts

    • #4059 - Separate Devel.md into multiple rst files

    • #4053 - Convert to rst README in gradle dir

    • #4052 - Upgrade to gradle 4.0

    • #4048 - Upgrade H2O to 3.10.5.2

    • #4045 - Bring back publishToMavenLocal task

    • #4043 - Updates to change log location

    • #4042 - Make rel-2.1 changelog consistent and also rst

v2.1.9 (2017-06-15)

Downloads:

  • Technical task

    • #5132 - In PySparkling for spark 2.0 document how to build the package

  • Bug

    • #3886 - Add missing jar into the assembly

    • #4075 - Fix instructions on the download site

    • #4072 - Use size method to get attr num

    • #4071 - Replace sparkSession with spark in backends documentation

    • #4069 - Make shell scripts safe

    • #4066 - Update PySparkling run-time dependencies

    • #4064 - Fix wrong getters and setters in pysparkling

    • #4058 - Fix typo in the FAQ documentation

    • #4057 - Fix make-dist

  • New Feature

    • #4070 - Replace the remaining references to egg files

  • Improvement

    • #5315 - Append tab on Sparkling Water download page - how to use Sparkling Water package

    • #5230 - Update FAQ with information about hive metastore location

    • #5229 - Sparkling Water Tuning doc: add heartbeat documentation

    • #5034 - Please report Application Type to Yarn Resource Manager

    • #5005 - Improve structure of SW README

    • #3908 - Allow to download sparkling water logs from the spark UI

    • #3890 - Remove references to Spark 1.5, 1.4 (as it’s old) in README.rst and other docs

    • #3887 - Upgrade H2O to 3.10.5.1

    • #4073 - Add missing spaces after “,” in H2OContextImplicits

    • #4065 - Allow to configure flow dir location in SW

    • #4062 - Extract sparkling water configuration to extra doc in rst format

    • #4060 - Mark tensorflow demo as experimental

v2.1.8 (2017-05-25)

Downloads:

  • Bug

    • #5080 - Cannot run build in parallel because of Python module

    • #5009 - Wrong documentation of PyPi h2o_pysparkling_2.0 package

    • #3904 - pysparkling: adding a column to a data frame does not work when parsing the original frame in spark

    • #3903 - Allow to pass additional arguments to run-python-script.sh

    • #3898 - Fix getting of sparkling water jar in pysparkling

    • #3897 - Don’t call atexit in case of pysparkling in cluster deploy mode

    • #3896 - Store h2o logs into unique directories

    • #3895 - handle interrupted exception in H2ORuntimeInfoUIThread

    • #5010 - Cannot install pysparkling from PyPi

  • Improvement

    • #3889 - Remove information from README.rst that pip cannot be used

    • #5004 - Support Python 3 distribution

    • #3954 - Define Jenkins pipeline via Jenkinsfile

    • #3901 - Add change logs link to the sw download page

    • #3899 - Upgrade shadow jar plugin to 2.0.0

    • #3894 - Sparkling Water cluster name should contain spark app id instead of random number

    • #3893 - Replace deprecated DefaultHTTPClient in AnnouncementService

    • #3892 - Get array size from metadata in case of ml.linalg.VectorUDT

    • #3891 - Upgrade H2O version to 3.10.4.8

v2.1.7 (2017-05-10)

Downloads:

  • Bug

    • #3905 - Different cluster name between client and h2o nodes in case of external cluster

v2.1.6 (2017-05-09)

Downloads:

  • Improvement

    • #3910 - Add SW tab in Spark History Server

    • #3907 - Upgrade H2O dependency to 3.10.4.7

v2.1.5 (2017-04-27)

Downloads:

  • Bug

    • #3913 - External cluster: Job is reporting exit status as FAILED even though all mappers return 0

  • Improvement

    • #3912 - Upgrade H2O dependency to 3.10.4.6

v2.1.4 (2017-04-20)

Downloads:

  • Bug

    • #5276 - Add pysparkling instruction to download page

    • #3968 - Proper exit status handling of external cluster

    • #3936 - Use timeout for read/write confirmation in external cluster mode

    • #3934 - Fix stopping of H2OContext in case of running standalone application

    • #3933 - Add configuration property to external backend allowing to specify the maximal timeout the cloud will wait for watchdog client to connect

    • #3929 - Use correct quote in backend documentation

    • #3926 - Use kwargs for h2o.connect in pysparkling

    • #3925 - Fix stopping of python tests

    • #3924 - Honor –core Spark settings in H2O executors

  • Improvement

    • #5112 - Sparkling Water download page is missing PySparkling/RSparkling info

    • #3930 - Upgrade H2O dependency to 3.10.4.4

    • #3928 - Download page should list available jars for external cluster.

    • #3923 - Migrate Pysparkling tests and examples to SparkSession

    • #3922 - Upgrade H2O dependency to 3.10.4.5

v2.1.3 (2017-04-07)

Downloads:

  • Bug

    • #5011 - as_factor() ‘corrupts’ dataframe if it fails

    • #3979 - Kerberos for SW not loading JAAS module

    • #3969 - Repl session not set on scala 2.11

    • #3965 - bin/pysparkling.cmd is missing

    • #3963 - Fix MarkDown syntax

    • #3962 - Run negative test for PUBDEV-3808 multiple times to observe failure

    • #3959 - Documentation fix in external cluster manual

    • #3958 - Tests for DecimalType and DataType fail on external backend

    • #3957 - Implement stopping of external H2O cluster in external backend mode

    • #3951 - Update PySparkling README with info about #3951 and using SW from Pypi

    • #3949 - Fix residual plot R code generator

    • #3948 - SW REPL cannot be used in combination with Spark Dataset

    • #3947 - Fix typo in setClientIp method

    • #3946 - Stop h2o when running inside standalone pysparkling job

    • #3945 - Extending h2o jar from SW doesn’t work when the jar is already downloaded

    • #3942 - Python in gradle is using wrong python - it doesn’t respect the PATH variable

    • #3941 - Allow to specify timeout for h2o cloud up in external backend mode

    • #3940 - Allow to specify log level to external h2o cluster

    • #3938 - Create setter in pysparkling conf for h2o client log level

    • #3937 - Better error message covering the most common case when cluster info file doesn’t exist

  • Improvement

    • #5047 - H2OConf remove nulls and make it more Scala-like

    • #3966 - Add task to Gradle build which prints all available Hadoop distributions for the corresponding h2o

    • #3952 - Upgrade of H2O dependency to 3.10.4.3

v2.1.2 (2017-03-20)

Downloads:

  • Bug

    • #3972 - Flow is not available in Sparkling Water

    • #3971 - PySparkling does not work

  • Improvement

    • #3988 - Use Spark public DNS if available to report Flow UI

v2.1.1 (2017-03-18)

Downloads:

  • Bug

    • #5037 - Intermittent failure in creating H2O cloud

    • #5024 - composite function fail when inner cbind()

    • #5003 - Environment detection does not work with Spark2.1

    • #3985 - Cannot start Sparkling Water at HDP Yarn cluster

    • #3983 - Sparkling Shell scripts for Windows do not work

    • #3982 - Fix command line environment for Windows

    • #3976 - PySparkling in Zeppelin environment using wrong class loader

  • Improvement

    • #5012 - ApplicationMaster info in Yarn for external cluster

    • #5008 - Use h2o.connect in PySpark to connect to H2O cluster

    • #3987 - Create configuration manual for External cluster

    • #3977 - Improve documentation for spark.ext.h2o.fail.on.unsupported.spark.param

    • #3973 - Upgrade H2O dependency to 3.10.4.2

v2.1.0 (2017-03-02)

Downloads:

  • Bug

    • #5014 - Security.enableSSL does not work

  • Improvement

    • #5042 - Support Spark 2.1.0

    • #5020 - Implement a generic announcement mechanism

    • #5019 - Add support to Spark 2.1 in Sparkling Water

    • #5018 - Enrich Spark UI with Sparkling Water specific tab