References
==========

The references below list prior publications and useful reading
relevant to analysis in general and to each algorithm.
Understanding the algorithms of H\ :sub:`2`\ O is an integral part of
using the platform correctly and getting the most out of an analysis.
Below are citations of seminal articles, and of articles demonstrating
rigorous application of the algorithms of H\ :sub:`2`\ O.
This list is not meant to be exhaustive, but rather to provide an
abbreviated syllabus to help develop a strong understanding.

Recommended Reading
-------------------

Hastie, Trevor, Robert Tibshirani, and Jerome H Friedman.
The Elements of Statistical Learning. Vol.1. N.p.: Springer New York,
2001.

GLM
------

Breslow, N E. "Generalized Linear Models: Checking Assumptions and
Strengthening Conclusions." Statistica Applicata 8 (1996): 23-41.

Goldberger, Arthur S. "Best Linear Unbiased Prediction in the
Generalized Linear Regression Model." Journal of the American
Statistical Association 57.298 (1962): 369-375.

Guisan, Antoine, Thomas C Edwards Jr, and Trevor Hastie. "Generalized
Linear and Generalized Additive Models in Studies of Species
Distributions: Setting the Scene." Ecological Modelling 157.2 (2002):
89-100.

Nelder, John A, and Robert WM Wedderburn. "Generalized Linear Models."
Journal of the Royal Statistical Society. Series A (General) (1972):
370-384.

Snee, Ronald D. "Validation of Regression Models: Methods and
Examples." Technometrics 19.4 (1977): 415-428.

Poisson
----------

Frome, E L. "The Analysis of Rates Using Poisson Regression Models."
Biometrics (1983): 665-674.

Logistic (binomial and multinomial)
-----------------------------------

Press, S James, and Sandra Wilson. "Choosing Between Logistic
Regression and Discriminant Analysis." Journal of the American
Statistical Association 73.364 (1978): 699-705.

Pearce, Jennie, and Simon Ferrier.
"Evaluating the Predictive Performance of Habitat Models Developed
Using Logistic Regression." Ecological Modelling 133.3 (2000): 225-245.

GBM
---

Dietterich, Thomas G, and Eun Bae Kong. "Machine Learning Bias,
Statistical Bias, and Statistical Variance of Decision Tree
Algorithms." ML-95 255 (1995).

Elith, Jane, John R Leathwick, and Trevor Hastie. "A Working Guide to
Boosted Regression Trees." Journal of Animal Ecology 77.4 (2008):
802-813.

Friedman, Jerome H. "Greedy Function Approximation: A Gradient
Boosting Machine." Annals of Statistics (2001): 1189-1232.

Friedman, Jerome, Trevor Hastie, Saharon Rosset, Robert Tibshirani,
and Ji Zhu. "Discussion of Boosting Papers." Ann. Statist 32 (2004):
102-107.

Friedman, Jerome, Trevor Hastie, and Robert Tibshirani. "Additive
Logistic Regression: A Statistical View of Boosting (With Discussion
and a Rejoinder by the Authors)." The Annals of Statistics 28.2
(2000): 337-407.

Neural Networks
---------------

Baldi, Pierre, and Kurt Hornik. "Neural Networks and Principal
Component Analysis: Learning From Examples Without Local Minima."
Neural Networks 2.1 (1989): 53-58.

Coolen, A C C. Concepts for Neural Networks. N.p.: Springer, 1998.
13-70.

Tweedie
-------

Dunn, Peter K. "Occurrence and Quantity of Precipitation Can Be
Modelled Simultaneously." International Journal of Climatology 24.10
(2004): 1231-1239.

K-Means
-------

Napoleon, D, and S Pavalakodi. "A New Method for Dimensionality
Reduction Using KMeans Clustering Algorithm for High Dimensional Data
Set." International Journal of Computer Applications 13.7 (2011):
41-46.

Xiong, Hui, Junjie Wu, and Jian Chen. "K-means Clustering Versus
Validation Measures: A Data-distribution Perspective." Systems, Man,
and Cybernetics, Part B: Cybernetics, IEEE Transactions on 39.2
(2009): 318-331.