Public Data Sets
----------------

Datasets are such an integral part of data science and algorithms that
it's almost impossible to talk about H2O without talking about data.
This is a small but growing collection of links to publicly available data.

==========
Open City
==========

=========================== =================================================================================
Type/Source                 Link
=========================== =================================================================================
Palo Alto Open Data         http://www.cityofpaloalto.org/gov/depts/it/open_data/default.asp
Chicago                     https://data.cityofchicago.org/
20-year span of crime data  https://data.cityofchicago.org/Public-Safety/Crimes-2001-to-present/ijzp-q8t2
NYC                         https://nycopendata.socrata.com/
Rents & Neighborhoods       http://www.huduser.org/portal/datasets/HUD_data_matrix.html
=========================== =================================================================================

==========================
Transportation and Travel
==========================

=========================== ============================================================================================
Type/Source                 Link
=========================== ============================================================================================
Airlines (1987-2007)        http://stat-computing.org/dataexpo/2009/the-data.html (based on `RHIPE's `_ dataset.
                            Data `source `_)
Open flights                http://openflights.org/data.html
Capital Bike Share          https://www.capitalbikeshare.com/trip-history-data
=========================== ============================================================================================

========================
Sciences and Engineering
========================

==================================================== ===========================================================================================
Type/Source                                          Link
==================================================== ===========================================================================================
Elements of Statistical Learning Data                http://www-stat.stanford.edu/~tibs/ElemStatLearn/data.html
NASA                                                 http://data.nasa.gov/
Seismic                                              http://sioseis.ucsd.edu/segy.header.html
Weather                                              http://OpenWeatherMap.org | http://OpenMeteoData.org
NIST                                                 http://srdata.nist.gov/gateway/gateway?dblist=0
GitHub Archive                                       http://www.githubarchive.org
==================================================== ===========================================================================================

==================
Diverse Data Sets
==================

==================================================== ===========================================================================================
Type/Source                                          Link
==================================================== ===========================================================================================
Many Eyes Community                                  http://www-958.ibm.com/software/analytics/manyeyes/
Kaggle Competitions                                  http://www.kaggle.com/
UCI Machine Learning Library                         http://archive.ics.uci.edu/ml/datasets.html
Human Activity Recognition Using Smartphones         http://archive.ics.uci.edu/ml/datasets/Human+Activity+Recognition+Using+Smartphones
MLData repository                                    http://mldata.org/
GitHub Challenge                                     https://github.com/blog/1450-the-github-data-challenge-ii
Yelp Challenge                                       https://www.yelp.com/dataset_challenge
Netflix Prize                                        http://stackoverflow.com/questions/1407957/netflix-prize-dataset
Infochimps                                           http://www.infochimps.com/
Stanford Library                                     http://snap.stanford.edu/data/index.html
Million Songs                                        http://labrosa.ee.columbia.edu/millionsong/pages/getting-dataset
Caret                                                http://caret.r-forge.r-project.org/datasets.html
RevolutionR                                          http://www.revolutionanalytics.com/subscriptions/datasets/
Find a dataset                                       http://www.inside-r.org/howto/finding-data-internet
LIBSVM Compilation                                   http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/
The Data Page NYU                                    http://people.stern.nyu.edu/adamodar/New_Home_Page/data.html
==================================================== ===========================================================================================

===================
Public Policy Data
===================

==================================================== ===========================================================================================
Type/Source                                          Link
==================================================== ===========================================================================================
European Open Data (6098 datasets)                   http://open-data.europa.eu/en/
US Open Data                                         http://www.data.gov/ | http://www.data.gov/opendatasites
WorldBank                                            http://data.worldbank.org/data-catalog
Guardian                                             http://www.guardian.co.uk/news/datablog/interactive/2013/jan/14/all-our-datasets-index
Statistics Netherlands                               http://www.cbs.nl/en-GB/menu/home/default.htm?Languageswitch=on
Quandl 6M Financial, Economics, and Social Datasets  http://www.quandl.com/
==================================================== ===========================================================================================

==========
Other
==========

=============================== =================================================================================
Type/Source                     Link
=============================== =================================================================================
MovieLens film recommendations  http://grouplens.org/datasets/movielens/
=============================== =================================================================================