Where can I find large datasets open to the public?

Q. Where can I find large datasets open to the public?
A.

– Cross-disciplinary data repositories, data collections and data search engines:

  1. https://www.kaggle.com/datasets
  2. http://www.assetmacro.com
  3. http://usgovxml.com
  4. http://aws.amazon.com/datasets
  5. http://databib.org
  6. http://datacite.org
  7. http://figshare.com
  8. http://linkeddata.org
  9. http://reddit.com/r/datasets
  10. http://thewebminer.com/
  11. http://thedatahub.org alias http://ckan.net
  12. http://quandl.com
  13. http://www.growmeme.com/overview
  14. http://www.kdnuggets.com/datasets/index.html
  15. http://enigma.io
  16. http://www.ufindthem.com/
  17. http://NetworkRepository.com
  18. http://MLvis.com
  19. http://data.opendatasoft.com
  20. http://gdeltproject.org/data.html

__________________________

– Single datasets and data repositories

1- http://archive.ics.uci.edu/ml/
2- http://crawdad.org/
3- http://data.austintexas.gov
4- http://data.cityofchicago.org
5- http://data.govloop.com
6- http://data.gov.uk/
7- http://data.gov.in
8- http://data.medicare.gov
9- http://data.seattle.gov
10- http://data.sfgov.org
11- http://data.sunlightlabs.com
12- https://datamarket.azure.com/
13- http://developer.yahoo.com/geo/g
14- http://econ.worldbank.org/datasets
15- http://en.wikipedia.org/wiki/Wik
16- http://factfinder.census.gov/ser
17- http://ftp.ncbi.nih.gov/
18- http://gettingpastgo.socrata.com
19- http://googleresearch.blogspot.c...
20- http://books.google.com/ngrams/
21- http://medihal.archives-ouvertes.fr
22- http://public.resource.org/
23- http://rechercheisidore.fr
24- http://snap.stanford.edu/data/in
25- http://timetric.com/public-data/
26- https://wist.echo.nasa.gov/~wist
27- http://www2.jpl.nasa.gov/srtm
28- http://www.archives.gov/research
29- http://www.bls.gov/
30- http://www.crunchbase.com/
31- http://www.dartmouthatlas.org/
32- http://www.data.gov/
33- http://www.datakc.org
34- http://dbpedia.org
35- http://www.delicious.com/jbaldwi
36- http://www.faa.gov/data_research/
37- http://www.factual.com/
38- http://research.stlouisfed.org/f
39- http://www.freebase.com/
40- http://www.google.com/publicdata
41- http://www.guardian.co.uk/news/d
42- http://www.infochimps.com
43- http://www.kaggle.com/
44- http://build.kiva.org/
45- http://www.nationalarchives.gov.
46- http://www.nyc.gov/html/datamine
47- http://www.ordnancesurvey.co.uk/
48- http://www.philwhln.com/how-to-g
49- http://www.imdb.com/interfaces
50- http://imat-relpred.yandex.ru/en
51- http://www.dados.gov.pt/pt/catal
52- http://knoema.com
53- http://daten.berlin.de/
54- http://www.qunb.com
55- http://databib.org/
56- http://datacite.org/
57- http://data.reegle.info/
58- http://data.wien.gv.at/
59- http://data.gov.bc.ca
60- https://pslcdatashop.web.cmu.edu/
61- http://www.icpsr.umich.edu/icpsrweb/CPES/ – Collaborative 
62- http://www.dati.gov.it
63- http://dati.trentino.it
64- http://www.databagg.com/
65- http://networkrepository.com
66- http://www.grid.unep.ch/index.php?lang=en
_______
Source: Quora

Contributions from group members:
1- http://academictorrents.com/

By Hatem Kotb
Below are some links I found to data sets for anyone who wants to practice. Some of them are public city data sets that can be filtered by type (Education, Health, Transport, etc.).

  1. http://open.canada.ca/en
  2. http://www.eea.europa.eu/data-and-maps/
  3. http://homepage.data-planet.com/
  4. https://inventory.data.gov/dataset
  5. http://www.google.com/publicdata/directory
  6. http://www.europeandataportal.eu/
  7. https://nycopendata.socrata.com/
  8. https://www.quandl.com/
  9. https://www.quora.com/Where-can-I-find-large-datasets-open-…
  10. https://www.reddit.com/r/datasets
  11. https://data.sfgov.org/
  12. https://archive.ics.uci.edu/ml/datasets.html
  13. https://data.gov.uk/
  14. http://www.data.gov/education/
  15. http://www.data.gov/
  16. http://www.liquidasset.com/winedata.html
  17. http://finance.yahoo.com/q/hp?s=YHOO
  18. http://toddwschneider.com/…/analyzing-1-1-billion-nyc-taxi…/
  19. http://wiki.dbpedia.org/Datasets
  20. http://archive.ics.uci.edu/ml/datasets.html
  21. http://support.minitab.com/en-us/datasets/
  22. http://apps.who.int/gho/data/node.main
  23. http://datasociety.co/data/
  24. https://data.cityofchicago.org/…/Crimes-2001-to-p…/ijzp-q8t2
  25. http://www.baseball-reference.com/
  26. https://crudata.uea.ac.uk/cru/data/temperature/
  27. http://www.esrl.noaa.gov/gmd/ccgg/data-products.html
  28. http://nces.ed.gov/pubsearch/pubsinfo.asp?pubid=2011038


Posted By: Mohamed Abuelanin

How to Start Big Data or Data Science

Folks, for anyone asking where to start if you want to study Big Data or Data Science:
this post will hopefully answer you…

Nowadays, most people abroad no longer take assorted courses and try to piece the picture together themselves. There are now complete specializations taught online: the provider lays out a curriculum that makes you a specialist in the field and starts with you from zero. If the specialization depends on other courses, it tells you so and gives you their links, and at the end you can get a certificate showing that you completed it.

I'll list some specializations that start this month on Coursera; they repeat roughly every month. For some of the courses I've included the details. Each course lasts about a month, and even the most demanding course averages about 7 to 9 hours of study per week.
If anyone is starting or already working through one, please let us know, so we can organize workshops on the specializations and share the content.
Good luck…
There is a follow-up post commenting on this one; anyone who wants to start should read it as well:
https://www.facebook.com/groups/big.data.egypt/1651381198454408/

1-Big Data Specialization
https://www.coursera.org/specializations/big-data

**Introduction to Big Data
**Hadoop
**Introduction to Big Data Analytics
**Machine Learning With Big Data
**Introduction to Graph Analytics
**Big Data – Capstone

2- Data Science Specialization.
One of the best Data Science specializations, from Johns Hopkins University
https://www.coursera.org/specialization/jhudatascience/1
I am working through this specialization now. It starts from “zero knowledge”.

××-The Data Scientist’s Toolbox
××-R Programming
××-Getting and Cleaning Data
××-Exploratory Data Analysis
××-Reproducible Research
××-Statistical Inference
××-Regression Models
××-Practical Machine Learning
××-Developing Data Products

3-Data Science at Scale Specialization
https://www.coursera.org/specializations/data-science
From the University of Washington

**Data Manipulation at Scale: Systems and Algorithms
**Practical Predictive Analytics: Models and Methods
**Communicating Results: Visualization, Ethics, Reproducibility
**Data Science at Scale – Capstone Project

4-Machine Learning Specialization
https://www.coursera.org/specializations/machine-learning
From the University of Washington

**Machine Learning Foundations: A Case Study Approach
**Regression
**Classification
**Clustering & Retrieval
**Recommender Systems & Dimensionality Reduction
**Machine Learning Capstone: An Intelligent Application with Deep Learning

5-Data Warehousing for Business Intelligence Specialization
https://www.coursera.org/specializations/data-warehousing

6-Data Analysis and Interpretation Specialization
https://www.coursera.org/specializations/data-analysis

7-Internet of Things Specialization
https://www.coursera.org/specializations/internet-of-things

8-Internet of Things and Embedded Systems Specialization
https://www.coursera.org/specializations/iot

Learn Data Science by Doing

Here is a list of hands-on tutorials. Each was included on the condition that it contains end-to-end practical steps, starting from a data set and ending with the data science deliverable.

Happy learning…
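To make "end to end" concrete, here is a toy sketch of the shape such a tutorial usually takes: start from a data set, hold some of it out, fit a model, and report a result. Everything in it is an assumption for illustration (the synthetic data, the fixed wiggle standing in for noise, the hypothetical `fit_line` helper); it is not taken from any of the tutorials listed here.

```python
import statistics

# Made-up data set: y is 2*x + 1 plus a small fixed wiggle standing in for noise.
data = [(x, 2 * x + 1 + (0.3 if x % 2 else -0.3)) for x in range(20)]

# 1. Hold out every fourth point for testing; train on the rest.
test = [point for point in data if point[0] % 4 == 3]
train = [point for point in data if point[0] % 4 != 3]


# 2. Fit a straight line by ordinary least squares on the training set.
def fit_line(points):
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in points)
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    return slope, mean_y - slope * mean_x


slope, intercept = fit_line(train)

# 3. The "deliverable" here is the mean absolute error on the held-out points.
errors = [abs(y - (slope * x + intercept)) for x, y in test]
mae = statistics.mean(errors)
print(f"slope={slope:.2f} intercept={intercept:.2f} test MAE={mae:.2f}")
```

Real tutorials replace each step with something heavier (data cleaning, feature engineering, a library model, a leaderboard submission), but the overall sequence is usually the same.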

Kaggle solution to competition (Denoising Dirty Documents – Image Processing)
http://blog.kaggle.com/…/image-processing-machine-learning…/

All parts here:
https://colinpriest.com/…/01/denoising-dirty-documents-par…/
https://colinpriest.com/…/07/denoising-dirty-documents-par…/
https://colinpriest.com/…/14/denoising-dirty-documents-par…/
https://colinpriest.com/…/21/denoising-dirty-documents-par…/
https://colinpriest.com/…/28/denoising-dirty-documents-par…/
https://colinpriest.com/…/07/denoising-dirty-documents-par…/
https://colinpriest.com/…/23/denoising-dirty-documents-par…/
https://colinpriest.com/…/02/denoising-dirty-documents-par…/
https://colinpriest.com/…/15/denoising-dirty-documents-par…/
https://colinpriest.com/…/denoising-dirty-documents-part-10/
https://colinpriest.com/…/denoising-dirty-documents-part-11/
https://colinpriest.com/…/an-even-dozen-denoising-dirty-do…/

/******************************************/

Kaggle Solution for Second Annual Data Science Bowl: Automatically Finding the Heart Location in an MRI Image

Part 1:
https://colinpriest.com/…/second-annual-data-science-bowl-…/

Part 2:
https://colinpriest.com/…/second-annual-data-science-bowl-…/

Part 3:
https://colinpriest.com/…/second-annual-data-science-bowl-…/

/******************************************/

Kaggle Titanic Solution in R
https://www.kaggle.com/c/…/detai…/new-getting-started-with-r

and in Python
https://www.kaggle.com/…/details/getting-started-with-python

/******************************************/

Predicting SAT scores for New York Schools
http://blog.kaggle.com/…/getting-started-with-pandas-predi…/

/******************************************/

Diagnosing Heart Diseases 
http://blog.kaggle.com/…/diagnosing-heart-diseases-with-de…/

/******************************************/

Analytics Vidhya Case Study: Optimize Pricing for online vendor:
http://www.analyticsvidhya.com/…/solving-case-study-optimi…/

/******************************************/

Learn Analytics With Complete Case Study:
Part 1:
http://www.analyticsvidhya.com/…/learn-analytics-business-…/
Part 2:
http://www.analyticsvidhya.com/…/learn-analytics-business-…/

/******************************************/

Conjoint analysis with R
http://www.analyticsvidhya.com/…/beginner-tutorial-conjoin…/

/******************************************/

Kaggle solution: Text analytics:
http://www.analyticsvidhya.com/…/kaggle-solution-cooking-t…/

/******************************************/

Update: 28-July-2016:
Spam filtering using Python:
http://radimrehurek.com/data_science_python/

/******************************************/

Open source list of tutorials with links end-to-end:
https://github.com/donnemart…/data-science-ipython-notebooks

/******************************************/

Complete series for Kaggle competition 
http://brettromero.com/wo…/category/technology/data-science/

/******************************************/

Open source index for Data Science practical lessons:
https://github.com/open-source-society/data-science

/******************************************/

Update 23rd April – 2017:

Kaggle Allstate Purchase Prediction Challenge:
https://www.kaggle.com/c/allstate-purchase-prediction-chall…

Solution in R:
https://github.com/B1aine/kaggle-allstate

Solution in Python:
https://github.com/alzmcr/allstate

/******************************************/

Kaggle MLSP Birds Classification:
https://www.kaggle.com/c/mlsp-2013-birds

Solution:
https://github.com/gaborfodor/MLSP_2013

/******************************************/

Kaggle Galaxy Zoo Challenge:
https://www.kaggle.com/c/galaxy-zoo-the-galaxy-challenge

Solution in Python:
https://github.com/benanne/kaggle-galaxies

/******************************************/

Kaggle Large Scale Hierarchical Text Classification:
https://www.kaggle.com/c/lshtc

Solution in C++ (wow, 3rd-place winner):
https://github.com/nagadomi/kaggle-lshtc

/******************************************/

Another great list of competitions with answers:
http://www.chioka.in/kaggle-competition-solutions/

/******************************************/

To be continued…
Please share your links and I’ll add them…

Posted By: Ahmed Zareef