100 Open-Source Datasets for AI Projects
Theme of the month: Foundations of Data Science
1. Quandl
2. Academic Torrents
3. Data.gov
Link: https://www.data.gov/
4. UCI Machine Learning Repository
5. Google Public Datasets
6. Datasets on Github
Link: https://github.com/awesomedata/awesome-public-datasets
7. Socrata
Link: https://opendata.socrata.com/
8. Kaggle datasets
9. World Bank
Link: http://data.worldbank.org/
10. Reserve Bank of India
11. FiveThirtyEight
12. AWS datasets
Link: https://registry.opendata.aws/
13. YouTube Video dataset
14. Analytics Vidhya
15. KDD Cups
16. DrivenData
Link: https://www.drivendata.org/
17. MNIST dataset
Link: http://yann.lecun.com/exdb/mnist
18. ImageNet
Link: http://image-net.org/
19. Yelp dataset
20. Airbnb dataset
21. Walmart dataset
Link: https://www.kaggle.com/c/walmart-recruiting-store-sales-forecasting/data
22. LendingClub
Link: https://www.lendingclub.com
23. Wikipedia Database
Link: https://en.wikipedia.org/wiki/Wikipedia:Database_download#English-language_Wikipedia
24. Reddit
Link: https://www.reddit.com/r/datasets/comments/65o7py/updated_reddit_comment_dataset_as_torrents/
25. UNICEF
Link: https://data.unicef.org/
26. Data.gov.uk
Link: https://data.gov.uk/
27. FBI
Link: https://ucr.fbi.gov/crime-in-the-u.s/2016/crime-in-the-u.s.-2016/topic-pages/tables/table-1
28. CDC
29. US Census data
30. Bureau of Labor Statistics
31. NASA dataset
Link: http://nssdc.gsfc.nasa.gov/
32. World Bank dataset
Link: https://datacatalog.worldbank.org/
33. Harvard University dataset
34. MIT dataset
35. University of North Carolina dataset
Link: https://www.cpc.unc.edu/projects/addhealth/documentation
36. Computer Vision dataset
Link: https://www.visualdata.io/
37. Carnegie Mellon University dataset
Link: https://guides.library.cmu.edu/machine-learning/datasets
38. Boston Housing dataset
Link: https://www.cs.toronto.edu/~delve/data/boston/bostonDetail.html
39. US visualization public data
Link: https://datausa.io/
40. Google dataset
Link: https://www.kaggle.com/xiuchengwang/python-dataset-download
41. International Monetary Fund Public data
42. Financial Times dataset
43. Google Trends dataset
Link: https://trends.google.com/trends/?q=google&ctab=0&geo=all&date=all&sort=0
44. American Economic Association
Link: https://www.aeaweb.org/resources/data/us-macro-regional
45. xView
Link: http://xviewdataset.org/#dataset
46. Labelme
Link: http://labelme.csail.mit.edu/Release3.0/browserTools/php/dataset.php
47. MS COCO
Link: http://mscoco.org/
48. COIL 100
Link: https://www1.cs.columbia.edu/CAVE/software/softlib/coil-100.php
49. Google Open Images
Link: https://ai.googleblog.com/2016/09/introducing-open-images-dataset.html
50. Image dataset of Human Face
51. Dog Image dataset from Stanford
52. Indoor Image dataset from MIT
53. Sentiment Analysis dataset
54. Large Movie Review dataset from Stanford
55. Standard Sentiment dataset from Stanford
56. Twitter data on US Airline sentiment
Link: https://www.kaggle.com/crowdflower/twitter-airline-sentiment
57. Question Answering dataset
Link: https://hotpotqa.github.io/
58. Email data from Enron
59. Amazon Reviews
Link: https://snap.stanford.edu/data/web-Amazon.html
60. Google Books
61. Blogger Corpus
62. Text data from eBook dataset
Link: https://www.gutenberg.org/wiki/Gutenberg:Offline_Catalogs
63. Canada Parliament dataset
Link: https://www.isi.edu/natural-language/download/hansard/
64. Jeopardy Quiz show dataset
Link: https://www.reddit.com/r/datasets/comments/1uyd0t/200000_jeopardy_questions_in_a_json_file/
65. Rotten Tomatoes Reviews
About 400,000 reviews from Rotten Tomatoes.
Link: https://drive.google.com/file/u/1/d/1w1TsJB-gmIkZ28d1j7sf1sqcPmHXw352/view
66. Spam Messages dataset
Link: http://www.dt.fee.unicamp.br/~tiago/smsspamcollection/
67. UCI Spam Email dataset
68. UC Berkeley’s autonomous driving dataset
Link: https://bdd-data.berkeley.edu/
69. Comma.ai
More than 5 hours of highway autonomous driving data.
70. Oxford Autonomous Driving dataset
71. European Union dataset
72. DBpedia
Structured dataset from Wikipedia
Link: https://wiki.dbpedia.org/
73. LODUM
Datasets from the University of Münster.
74. Microsoft Academic Research data
Link: https://www.microsoft.com/en-us/research/academic-program/data-science-microsoft-research/
75. KDNuggets dataset
76. Enigma Public
Billed as the world’s broadest collection of open-source datasets.
Link: https://enigma.com/
77. CMU datasets
78. Data.World
Link: https://data.world/
79. Archive.org
80. Medicare related dataset
81. Cancer related dataset
Link: http://seer.cancer.gov/faststats/selections.php?series=cancer
82. Bureau of Economic Analysis dataset
83. Zalando Fashion-MNIST dataset
84. Skin Cancer dataset
Link: https://www.kaggle.com/kmader/skin-cancer-mnist-ham10000
85. Medical Insurance dataset
86. Real Estate Price prediction dataset
Link: https://www.kaggle.com/quantbruce/real-estate-price-prediction
87. Coronavirus dataset
Link: https://www.nytimes.com/article/coronavirus-county-data-us.html
88. Weather Forecasting dataset
89. 3D Human beings dataset
Link: https://cv.iri.upc-csic.es/
90. Stock Market dataset - DataHub
91. Global Economic Complexity data
Link: http://atlas.cid.harvard.edu/
92. Instagram Graph API
93. Indian Government datasets
Link: https://data.gov.in/
94. Open Image dataset
95. Visual QA dataset
A dataset of open-ended questions about images.
Link: https://visualqa.org/
96. Street View House Numbers from Stanford University
97. CIFAR-10
Image Classification dataset.
98. Twenty Newsgroups dataset
Link: https://archive.ics.uci.edu/ml/datasets/Twenty+Newsgroups
99. Sentiment Analysis dataset - Sentiment140
100. Spoken digit dataset
Link: https://github.com/Jakobovski/free-spoken-digit-dataset
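Several of the sources above, such as Data.gov and Kaggle, distribute datasets as CSV files. As a rough sketch of a first look at such a file, the snippet below counts rows and tallies one label column using only Python's standard library. The sample data, the column names, and the `summarize_csv` helper are all hypothetical and are not taken from any dataset in this list.

```python
import csv
import io
from collections import Counter

# Hypothetical sample standing in for a downloaded CSV file; the column
# names and rows are invented for illustration only.
SAMPLE_CSV = """label,text
positive,Great flight
negative,Lost my luggage
negative,Two hour delay
"""


def summarize_csv(handle, label_column):
    """Return (row_count, column_names, label_counts) for a CSV file object."""
    reader = csv.DictReader(handle)
    counts = Counter()
    rows = 0
    for row in reader:
        rows += 1
        counts[row[label_column]] += 1
    # fieldnames is populated from the header line once reading has started
    return rows, reader.fieldnames, dict(counts)


rows, columns, label_counts = summarize_csv(io.StringIO(SAMPLE_CSV), "label")
print(rows, columns, label_counts)
```

For a real download, the `io.StringIO(...)` argument could be swapped for a file opened with `open(path, newline="", encoding="utf-8")`, assuming the file is UTF-8 encoded and has a header row.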
AI News
IBM & Tech Mahindra launch WatsonX [link]
McKinsey & Nvidia partner to extend the use of generative AI [link]
Microsoft to offer AMD options instead of Nvidia AI chips as demand soars [link]
Subscribe to learn one AI concept every day. Happy learning :-)


