Tip: start with all transforms on. Only switch off a type if you know it's hurting performance.
Find Data with AI: describe what you're looking for and the AI finds it
Describe the data you need in plain English. The AI will search across all 60 fetch APIs and 15 repositories and tell you exactly where to find it.
Your Saved Datasets: datasets you've uploaded or saved
Sample Datasets: ready-to-use public-domain datasets
Public Data Repositories: download CSVs to upload and analyse
Links to the best free, public-domain and open-licence data sources. Download any CSV, then upload it here to analyse it.
The largest community hub for ML datasets. Millions of user-contributed datasets across every domain, most with kernels showing how to use them.
The original academic benchmark repository. Over 600 clean, well-documented datasets used in thousands of research papers. Mostly public domain.
Google's dedicated search engine for datasets. Indexes millions of datasets from government portals, universities, and open repositories worldwide.
An open platform built specifically for ML research. Datasets come with metadata, benchmark task definitions, and community benchmark scores for comparison.
The US federal government's open data portal. Over 300,000 datasets from agencies including the CDC, NOAA, USDA, and Census Bureau. All public domain.
Global development indicators for 200+ countries: GDP, health, education, poverty, trade, and climate. Excellent for regression and time-series modelling. CC BY 4.0.
Curated long-run datasets on health, energy, inequality, and population. Every chart links to a downloadable CSV. Exceptionally clean and well-documented. CC BY.
Amazon's registry of large public datasets hosted on S3: satellite imagery, genomics, weather, transport, and more. Many are too large to upload directly, so work with a subset.
Academic research datasets deposited alongside published papers. Strong in social sciences, political science, and public health. Most are CC0 or CC BY.
The data behind every FiveThirtyEight article: sports, politics, economics, and culture. Clean, analysis-ready CSVs. Great for building sports prediction models. CC BY 4.0.
Over 50,000 datasets curated for NLP, computer vision, and tabular ML tasks. Download any dataset as CSV or Parquet. A rapidly growing community, with filtering by task, language, and licence.
CERN's open-access repository for research data deposited alongside academic papers. 3M+ records across all scientific disciplines. Easily searchable, and mostly CC BY or CC0.
The EU's official statistical database: employment, inflation, trade, energy, and demographics for all EU member states. Excellent for cross-country regression analysis. CC BY 4.0.
Investigative journalism datasets covering Congress, campaign finance, healthcare, criminal justice, and corporate lobbying. Many are unique datasets not found elsewhere. Free to download.
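Some of the repositories above, such as Amazon's S3 registry, host files too large to upload directly, so you need a subset. Below is a minimal sketch of one way to build one, assuming you have already downloaded the full CSV locally and that its first line is a header. The function name `sample_csv` and the file paths are hypothetical and not part of this tool. It uses reservoir sampling (Algorithm R), which reads the file once and holds only `k` rows in memory, so the subset is a uniform random sample even when the source is far larger than RAM.

```python
import csv
import random


def sample_csv(src_path: str, dst_path: str, k: int, seed: int = 0) -> int:
    """Write a uniform random sample of up to k data rows from src_path to dst_path.

    Hypothetical helper: uses reservoir sampling (Algorithm R), so at most k rows
    are held in memory regardless of file size. Returns the number of rows written.
    """
    rng = random.Random(seed)  # seeded so the same subset can be reproduced
    reservoir: list[list[str]] = []
    with open(src_path, newline="", encoding="utf-8") as src:
        reader = csv.reader(src)
        header = next(reader, None)  # assumes the first line is the header
        if header is None:
            raise ValueError(f"{src_path} is empty")
        for i, row in enumerate(reader):
            if i < k:
                reservoir.append(row)  # fill the reservoir with the first k rows
            else:
                j = rng.randint(0, i)  # uniform over 0..i inclusive
                if j < k:
                    reservoir[j] = row  # keeps row i with probability k / (i + 1)
    with open(dst_path, "w", newline="", encoding="utf-8") as dst:
        writer = csv.writer(dst)
        writer.writerow(header)
        writer.writerows(reservoir)
    return len(reservoir)
```

For example, `sample_csv("full_download.csv", "subset.csv", k=100_000)` writes a 100,000-row subset (or every row, if the file is smaller) that you can then upload. Random sampling is used instead of simply keeping the first `k` rows because many public files are sorted by date, region, or ID, so a slice from the top of the file can be badly unrepresentative.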