
Data Resources

Table of contents

  1. Data sources
    1. AHRQ
    2. Biomedical research
    3. CDC
    4. CMS
    5. DEA
    6. Deidentified patient data
    7. FDA
    8. General
    9. HCPCS
    10. Medicaid
    11. NIH
    12. Research
    13. Synthetic patient data
    14. U.S. Government
  2. Databases
    1. Graph
    2. NoSQL
    3. Relational
  3. Notebooks
    1. General
    2. JavaScript
  4. Tools
    1. IDE
  5. Visualization
    1. General

Data sources


AHRQ

Link Description
MEPS Prescribed Medicines Files Contains household prescribed medicines event files.
MEPS Summary Tables The MEPS Household Component summary tables provide frequently used summary estimates for the U.S. civilian noninstitutionalized population on household medical utilization and expenditures, demographic and socio-economic characteristics, health insurance coverage, access to care and experience with care, medical conditions, and prescribed medicine purchases.
MEPS Summary Tables: Prescribed drugs These MEPS summary tables provide statistics on total expenditures, total purchases, and number of persons with purchases for prescription medicines or therapeutic class groups.
Medical Expenditure Panel Survey (MEPS) MEPS collects data on the specific health services that Americans use, how frequently they use them, the cost of these services, and how they are paid for, as well as data on the cost, scope, and breadth of health insurance held by and available to U.S. workers.
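
MEPS summary estimates describe the U.S. civilian noninstitutionalized population rather than only the sampled respondents, so person-level survey weights matter when working with the public-use files. A minimal Python sketch, assuming each person record carries a survey weight and an annual prescribed-medicine expenditure; the field names `perwt` and `rx_exp` and the sample values are hypothetical, not actual MEPS variable names. It produces point estimates only; standard errors require accounting for the complex survey design, which this sketch ignores.

```python
from dataclasses import dataclass


@dataclass
class Person:
    perwt: float   # hypothetical person-level survey weight
    rx_exp: float  # hypothetical annual prescribed-medicine spending (USD)


def weighted_total(people: list[Person]) -> float:
    """Population total: each respondent stands in for `perwt` people."""
    return sum(p.perwt * p.rx_exp for p in people)


def weighted_share_with_purchase(people: list[Person]) -> float:
    """Weighted fraction of the population with any prescribed-medicine spending."""
    total_weight = sum(p.perwt for p in people)
    with_purchase = sum(p.perwt for p in people if p.rx_exp > 0)
    return with_purchase / total_weight


if __name__ == "__main__":
    sample = [Person(1500.0, 200.0), Person(2500.0, 0.0), Person(1000.0, 50.0)]
    print(weighted_total(sample))                # 350000.0
    print(weighted_share_with_purchase(sample))  # 0.5
```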

Biomedical research

Link Description
Terra Access data, run analysis tools, and collaborate in Terra: a scalable platform for biomedical research.


CDC

Link Description
COVID Data Tracker COVID-19 Vaccinations in the United States
Survey Data & Documentation In 1984, the Centers for Disease Control and Prevention (CDC) initiated the state-based Behavioral Risk Factor Surveillance System (BRFSS), a cross-sectional telephone survey that state health departments conduct monthly over landline telephones and cellular telephones with a standardized questionnaire and technical and methodologic assistance from CDC.


CMS

Link Description
CMS 2008-2010 Data Entrepreneurs’ Synthetic Public Use File (DE-SynPUF) The DE-SynPUF was created with the goal of providing a realistic set of claims data in the public domain while providing the very highest degree of protection to the Medicare beneficiaries’ protected health information.
Data that helps you better understand CMS programs.
Medicaid Pharmacy Pricing View, filter, sort, visualize, and share Pharmacy Pricing Data, and export it in a variety of formats, including Excel.
Medicare Part B Drug Average Sales Price Manufacturer reporting of Average Sales Price (ASP) data
Medicare Provider Utilization and Payment Data CMS has released a series of publicly available data files that summarize the utilization and payments for procedures, services, and prescription drugs provided to Medicare beneficiaries by specific inpatient and outpatient hospitals, physicians, and other suppliers.
The Open Payments Search Tool is used to search payments made by drug and medical device companies to physicians and teaching hospitals.
Research, Statistics, Data & Systems  


DEA

Link Description
Automation of Reports and Consolidated Orders System (ARCOS) ARCOS is an automated, comprehensive drug reporting system that monitors the flow of DEA controlled substances from their point of manufacture through commercial distribution channels to point of sale or distribution at the dispensing/retail level: hospitals, retail pharmacies, practitioners, mid-level practitioners, and teaching institutions.

Deidentified patient data

Link Description
MIMIC MIMIC is an openly available dataset developed by the MIT Lab for Computational Physiology, comprising deidentified health data associated with ~60,000 intensive care unit admissions. It includes demographics, vital signs, laboratory tests, medications, and more.


FDA

Link Description
Drugs@FDA Data Files A compressed data file of the Drugs@FDA database. It does not include the scripts (programming) we use to produce the online version of Drugs@FDA.
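
Because the Drugs@FDA download is a compressed archive, it can be read without unpacking it to disk. The Python sketch below assumes the archive holds tab-delimited text files with a header row; the member name `Products.txt`, its columns, its rows, and the UTF-8 encoding are illustrative assumptions, and the demo archive is built in memory, so check the actual file list and layout of the download before relying on them.

```python
import csv
import io
import zipfile


def read_tsv_member(archive: zipfile.ZipFile, member: str) -> list[dict[str, str]]:
    """Parse one tab-delimited member of a zip archive into a list of row dicts."""
    with archive.open(member) as raw:
        # Decode bytes to text; the encoding is an assumption, not documented by the source.
        text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
        return list(csv.DictReader(text, delimiter="\t"))


def build_demo_archive() -> bytes:
    """Create a small in-memory zip standing in for the real download (hypothetical contents)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as zf:
        zf.writestr(
            "Products.txt",
            "ApplNo\tDrugName\tForm\n"
            "000001\tEXAMPLEDRUG\tTABLET;ORAL\n"
            "000002\tOTHERDRUG\tINJECTABLE;INJECTION\n",
        )
    return buffer.getvalue()


if __name__ == "__main__":
    with zipfile.ZipFile(io.BytesIO(build_demo_archive())) as archive:
        print(archive.namelist())  # ['Products.txt']
        rows = read_tsv_member(archive, "Products.txt")
        print(rows[0]["DrugName"])  # EXAMPLEDRUG
```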


General

Link Description
Health and Retirement Study (HRS) Data Products The University of Michigan Health and Retirement Study (HRS) is a longitudinal panel study that surveys a representative sample of approximately 20,000 people in America, supported by the National Institute on Aging (NIA U01AG009740) and the Social Security Administration.
The Human Mortality Database The Human Mortality Database (HMD) was created to provide detailed mortality and population data to researchers, students, journalists, policy analysts, and others interested in the history of human longevity.


HCPCS

Link Description
Addendum A and Addendum B Updates A “snapshot” of HCPCS codes and their status indicators, APC groups, and OPPS payment rates in effect at the beginning of each quarter.
National Correct Coding Initiative Edits CMS developed the National Correct Coding Initiative (NCCI) to promote national correct coding methodologies and to control improper coding that leads to inappropriate payment in Part B claims.
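
NCCI procedure-to-procedure edits are published as pairs of codes (a column 1 code and a column 2 code) that generally should not be reported together by the same provider for the same beneficiary on the same date of service. A minimal Python sketch of that pair check, assuming the edits are already loaded as code pairs; the codes below are placeholders, not real NCCI edits, and real edits also carry effective dates and a modifier indicator that this sketch ignores.

```python
from itertools import combinations

# Placeholder (column 1 code, column 2 code) pairs; not real NCCI edits.
PTP_EDITS: set[tuple[str, str]] = {
    ("10001", "10002"),
    ("20001", "20005"),
}


def flag_pair_conflicts(claim_codes: list[str],
                        edits: set[tuple[str, str]]) -> list[tuple[str, str]]:
    """Return every edit pair whose two codes both appear on one claim.

    A claim here is the list of codes billed by one provider for one
    beneficiary on one date of service (a simplifying assumption).
    """
    codes = sorted(set(claim_codes))
    conflicts = []
    for a, b in combinations(codes, 2):
        # Edit pairs are ordered (column 1, column 2), so check both orderings.
        if (a, b) in edits:
            conflicts.append((a, b))
        elif (b, a) in edits:
            conflicts.append((b, a))
    return conflicts


if __name__ == "__main__":
    print(flag_pair_conflicts(["10002", "10001", "30000"], PTP_EDITS))
    # [('10001', '10002')]
```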


Medicaid

Link Description
The home of Medicaid & CHIP open data.


NIH

Link Description
Archived Clinical Research Datasets The data from NINDS-supported clinical trials are an important scientific resource, made available to the wider scientific community while ensuring that the confidentiality and privacy of study participants are protected.
A database of privately and publicly funded clinical studies conducted around the world.
Genomic Data Commons Data Portal The GDC Data Portal is a robust data-driven platform that allows cancer researchers and bioinformaticians to search and download cancer data for analysis.
Open Domain-Specific Data Sharing Repositories Lists NIH-supported domain-specific data repositories that make data accessible for reuse and are open for both submitting and accessing data.
STRIDES Initiative The NIH Science and Technology Research Infrastructure for Discovery, Experimentation, and Sustainability (STRIDES) Initiative allows NIH to explore the use of cloud environments to streamline NIH data use by partnering with commercial providers.
Unified Medical Language System (UMLS) The UMLS integrates and distributes key terminology, classification and coding standards, and associated resources to promote creation of more effective and interoperable biomedical information systems and services, including electronic health records.
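
The UMLS Metathesaurus links terms from different source vocabularies to a shared concept identifier (CUI), which is what makes cross-terminology lookup possible. The Python sketch below assumes the pipe-delimited `MRCONSO.RRF` layout of the UMLS Rich Release Format (CUI in field 1, source abbreviation `SAB` in field 12, `CODE` in field 14, and string `STR` in field 15); the CUI, vocabulary names, codes, and strings in the demo rows are hypothetical.

```python
from collections import defaultdict

# Zero-based positions in an MRCONSO.RRF row (assumed RRF column order).
CUI, LAT, SAB, CODE, STR = 0, 1, 11, 13, 14


def make_row(cui: str, sab: str, code: str, string: str) -> str:
    """Build a hypothetical 18-field MRCONSO-style line for the demo."""
    fields = [""] * 18
    fields[CUI], fields[LAT] = cui, "ENG"
    fields[SAB], fields[CODE], fields[STR] = sab, code, string
    return "|".join(fields) + "|"  # RRF lines end with a trailing pipe


def codes_by_concept(lines: list[str]) -> dict[str, list[tuple[str, str, str]]]:
    """Group (source vocabulary, code, string) triples under their shared CUI."""
    concepts: dict[str, list[tuple[str, str, str]]] = defaultdict(list)
    for line in lines:
        f = line.rstrip("\n").split("|")
        concepts[f[CUI]].append((f[SAB], f[CODE], f[STR]))
    return dict(concepts)


if __name__ == "__main__":
    rows = [
        make_row("C0000001", "VOCAB_A", "A-100", "Example condition"),
        make_row("C0000001", "VOCAB_B", "B-200", "Condition, example"),
    ]
    print(codes_by_concept(rows))
```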


Research

Link Description
All of Us Data Browser The Data Browser provides interactive views of the publicly available All of Us Research Program participant data.
PBM Rx Database PBM developed the legacy software system/database and made it available for operations and research to organize, track, and analyze VA prescription data.
eICU Collaborative Research Database The eICU Collaborative Research Database is a freely available multi-center database for critical care research.

Synthetic patient data

Link Description
Synthea CSV Downloads  
Synthea FHIR API  
Synthea GitHub Synthea is a Synthetic Patient Population Simulator. The goal is to output synthetic, realistic (but not real), patient data and associated health records in a variety of formats.
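
Synthea's FHIR output is, as we understand it, one JSON `Bundle` per patient; in FHIR, a Bundle holds an `entry` list whose items wrap a `resource` carrying a `resourceType`. A minimal Python sketch that tallies resource types in one bundle, assuming it has been loaded as JSON; the inline bundle is a tiny hypothetical stand-in for a real Synthea file.

```python
import json
from collections import Counter


def count_resource_types(bundle: dict) -> Counter:
    """Count FHIR resource types across the entries of a Bundle."""
    return Counter(
        entry["resource"]["resourceType"]
        for entry in bundle.get("entry", [])
        if "resource" in entry
    )


if __name__ == "__main__":
    # Hypothetical, heavily trimmed bundle; real Synthea output is far larger.
    raw = """
    {
      "resourceType": "Bundle",
      "type": "transaction",
      "entry": [
        {"resource": {"resourceType": "Patient", "id": "p1"}},
        {"resource": {"resourceType": "Encounter", "id": "e1"}},
        {"resource": {"resourceType": "Encounter", "id": "e2"}},
        {"resource": {"resourceType": "Condition", "id": "c1"}}
      ]
    }
    """
    counts = count_resource_types(json.loads(raw))
    print(counts.most_common())
```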

U.S. Government

Link Description
The home of the U.S. Government’s open data.



Databases

Graph

Link Description
Amazon Neptune Amazon Neptune is a fast, reliable, fully managed graph database service that makes it easy to build and run applications that work with highly connected datasets.
ArangoDB Natively store data for graph, document and search needs. Utilize feature-rich access with one query language.
Dgraph Dedicated, multi-zone clusters on AWS, Azure, or GCP with fault tolerance.
Neo4j Neo4j is a native graph database, built from the ground up to leverage not only data but also data relationships. Neo4j connects data as it’s stored, so queries can follow relationships directly.
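
Graph databases are built for queries that follow relationships over several hops, such as "which nodes are within two connections of this one". The Python sketch below shows that query shape with a breadth-first search over an in-memory adjacency list; it illustrates the idea only and is not the query language or API of Amazon Neptune, ArangoDB, Dgraph, or Neo4j. The node names are hypothetical.

```python
from collections import deque


def within_hops(graph: dict[str, list[str]], start: str, max_hops: int) -> dict[str, int]:
    """Breadth-first search returning each reachable node and its hop distance."""
    distances = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distances[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in graph.get(node, []):
            if neighbor not in distances:
                distances[neighbor] = distances[node] + 1
                queue.append(neighbor)
    return distances


if __name__ == "__main__":
    # Hypothetical directed graph of relationships.
    graph = {
        "patient_1": ["prescriber_A"],
        "prescriber_A": ["pharmacy_X", "patient_2"],
        "pharmacy_X": ["distributor_Q"],
    }
    print(within_hops(graph, "patient_1", 2))
```

Breadth-first order guarantees each node is first reached by a shortest path, so the recorded hop counts are minimal distances.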


NoSQL

Link Description
MongoDB A document-based, distributed database built to handle the data of any application you’re building.


Relational

Link Description
MariaDB The open source relational database.
MySQL MySQL Database Service is a fully managed database service to deploy cloud-native applications.
PostgreSQL PostgreSQL is a powerful, open source object-relational database system with over 30 years of active development that has earned it a strong reputation for reliability, feature robustness, and performance.



Notebooks

General

Link Description
Databricks Powered by Delta Lake, Databricks combines the best of data warehouses and data lakes into a lakehouse architecture, giving you one platform to collaborate on all of your data, analytics and AI workloads.
Deepnote Deepnote is a new kind of data science notebook. Jupyter-compatible with real-time collaboration and running in the cloud. Oh, and it’s free.
Jupyter Project Jupyter exists to develop open-source software, open standards, and services for interactive computing across dozens of programming languages.


JavaScript

Link Description
Observable Explore and visualize data. Share and publish your insights. Discover and be inspired.



Tools

IDE

Link Description
DBeaver DBeaver is a free, open-source universal database tool for developers and database administrators.
SQL Online IDE Connect to local SQLite databases or play around with SQL queries in an online IDE.
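
For trying SQL locally rather than in an online IDE, Python's standard-library `sqlite3` module can open an in-memory SQLite database. A minimal sketch with a hypothetical two-table prescriptions schema (`drugs`, `fills`) and made-up rows, joining and aggregating the way one might explore a relational extract:

```python
import sqlite3


def top_drugs_by_fills(conn: sqlite3.Connection) -> list[tuple[str, int]]:
    """Return (drug name, number of fills), most-filled first."""
    return conn.execute(
        """
        SELECT d.name, COUNT(*) AS fills
        FROM fills AS f
        JOIN drugs AS d ON d.id = f.drug_id
        GROUP BY d.name
        ORDER BY fills DESC, d.name
        """
    ).fetchall()


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with hypothetical tables and rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE drugs (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE fills (
            id INTEGER PRIMARY KEY,
            drug_id INTEGER REFERENCES drugs(id),
            fill_date TEXT
        );
        INSERT INTO drugs (id, name) VALUES (1, 'drug_a'), (2, 'drug_b');
        INSERT INTO fills (drug_id, fill_date) VALUES
            (1, '2024-01-05'), (2, '2024-01-06'), (1, '2024-02-05');
        """
    )
    return conn


if __name__ == "__main__":
    conn = build_demo_db()
    try:
        print(top_drugs_by_fills(conn))  # [('drug_a', 2), ('drug_b', 1)]
    finally:
        conn.close()
```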



Visualization

General

Link Description
Plotly Dash The premier low-code platform for ML & data science apps.
Streamlit The fastest way to build and share data apps.