
The All of Us Research Program: data quality, utility, and diversity

Andrea H. Ramirez, Lina Sulieman, David J. Schlueter, Alese Halvorson, Jun Qian, Francis Ratsimbazafy, Roxana Loperena, Kelsey Mayo, Melissa Basford, Nicole Deflaux, Karthik N. Muthuraman, Karthik Natarajan, Abel Kho, Hua Xu, Consuelo Wilkins, Hoda Anton-Culver, Eric Boerwinkle, Mine Cicek, Cheryl R. Clark, Elizabeth Cohn, Lucila Ohno-Machado, Sheri Schully, Brian K. Ahmedani, Maria Argos, Robert M. Cronin, Christopher O’Donnell, Mona Fouad, David B. Goldstein, Philip Greenland, Scott J. Hebbring, Elizabeth W. Karlson, Parinda Khatri, Bruce Korf, Jordan W. Smoller, Stephen Sodeke, John Wilbanks, Justin Hentges, Christopher Lunt, Stephanie A. Devaney, Kelly Gebo, Joshua C Denny, Robert J. Carroll, David Glazer, Paul A. Harris, George Hripcsak, Anthony Philippakis, Dan M. Roden, on behalf of the All of Us Research Program
doi: https://doi.org/10.1101/2020.05.29.20116905
Andrea H. Ramirez
1Departments of Medicine, Vanderbilt University Medical Center;
2Department of Biomedical Informatics, Vanderbilt University Medical Center;
  • For correspondence: andrea.h.ramirez{at}vumc.org
Lina Sulieman
2Department of Biomedical Informatics, Vanderbilt University Medical Center;
David J. Schlueter
2Department of Biomedical Informatics, Vanderbilt University Medical Center;
Alese Halvorson
2Department of Biomedical Informatics, Vanderbilt University Medical Center;
Jun Qian
2Department of Biomedical Informatics, Vanderbilt University Medical Center;
Francis Ratsimbazafy
3Vanderbilt Institute for Clinical and Translational Research, Vanderbilt University Medical Center;
Roxana Loperena
3Vanderbilt Institute for Clinical and Translational Research, Vanderbilt University Medical Center;
Kelsey Mayo
3Vanderbilt Institute for Clinical and Translational Research, Vanderbilt University Medical Center;
Melissa Basford
3Vanderbilt Institute for Clinical and Translational Research, Vanderbilt University Medical Center;
Nicole Deflaux
4Verily Life Sciences;
Karthik N. Muthuraman
4Verily Life Sciences;
Karthik Natarajan
5Department of Biomedical Informatics, Columbia University Medical Center;
Abel Kho
6Center for Health Information Partnerships, Northwestern University;
Hua Xu
7School of Biomedical Informatics, The University of Texas Health Science Center at Houston;
Consuelo Wilkins
1Departments of Medicine, Vanderbilt University Medical Center;
Hoda Anton-Culver
8Department of Medicine, University of California Irvine;
Eric Boerwinkle
9School of Public Health, The University of Texas Health Science Center at Houston;
Mine Cicek
10Department of Laboratory Medicine and Pathology, Mayo Clinic;
Cheryl R. Clark
11Department of Medicine, Brigham and Women’s Hospital;
Elizabeth Cohn
12Hunter-Bellevue School of Nursing, Hunter College City University of New York;
Lucila Ohno-Machado
13Department of Biomedical Informatics, UCSD Health;
Sheri Schully
14All of Us Research Program, National Institutes of Health;
Brian K. Ahmedani
15Center for Health Policy & Health Services Research, Henry Ford Health System;
Maria Argos
16School of Public Health, University of Illinois at Chicago;
Robert M. Cronin
2Department of Biomedical Informatics, Vanderbilt University Medical Center;
Christopher O’Donnell
17Cardiology Section, Department of Medicine, Veterans Administration Boston Healthcare System and Harvard Medical School;
Mona Fouad
18Division of Preventive Medicine, University of Alabama at Birmingham;
David B. Goldstein
19Institute of Genomic Medicine, Columbia University Medical Center;
Philip Greenland
20Department of Preventive Medicine, Feinberg School of Medicine, Northwestern University;
Scott J. Hebbring
21Center for Precision Medicine, Marshfield Clinic;
Elizabeth W. Karlson
11Department of Medicine, Brigham and Women’s Hospital;
Parinda Khatri
22Cherokee Health Systems;
Bruce Korf
23Department of Genetics, University of Alabama at Birmingham;
Jordan W. Smoller
24Department of Psychiatry and Center for Genomic Medicine, Massachusetts General Hospital;
Stephen Sodeke
25Center for Biomedical Research, Tuskegee University;
John Wilbanks
25Center for Biomedical Research, Tuskegee University;
Justin Hentges
14All of Us Research Program, National Institutes of Health;
Christopher Lunt
14All of Us Research Program, National Institutes of Health;
Stephanie A. Devaney
14All of Us Research Program, National Institutes of Health;
Kelly Gebo
14All of Us Research Program, National Institutes of Health;
Joshua C Denny
14All of Us Research Program, National Institutes of Health;
Robert J. Carroll
2Department of Biomedical Informatics, Vanderbilt University Medical Center;
David Glazer
4Verily Life Sciences;
Paul A. Harris
2Department of Biomedical Informatics, Vanderbilt University Medical Center;
George Hripcsak
5Department of Biomedical Informatics, Columbia University Medical Center;
Anthony Philippakis
27Broad Institute; Department of Pharmacology, Vanderbilt University Medical Center
Dan M. Roden
1Departments of Medicine, Vanderbilt University Medical Center;
2Department of Biomedical Informatics, Vanderbilt University Medical Center;

Abstract

Importance The All of Us Research Program hypothesizes that accruing one million or more diverse participants engaged in a longitudinal research cohort will advance precision medicine and ultimately improve human health. Launched nationally in 2018, to date All of Us has recruited more than 345,000 participants. All of Us plans to open beta access to researchers in May 2020.

Objective To demonstrate the quality, utility, and diversity of the All of Us Research Program’s initial data release and the beta launch of its cloud-based analysis platform, the Researcher Workbench.

Evidence We analyzed the initial All of Us data release, comprising surveys, physical measurements (PM), and electronic health record (EHR) data, to characterize All of Us participants including self-reported descriptors of diversity. Data depth, density, and quality were evaluated using medication sequencing analyses for depression and type 2 diabetes. Replication of known oncologic associations with smoking exposure ascertained by EHR and survey data and calculation of population-based atherosclerotic cardiovascular disease risk scores demonstrated the utility of data and platform capability.

Findings The beta launch of the All of Us Researcher Workbench contains data on 224,143 participants. Seventy-seven percent of this cohort were identified as Underrepresented in Biomedical Research (UBR), including over forty-eight percent self-reporting non-White race. Medication usage patterns in the common diseases depression and type 2 diabetes replicated findings previously reported in the literature and showed differences based on race. Oncologic associations with smoking were replicated, and effect sizes for EHR-derived and survey-derived exposures were compared and found to be in general agreement. A cardiovascular disease risk score was calculated using multiple data elements curated across sources. The cloud-based architecture of the Researcher Workbench provided secure access and powerful computational resources at low cost. All analyses have been made available for replication and reuse by registered researchers.

Conclusions and Relevance The All of Us Research Program’s initial release of cohort data contains longitudinal and multidimensional data on diverse participants that replicate known associations. This dataset and the cloud-based Researcher Workbench advance the mission of All of Us to make data widely and securely available to researchers to improve human health and advance precision medicine.

Competing Interest Statement

The authors have declared no competing interest.

Funding Statement

The All of Us Research Program is supported (or funded) by grants through the National Institutes of Health, Office of the Director: Regional Medical Centers: 1 OT2 OD026549; 1 OT2 OD026554; 1 OT2 OD026557; 1 OT2 OD026556; 1 OT2 OD026550; 1 OT2 OD026552; 1 OT2 OD026553; 1 OT2 OD026548; 1 OT2 OD026551; 1 OT2 OD026555; IAA #: AOD 16037; Federally Qualified Health Centers: HHSN 263201600085U; Data and Research Center: 5 U2C OD023196; Biobank: 1 U24 OD023121; The Participant Center: U24 OD023176; Participant Technology Systems Center: 1 U24 OD023163; Communications and Engagement: 3 OT2 OD023205; 3 OT2 OD023206; and Community Partners: 1 OT2 OD025277; 3 OT2 OD025315; 1 OT2 OD025337; 1 OT2 OD025276. In addition to the funded partners, the All of Us Research Program would not be possible without the contributions made by its participants.

Author Declarations

I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.

Yes

The details of the IRB/oversight body that provided approval or exemption for the research described are given below:

Approval to use the dataset for the specified demonstration projects was obtained from the All of Us Institutional Review Board. Results reported are in compliance with the All of Us Data and Statistics Dissemination Policy disallowing disclosure of group counts under 20 to protect participant privacy.
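As an illustration of the small-cell rule described above, the following is a minimal Python sketch of how group counts under 20 might be masked before results are shared. It is not the program's own tooling: the function names `suppress_small_counts` and `format_count`, the example group labels, and the choice to mask zero counts as well (the statement above does not say whether zeros are exempt) are all assumptions made for illustration.

```python
from typing import Dict, Optional

# Threshold taken from the policy statement above: counts under 20 are not disclosed.
MIN_REPORTABLE_COUNT = 20


def suppress_small_counts(
    counts: Dict[str, int], threshold: int = MIN_REPORTABLE_COUNT
) -> Dict[str, Optional[int]]:
    """Return a copy of `counts` with every value below `threshold` replaced by None.

    Assumption: zero counts are masked too, since the source text does not
    state an exemption for them.
    """
    return {group: (n if n >= threshold else None) for group, n in counts.items()}


def format_count(n: Optional[int], threshold: int = MIN_REPORTABLE_COUNT) -> str:
    """Render a possibly suppressed count for a results table."""
    return f"<{threshold}" if n is None else str(n)


if __name__ == "__main__":
    # Hypothetical group counts, invented purely to exercise the function.
    example = {"group_a": 125, "group_b": 19, "group_c": 20}
    for group, n in suppress_small_counts(example).items():
        print(group, format_count(n))
```

Masking at the point of export, rather than in the analysis itself, keeps the underlying computation unchanged while ensuring that any table leaving the platform respects the threshold.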

All necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived.

Yes

I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).

Yes

I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable.

Yes

Data Availability

The dataset was accessed through the All of Us Researcher Workbench, a cloud-based analytic platform custom built by the program for approved researchers. The Workbench is built on top of the Terra platform, which is also used by a number of other NIH-funded studies, including the NCI Cloud Resources, the NHLBI BioData Catalyst, and the NHGRI AnVIL. Access to the Researcher Workbench and its data is free; compute and storage incur usage costs. All researchers who accessed the data for analyses were authorized and approved via a 3-step process that included registration, completion of ethics training, and attestation to a data use agreement. The Researcher Workbench uses Google Compute Engine for computational resources and Google Cloud Storage for storage in the cloud.

https://workbench.researchallofus.org/workspaces/aou-rw-dd7cff0e/medicationspathwaysequencesbyracephase1/notebooks

https://workbench.researchallofus.org/workspaces/aou-rw-a8fc912d/duplicateofframinghamahariskscore/notebooks

https://workbench.researchallofus.org/workspaces/aou-rw-d59956e4/jamaphewasfinalreview05212020/data

Copyright 
The copyright holder for this preprint is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.
Posted June 03, 2020.
medRxiv 2020.05.29.20116905; doi: https://doi.org/10.1101/2020.05.29.20116905
Subject Area

  • Public and Global Health