medRxiv

A scalable EHR-based approach for phenotype discovery and variant interpretation for hereditary cancer genes

Chenjie Zeng, Lisa A. Bastarache, Ran Tao, Eric Venner, Scott Hebbring, Justin D. Andujar, Sarah T. Bland, David R. Crosslin, Siddharth Pratap, Ayorinde Cooley, Jennifer A. Pacheco, Kurt D. Christensen, Emma Perez, Carrie L. Blout Zawatsky, Leora Witkowski, Hana Zouk, Chunhua Weng, Kathleen A. Leppig, Patrick M. A. Sleiman, Hakon Hakonarson, Marc S. Williams, Yuan Luo, Gail P. Jarvik, Robert C. Green, Wendy K. Chung, Ali G. Gharavi, Niall J. Lennon, Heidi L. Rehm, Richard A. Gibbs, Josh F. Peterson, Dan M. Roden, Georgia L. Wiesner, Joshua C. Denny
doi: https://doi.org/10.1101/2021.03.18.21253763
Chenjie Zeng
¹Department of Medicine, Vanderbilt University Medical Center, Nashville, TN
Lisa A. Bastarache
²Center for Precision Medicine, Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN
Ran Tao
³Department of Biostatistics, Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN
Eric Venner
⁴Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX
Scott Hebbring
⁵Marshfield Clinic Research Institute, Marshfield, WI
Justin D. Andujar
¹Department of Medicine, Vanderbilt University Medical Center, Nashville, TN
⁶Clinical and Translational Hereditary Cancer Program, Division of Genetic Medicine, Vanderbilt-Ingram Cancer Center, Vanderbilt University, Nashville, TN
Sarah T. Bland
²Center for Precision Medicine, Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN
David R. Crosslin
⁷Department of Biomedical Informatics and Medical Education, University of Washington School of Medicine, Seattle, WA
Siddharth Pratap
⁸School of Graduate Studies and Research, Meharry Medical College, Nashville, TN
Ayorinde Cooley
⁹Department of Microbiology, Immunology and Physiology, Meharry Medical College, Nashville, TN
Jennifer A. Pacheco
¹⁰Feinberg School of Medicine, Northwestern University, Chicago, IL
Kurt D. Christensen
¹¹PRecisiOn Medicine Translational Research (PROMoTeR) Center, Department of Population Medicine, Harvard Pilgrim Health Care Institute, Boston, MA; Department of Population Medicine, Harvard Medical School, Boston, MA
Emma Perez
¹²Division of Genetics, Department of Medicine, Brigham and Women’s Hospital, Boston, MA
Carrie L. Blout Zawatsky
¹²Division of Genetics, Department of Medicine, Brigham and Women’s Hospital, Boston, MA
Leora Witkowski
¹³McGill University Health Centre, Montreal, Quebec
Hana Zouk
¹⁴Laboratory for Molecular Medicine, Partners Healthcare Personalized Medicine, Cambridge, MA; Department of Pathology, Massachusetts General Hospital, Harvard Medical School, Boston, MA
Chunhua Weng
¹⁵Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, NY
Kathleen A. Leppig
¹⁶Genetic Services and Kaiser Permanente Washington Health Research Institute, Kaiser Permanente of Washington, Seattle, WA
Patrick M. A. Sleiman
¹⁷Center for Applied Genomics, Children’s Hospital of Philadelphia, Philadelphia, PA
¹⁸Division of Human Genetics, Departments of Pediatrics, The University of Pennsylvania Perelman School of Medicine, Philadelphia, PA
Hakon Hakonarson
¹⁷Center for Applied Genomics, Children’s Hospital of Philadelphia, Philadelphia, PA
¹⁸Division of Human Genetics, Departments of Pediatrics, The University of Pennsylvania Perelman School of Medicine, Philadelphia, PA
Marc S. Williams
¹⁹Genomic Medicine Institute, Geisinger, Danville, PA
Yuan Luo
¹⁰Feinberg School of Medicine, Northwestern University, Chicago, IL
Gail P. Jarvik
²⁰Departments of Medicine (Medical Genetics) and Genome Sciences, University of Washington, Seattle, WA
Robert C. Green
²¹Brigham and Women’s Hospital, Broad Institute, Ariadne Labs and Harvard Medical School, Boston, MA
Wendy K. Chung
²²Departments of Pediatrics and Medicine, Columbia University, New York, NY
Ali G. Gharavi
²³Division of Nephrology, Department of Medicine, Columbia University Irving Medical Center, New York, NY
²⁴Center for Precision Medicine and Genomics, Department of Medicine, Columbia University Irving Medical Center, New York, NY
Niall J. Lennon
²⁵Broad Institute of MIT and Harvard, Cambridge, MA
Heidi L. Rehm
²⁶Medical & Population Genetics Program and Genomics Platform, Broad Institute of MIT and Harvard, Cambridge, MA; Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA; Department of Pathology, Harvard Medical School, Boston, MA
Richard A. Gibbs
⁴Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX
Josh F. Peterson
¹Department of Medicine, Vanderbilt University Medical Center, Nashville, TN
²Center for Precision Medicine, Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN
Dan M. Roden
²Center for Precision Medicine, Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN
²⁷Division of Cardiovascular Medicine, Division of Clinical Pharmacology, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN; Department of Pharmacology, Vanderbilt University, Nashville, TN
Georgia L. Wiesner
¹Department of Medicine, Vanderbilt University Medical Center, Nashville, TN
⁶Clinical and Translational Hereditary Cancer Program, Division of Genetic Medicine, Vanderbilt-Ingram Cancer Center, Vanderbilt University, Nashville, TN
Joshua C. Denny
²⁸National Human Genome Research Institute, National Institutes of Health, Bethesda, MD
For correspondence: joshua.denny@nih.gov

Abstract

Knowledge of the clinical spectrum of rare genetic disorders aids disease management and the interpretation of variant pathogenicity. Leveraging electronic health record (EHR)-linked genetic testing data from the eMERGE network, we tested the associations between 23 hereditary cancer genes and 3,017 phenotypes in 23,544 individuals. This phenome-wide association study replicated 45% (184/406) of known gene-phenotype associations (P = 5.1×10⁻¹²⁵). Meta-analysis with an independent EHR-derived cohort of 3,242 patients confirmed 14 novel associations with phenotypes in the neoplastic, genitourinary, digestive, congenital, metabolic, mental and neurologic categories. Phenotype risk scores (PheRS), computed as weighted aggregations of EHR phenotypes, accurately predicted pathogenicity for at least 50% of pathogenic variants in 8 of the 23 genes. We generated a catalog of PheRS for 7,800 variants, including 5,217 variants of uncertain significance, to provide empirical evidence of potential pathogenicity. This study highlights the potential of EHR data in genomic medicine.
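The abstract defines PheRS only as a weighted aggregation of EHR phenotypes. The sketch below illustrates the general idea under stated assumptions: it uses the inverse-prevalence weighting from earlier PheRS work (weight of phecode j = log(N / n_j), where N is the cohort size and n_j the number of individuals with phecode j), and the weights used in this study may differ. The phecode labels `P1` to `P4`, the gene feature set, and the helper `variant_mean_phers` (a variant-level score taken as the mean PheRS of its carriers) are hypothetical illustrations, not the authors' definitions.

```python
import math
from typing import Dict, List, Set


def phers_weights(records: List[Set[str]]) -> Dict[str, float]:
    """Assumed inverse-prevalence weights: w_j = log(N / n_j) for each phecode j."""
    n_total = len(records)
    counts: Dict[str, int] = {}
    for codes in records:
        for code in codes:
            counts[code] = counts.get(code, 0) + 1
    return {code: math.log(n_total / n) for code, n in counts.items()}


def phers(codes: Set[str], features: Set[str], weights: Dict[str, float]) -> float:
    """Sum the weights of the gene's feature phecodes present in one person's record."""
    return sum((weights[c] for c in codes & features if c in weights), 0.0)


def variant_mean_phers(carriers: List[int], scores: List[float]) -> float:
    """Hypothetical variant-level summary: mean PheRS over the variant's carriers."""
    if not carriers:
        raise ValueError("variant has no carriers")
    return sum(scores[i] for i in carriers) / len(carriers)


# Toy cohort: each set holds one person's (hypothetical) phecodes.
records = [{"P1", "P2"}, {"P1"}, {"P3"}, {"P2", "P3", "P4"}]
features = {"P2", "P4"}  # hypothetical clinical features of one gene

weights = phers_weights(records)
scores = [phers(r, features, weights) for r in records]
print([round(s, 3) for s in scores])                 # [0.693, 0.0, 0.0, 2.079]
print(round(variant_mean_phers([0, 3], scores), 3))  # 1.386
```

Because rarer phecodes receive larger weights, the fourth record, which includes the rare feature `P4`, scores highest; a common feature contributes comparatively little.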

Competing Interest Statement

Ali G. Gharavi serves as a consultant to Goldfinch Bio and receives research funding from Renal Research Institute.

Clinical Trial

N/A

Funding Statement

Support for the research and personnel was provided by grant R01LM010685 from the National Library of Medicine and by the eMERGE grants. The eMERGE sites were funded through several series of grants from the National Human Genome Research Institute: U01HG8657, U01HG006375, U01HG004610 (Kaiser Permanente Washington/University of Washington); U01HG8685 (Brigham and Women's Hospital); U01HG8672, U01HG006378, U01HG004608 (Vanderbilt University Medical Center); U01HG8666, U01HG006828 (Cincinnati Children's Hospital Medical Center); U01HG6379, U01HG04599 (Mayo Clinic); U01HG8679, U01HG006382 (Geisinger Clinic); U01HG008680 (Columbia University Health Sciences); U01HG8684, U01HG006830 (Children's Hospital of Philadelphia); U01HG8673, U01HG006388, U01HG004609 (Northwestern University); U54MD007593, U54MD007586 (Meharry Medical College); U01HG8676 (Partners Healthcare/Broad Institute); U01HG8664 (Baylor College of Medicine); U01HG006389 (Essentia Institute of Rural Health, Marshfield Clinic Research Foundation and Pennsylvania State University); U01HG006380 (Icahn School of Medicine at Mount Sinai); U01HG8701, U01HG006385, U01HG04603 (Vanderbilt University Medical Center serving as the Coordinating Center). eMERGE Genotyping Centers were also funded through U01HG004438 (CIDR) and U01HG004424 (the Broad Institute). Vanderbilt University Medical Center's Synthetic Derivative, Research Derivative and BioVU are supported by institutional funding and by CTSA grant UL1TR000445 from NCATS/NIH. The majority of CJZ's work on this project was supported by T32 CA160056 (NCI).

Author Declarations

I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.

Yes

The details of the IRB/oversight body that provided approval or exemption for the research described are given below:

Ethics committees and their decisions for each site in the Electronic Medical Record and Genomics (eMERGE) Network are as follows. CHOP was approved by the Committees for the Protection of Human Subjects of the Children's Hospital of Philadelphia; Cincinnati was approved by the Institutional Review Board of the Cincinnati Children's Hospital Medical Center; Columbia was approved by the Human Research Protection Office and Institutional Review Boards of Columbia University; Geisinger was approved by the Geisinger Institutional Review Board; Harvard was approved by the Partners Human Research Committee; Mayo was approved by the Mayo Clinic Institutional Review Board; Northwestern was approved by Northwestern University's Institutional Review Board; UWKP was approved by the Kaiser Permanente Washington Research and Human Subjects Review Office; Vanderbilt was approved by the Vanderbilt University Institutional Review Board. The hereditary cancer registry at Vanderbilt was also approved by the Vanderbilt University Institutional Review Board.

All necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived.

Yes

I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).

Yes

I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable.

Yes

Data Availability

Genetic and phenotype data of the eMERGEseq cohort are publicly available in the dbGaP repository under phs001616.v1.p1. All summary statistics for significant gene-phenotype associations from the eMERGEseq and the HCR cohorts are provided in the Supplemental Tables S3-6. All summary statistics for associations of PheRS with genetic variants are provided in Supplemental Tables S9-10.

Copyright 
The copyright holder for this preprint is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. All rights reserved. No reuse allowed without permission.
Posted March 24, 2021.
Subject Area

  • Genetic and Genomic Medicine