medRxiv

Artificial intelligence enables comprehensive genome interpretation and nomination of candidate diagnoses for rare genetic diseases

Francisco M. De La Vega, Shimul Chowdhury, Barry Moore, Erwin Frise, Jeanette McCarthy, Edgar Javier Hernandez, Terrence Wong, Kiely James, Lucia Guidugli, Pankaj B Agrawal, Casie A Genetti, Catherine A Brownstein, Alan H Beggs, Britt-Sabina Löscher, Andre Franke, Braden Boone, Shawn E. Levy, Katrin Õunap, Sander Pajusalu, Matt Huentelman, Keri Ramsey, Marcus Naymik, Vinodh Narayanan, Narayanan Veeraraghavan, Paul Billings, Martin G. Reese, Mark Yandell, Stephen F. Kingsmore
doi: https://doi.org/10.1101/2021.02.09.21251456
Francisco M. De La Vega
1Fabric Genomics Inc., Oakland, CA, USA
Shimul Chowdhury
2Rady Children’s Institute for Genomic Medicine, San Diego, CA, USA
Barry Moore
3Department of Human Genetics, Utah Center for Genetic Discovery, University of Utah, Salt Lake City, UT, USA
Erwin Frise
1Fabric Genomics Inc., Oakland, CA, USA
Jeanette McCarthy
1Fabric Genomics Inc., Oakland, CA, USA
Edgar Javier Hernandez
3Department of Human Genetics, Utah Center for Genetic Discovery, University of Utah, Salt Lake City, UT, USA
Terrence Wong
2Rady Children’s Institute for Genomic Medicine, San Diego, CA, USA
Kiely James
2Rady Children’s Institute for Genomic Medicine, San Diego, CA, USA
Lucia Guidugli
2Rady Children’s Institute for Genomic Medicine, San Diego, CA, USA
Pankaj B Agrawal
4Division of Genetics and Genomics, The Manton Center for Orphan Disease Research, Boston Children’s Hospital, Harvard Medical School, Boston, MA, USA
5Division of Newborn Medicine, Boston Children’s Hospital, Boston, MA, USA
Casie A Genetti
4Division of Genetics and Genomics, The Manton Center for Orphan Disease Research, Boston Children’s Hospital, Harvard Medical School, Boston, MA, USA
Catherine A Brownstein
4Division of Genetics and Genomics, The Manton Center for Orphan Disease Research, Boston Children’s Hospital, Harvard Medical School, Boston, MA, USA
Alan H Beggs
4Division of Genetics and Genomics, The Manton Center for Orphan Disease Research, Boston Children’s Hospital, Harvard Medical School, Boston, MA, USA
Britt-Sabina Löscher
6Institute of Clinical Molecular Biology, Christian-Albrechts-University of Kiel & University Hospital Schleswig-Holstein, Kiel, Germany
Andre Franke
6Institute of Clinical Molecular Biology, Christian-Albrechts-University of Kiel & University Hospital Schleswig-Holstein, Kiel, Germany
Braden Boone
7HudsonAlpha Institute for Biotechnology, Huntsville, Alabama, USA
Shawn E. Levy
7HudsonAlpha Institute for Biotechnology, Huntsville, Alabama, USA
Katrin Õunap
8Department of Clinical Genetics, United Laboratories, Tartu University Hospital, & Department of Clinical Genetics, Institute of Clinical Medicine, University of Tartu, Estonia
Sander Pajusalu
8Department of Clinical Genetics, United Laboratories, Tartu University Hospital, & Department of Clinical Genetics, Institute of Clinical Medicine, University of Tartu, Estonia
Matt Huentelman
9Center for Rare Childhood Disorders, Translational Genomics Research Institute, Phoenix, Arizona, USA
Keri Ramsey
9Center for Rare Childhood Disorders, Translational Genomics Research Institute, Phoenix, Arizona, USA
Marcus Naymik
9Center for Rare Childhood Disorders, Translational Genomics Research Institute, Phoenix, Arizona, USA
Vinodh Narayanan
9Center for Rare Childhood Disorders, Translational Genomics Research Institute, Phoenix, Arizona, USA
Narayanan Veeraraghavan
2Rady Children’s Institute for Genomic Medicine, San Diego, CA, USA
Paul Billings
1Fabric Genomics Inc., Oakland, CA, USA
Martin G. Reese
1Fabric Genomics Inc., Oakland, CA, USA
  • For correspondence: myandell{at}genetics.utah.edu mreese{at}fabricgenomics.com
Mark Yandell
1Fabric Genomics Inc., Oakland, CA, USA
3Department of Human Genetics, Utah Center for Genetic Discovery, University of Utah, Salt Lake City, UT, USA
  • For correspondence: myandell{at}genetics.utah.edu mreese{at}fabricgenomics.com
Stephen F. Kingsmore
2Rady Children’s Institute for Genomic Medicine, San Diego, CA, USA

Abstract

Background Clinical interpretation of genetic variants in the context of the patient’s phenotype is becoming the largest component of cost and time expenditure for genome-based diagnosis of rare genetic diseases. Artificial intelligence (AI) holds promise to greatly simplify and speed interpretation by comprehensively evaluating genetic variants for pathogenicity in the context of the growing knowledge of genetic disease. We assess the diagnostic performance of GEM, a new, AI-based, clinical decision support tool, compared with expert manual interpretation.

Methods We benchmarked GEM in a retrospective cohort of 119 probands, mostly NICU infants, diagnosed with rare genetic diseases, who received whole genome sequencing (WGS) at Rady Children’s Hospital. We also performed a replication study in a separate cohort of 60 cases diagnosed at five additional academic medical centers. For comparison, we also analyzed these cases with commonly used variant prioritization tools (Phevor, Exomiser, and VAAST). Included in the comparisons were WGS and whole exome sequencing (WES) as trios, duos, and singletons. Variants underpinning diagnoses spanned diverse modes of inheritance and types, including structural variants (SVs). Patient phenotypes were extracted either manually or by automated clinical natural language processing (CNLP) from clinical notes. Finally, 14 previously unsolved cases were re-analyzed.

Results GEM ranked >90% of causal genes as the top or second candidate, using either manually curated or CNLP-derived phenotypes, and prioritized a median of 3 genes for review per case. Ranking of trios and duos was unchanged when they were analyzed as singletons. In 17 of 20 cases with diagnostic SVs, GEM identified the causal SVs as the top or second candidate, irrespective of whether SV calls were provided or, when absent, inferred ab initio by GEM. Analysis of 14 previously unsolved cases provided novel findings in one, candidates ultimately not advanced in 3, and no new findings in 10, demonstrating the utility of GEM for reanalysis.
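The headline metric above (the share of cases whose causal gene is ranked first or second) is a top-k rate. The following is a minimal sketch of how such a rate could be computed, not the authors' evaluation code; the function name `top_k_rate` and the rank values are hypothetical and are not data from the study.

```python
def top_k_rate(causal_ranks, k):
    """Fraction of cases whose causal gene is ranked at position k or better.

    causal_ranks holds one entry per case: the 1-based rank of the causal
    gene in the tool's candidate list, or None if the gene was not returned.
    """
    if not causal_ranks:
        raise ValueError("need at least one case")
    hits = sum(1 for rank in causal_ranks if rank is not None and rank <= k)
    return hits / len(causal_ranks)


# Hypothetical ranks for ten illustrative cases (NOT results from the study).
toy_ranks = [1, 1, 2, 1, 5, 1, 2, 1, None, 1]

print(f"top-1 rate: {top_k_rate(toy_ranks, 1):.2f}")
print(f"top-2 rate: {top_k_rate(toy_ranks, 2):.2f}")
```

Counting a missing causal gene (None) as a miss rather than dropping the case keeps the denominator equal to the cohort size, which is the conservative choice when comparing tools on the same cases.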

Conclusions GEM enables automated diagnostic interpretation of WES and WGS for all types of variants, including SVs, nominating a very short list of candidate genes and disorders for final review and reporting. In combination with deep phenotyping by CNLP, GEM enables substantial automation of genetic disease diagnosis, potentially decreasing the cost and speeding case review.

Competing Interest Statement

FV, EF, JM, and MGR were employees of Fabric Genomics Inc. during the performance of this work. EF, FV, JM, MGR, and MY are stockholders of, or have received stock option awards from, Fabric Genomics Inc. BM and MY have received consulting fees from Fabric Genomics Inc. The other authors declare no competing interests.

Funding Statement

MH, KR, MN and VN were supported in part by The Center for Rare Childhood Disorders, funded through donations made to the TGen Foundation. AF and BSL were supported by the DFG Cluster of Excellence "Precision Medicine in Chronic Inflammation". KO and SP were supported by Estonian Research Council grants PUT355, PRG471, MOBTP175 and PUTJD827. Sequencing and analysis were partially provided by the Broad Institute of MIT and Harvard Center for Mendelian Genomics (Broad CMG) and were funded by the National Human Genome Research Institute, the National Eye Institute, and the National Heart, Lung and Blood Institute grant UM1 HG008900 and in part by National Human Genome Research Institute grant R01 HG009141. The phenotyping and analysis of patients at Boston Children's Hospital were funded by MDA602235 from the Muscular Dystrophy Association, the Tommy Fuss Foundation, and the Yale Center for Mendelian Genomics. Sanger sequencing confirmations utilized the resources of the Boston Children's Hospital IDDRC Molecular Genetics Core Facility, supported by U54HD090255 from the National Institutes of Health.

Author Declarations

I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.

Yes

The details of the IRB/oversight body that provided approval or exemption for the research described are given below:

Rady Children's Hospital: The studies from which cases were derived were approved by the institutional review board at Rady Children's Hospital, San Diego, CA, USA. The studies were designated to be of "nonsignificant risk" by the FDA in response to an investigational device exemption presubmission inquiry in April 2014. The studies were performed in accordance with the Declaration of Helsinki. Informed consent was obtained from at least one parent or guardian. Patients were consented to clinical research studies approved by the Institutional Review Board. Boston Children's Hospital: This study was approved by the Institutional Review Board of Boston Children's Hospital. Christian-Albrechts University of Kiel: Proper informed consent and ethical approvals were obtained for all patients included in this analysis. HudsonAlpha Institute for Biotechnology: Proper informed consent and ethical approvals were obtained for all patients included in this analysis. Translational Genomics Research Institute: Participants were consented and enrolled under WIRB Protocol #20120789. Tartu University Hospital: This study was approved by the Research Ethics Committee of the University of Tartu (approval date 17/10/2016 and number 263/M‐16).

All necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived.

Yes

I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).

Yes

I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable.

Yes

Data Availability

The datasets supporting the conclusions of this article are included within the article and its additional files. Due to patient privacy, data sharing consent, and HIPAA regulations, our raw data cannot be submitted to publicly available databases. The GEM, Phevor and VAAST software implementations, in the versions used in this analysis, are part of the Fabric Enterprise analysis platform and are commercially available (https://www.fabricgenomics.com). Exomiser source code (version 12.1.0) is available on GitHub at: https://github.com/exomiser/Exomiser.


Copyright 
The copyright holder for this preprint is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. All rights reserved. No reuse allowed without permission.
Posted February 12, 2021.
Subject Area

  • Genetic and Genomic Medicine