medRxiv

Closing the gap: Solving complex medically relevant genes at scale

Medhat Mahmoud, John Harting, Holly Corbitt, Xiao Chen, Shalini N. Jhangiani, Harsha Doddapaneni, Qingchang Meng, Tina Han, Christine Lambert, Siyuan Zhang, Primo Baybayan, Geoff Henno, Hua Shen, Jianhong Hu, Yi Han, Casey Riegler, Ginger Metcalf, Geoff Henno, Ivan K. Chinn, Michael A. Eberle, Sarah Kingan, Tim Farinholt, Claudia M.B. Carvalho, Richard A. Gibbs, Zev Kronenberg, Donna Muzny, Fritz J. Sedlazeck
doi: https://doi.org/10.1101/2024.03.14.24304179
Medhat Mahmoud
1Baylor College of Medicine, Human Genome Sequencing Center, Houston, Texas, USA
John Harting
2Pacific Biosciences, Menlo Park, California, USA
Holly Corbitt
3Twist Bioscience, South San Francisco, USA
Xiao Chen
2Pacific Biosciences, Menlo Park, California, USA
Shalini N. Jhangiani
1Baylor College of Medicine, Human Genome Sequencing Center, Houston, Texas, USA
Harsha Doddapaneni
1Baylor College of Medicine, Human Genome Sequencing Center, Houston, Texas, USA
Qingchang Meng
1Baylor College of Medicine, Human Genome Sequencing Center, Houston, Texas, USA
Tina Han
3Twist Bioscience, South San Francisco, USA
Christine Lambert
2Pacific Biosciences, Menlo Park, California, USA
Siyuan Zhang
2Pacific Biosciences, Menlo Park, California, USA
Primo Baybayan
2Pacific Biosciences, Menlo Park, California, USA
Geoff Henno
2Pacific Biosciences, Menlo Park, California, USA
Hua Shen
1Baylor College of Medicine, Human Genome Sequencing Center, Houston, Texas, USA
Jianhong Hu
1Baylor College of Medicine, Human Genome Sequencing Center, Houston, Texas, USA
Yi Han
1Baylor College of Medicine, Human Genome Sequencing Center, Houston, Texas, USA
Casey Riegler
3Twist Bioscience, South San Francisco, USA
Ginger Metcalf
1Baylor College of Medicine, Human Genome Sequencing Center, Houston, Texas, USA
Geoff Henno
2Pacific Biosciences, Menlo Park, California, USA
Ivan K. Chinn
4Department of Pediatrics, Section of Immunology Allergy and Rheumatology, Center for Human Immunobiology, Texas Children’s Hospital and Baylor College of Medicine, Houston, Texas, USA
Michael A. Eberle
2Pacific Biosciences, Menlo Park, California, USA
Sarah Kingan
2Pacific Biosciences, Menlo Park, California, USA
Tim Farinholt
3Twist Bioscience, South San Francisco, USA
Claudia M.B. Carvalho
5Pacific Northwest Research Institute, Seattle, USA
Richard A. Gibbs
1Baylor College of Medicine, Human Genome Sequencing Center, Houston, Texas, USA
Zev Kronenberg
2Pacific Biosciences, Menlo Park, California, USA
Donna Muzny
1Baylor College of Medicine, Human Genome Sequencing Center, Houston, Texas, USA
Fritz J. Sedlazeck
1Baylor College of Medicine, Human Genome Sequencing Center, Houston, Texas, USA
6Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, USA
7Department of Computer Science, Rice University, Houston, Texas, USA
  • For correspondence: fritz.sedlazeck{at}bcm.edu

Abstract

Understanding the mechanisms behind human diseases with an established heritable component is at the forefront of personalized medicine. Nevertheless, numerous medically important genes are inaccurately represented in short-read sequencing data analysis due to their complexity and repetitiveness, or their location in the so-called ‘dark regions’ of the human genome. The advent of PacBio as a long-read platform has provided new insights, yet the cost of HiFi whole-genome sequencing (WGS) frequently remains prohibitive. We introduce a targeted sequencing and analysis framework, the Twist Alliance Dark Genes Panel (TADGP), designed to offer phased variants across 389 medically important yet complex autosomal genes. We highlight TADGP accuracy across eleven control samples and compare it to WGS. This comparison demonstrates that TADGP achieves variant calling accuracy comparable to HiFi-WGS data, but at a fraction of the cost, thus enabling scalability and broad applicability for studying rare diseases or complementing previously sequenced samples to gain insights into these complex genes.

When tested on samples from rare disease and cardiovascular disease cohorts, TADGP revealed several candidate variants across all cases and provided insight into LPA diversity. In both cohorts, we identified novel variants affecting individual disease-associated genes (e.g., IKZF1, KCNE1). Nevertheless, the annotation of variants across these 389 medically important genes remains challenging due to their underrepresentation in ClinVar and gnomAD. Consequently, we also offer an annotation resource to enhance the evaluation and prioritization of these variants.

Overall, we demonstrate that TADGP offers a cost-efficient and scalable approach to routinely assess the clinically relevant dark regions of the human genome.

Competing Interest Statement

FS received support from Illumina, Genentech, PacBio, and Oxford Nanopore Technologies. JH, XC, CL, SZ, PB, ME, SK, GH, and ZK are PacBio employees and shareholders. HC, TH, CR, and TF are employees of Twist.

Funding Statement

FJS and MM are supported by NIH grants (1U01HG011758-01). SIH S10 grant 1S10OD028587. We would like to thank Luis F. Paulin for fruitful discussions on de novo SV calling. We would like to thank Barbara Bock for her help with gene annotations.

Author Declarations

I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.

Yes

The details of the IRB/oversight body that provided approval or exemption for the research described are given below:

Ethics committee/IRB of Baylor College of Medicine (BCM) gave ethical approval for the rare disease patients under the protocol H-29697. Regarding the cardiovascular samples, the study was approved by the Baylor College of Medicine (BCM) Institutional Review Boards (protocol number: H-43884). Written informed consent was obtained from all study participants. All sample names have been deidentified.

I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.

Yes

I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).

Yes

I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.

Yes

Data availability

Supplementary Table S7 lists the raw reads that were downloaded from SRA. The GIAB benchmark is downloadable here:

HG001:

SNVs and Indels: https://ftp-trace.ncbi.nlm.nih.gov/ReferenceSamples/giab/release/NA12878_HG001/latest/GRCh38/

HG002:

SNVs and Indels: https://ftp-trace.ncbi.nlm.nih.gov/ReferenceSamples/giab/release/AshkenazimTrio/HG002_NA24385_son/CMRG_v1.00/GRCh38/SmallVariant/

SVs: https://ftp-trace.ncbi.nlm.nih.gov/ReferenceSamples/giab/release/AshkenazimTrio/HG002_NA24385_son/CMRG_v1.00/GRCh38/StructuralVariant/

HG003:

SNVs: https://ftp-trace.ncbi.nlm.nih.gov/ReferenceSamples/giab/release/AshkenazimTrio/HG003_NA24149_father/latest/GRCh38/

HG004:

SNVs: https://ftp-trace.ncbi.nlm.nih.gov/ReferenceSamples/giab/release/AshkenazimTrio/HG004_NA24143_mother/latest/GRCh38/

HG005:

SNVs: https://ftp-trace.ncbi.nlm.nih.gov/ReferenceSamples/giab/release/ChineseTrio/HG005_NA24631_son/latest/GRCh38/

HG006:

SNVs: https://ftp-trace.ncbi.nlm.nih.gov/ReferenceSamples/giab/release/ChineseTrio/HG006_NA24694_father/latest/GRCh38/

HG007:

SNVs: https://ftp-trace.ncbi.nlm.nih.gov/ReferenceSamples/giab/release/ChineseTrio/HG007_NA24695_mother/latest/GRCh38/

The ClinVar VCF file is available here: https://ftp.ncbi.nlm.nih.gov/pub/clinvar/vcf_GRCh38/archive_2.0/2023/clinvar_20230107.vcf.gz

Variant calling VCF files are available at: https://doi.org/10.5281/zenodo.10806570

Copyright 
The copyright holder for this preprint is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-ND 4.0 International license.
Posted March 18, 2024.
Subject Area

  • Genetic and Genomic Medicine