Constructing germline research cohorts from the discarded reads of clinical tumor sequences

Alexander Gusev, Stefan Groha, Kodi Taraszka, Yevgeniy R. Semenov, Noah Zaitlen
doi: https://doi.org/10.1101/2021.04.09.21255197
Alexander Gusev
1Division of Population Sciences, Dana-Farber Cancer Institute and Harvard Medical School, Boston, MA, USA
2Division of Genetics, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA
3The Broad Institute of MIT & Harvard, Cambridge, MA, USA
  • For correspondence: alexander_gusev{at}dfci.harvard.edu
Stefan Groha
1Division of Population Sciences, Dana-Farber Cancer Institute and Harvard Medical School, Boston, MA, USA
3The Broad Institute of MIT & Harvard, Cambridge, MA, USA
Kodi Taraszka
4Departments of Neurology and Computational Medicine, University of California Los Angeles, Los Angeles, CA, USA
Yevgeniy R. Semenov
5Department of Dermatology, Massachusetts General Hospital, Boston, MA, USA
Noah Zaitlen
4Departments of Neurology and Computational Medicine, University of California Los Angeles, Los Angeles, CA, USA

ABSTRACT

Background Hundreds of thousands of cancer patients have had targeted (panel) tumor sequencing to identify clinically meaningful mutations. In addition to improving patient outcomes, this activity has led to significant discoveries in basic and translational domains. However, because clinical tumor sequencing is targeted, its scope is limited, especially for germline genetics. In this work, we assess the utility of discarded, off-target reads from tumor-only panel sequencing for recovery of genome-wide germline genotypes through imputation.

Methods We develop a framework for inference of germline variants from tumor panel sequencing, including imputation, quality control, inference of genetic ancestry, germline polygenic risk scores, and HLA alleles. We benchmark our framework on 833 individuals with tumor sequencing and matched germline SNP array data. We then apply our approach to a prospectively collected panel sequencing cohort of 25,889 tumors.

Results We demonstrate high to moderate accuracy of each inferred feature relative to direct germline SNP array genotyping: individual common variants were imputed with a mean accuracy (correlation) of 0.86; genetic ancestry was inferred with a correlation of >0.98; polygenic risk scores were inferred with a correlation of >0.90; and individual HLA alleles were inferred with a correlation of >0.89. We demonstrate that somatic copy number alterations and other tumor features have minimal influence on accuracy. We showcase the feasibility and utility of our framework by analyzing 25,889 tumors and identifying relationships between genetic ancestry, polygenic risk, and tumor characteristics that could not be studied with conventional data.
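
As a rough illustration of the accuracy metric reported above, the sketch below computes a per-variant Pearson correlation between array genotypes and imputed dosages, averages it across variants, and then correlates polygenic scores built from each data type. It is not taken from the authors' pipeline: the toy genotypes, the effect-size weights, and the helper names `pearson` and `polygenic_score` are all assumptions made for illustration, and the abstract does not specify how the reported correlations were computed in detail.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")  # no variation in one input: correlation undefined
    return sxy / sqrt(sxx * syy)


def polygenic_score(dosages, weights):
    """Weighted sum of allele dosages for one individual."""
    return sum(d * w for d, w in zip(dosages, weights))


# Toy data (made up): rows are individuals, columns are variants.
array_genotypes = [[0, 1, 2], [1, 1, 0], [2, 0, 1], [0, 2, 2]]
imputed_dosages = [[0.1, 0.9, 1.8], [1.2, 1.0, 0.2], [1.7, 0.1, 0.9], [0.0, 1.9, 2.0]]
weights = [0.5, -0.2, 0.3]  # hypothetical per-variant effect sizes

# Per-variant accuracy: correlate one column of array calls with imputed dosages.
n_variants = len(weights)
per_variant_r = [
    pearson([row[j] for row in array_genotypes],
            [row[j] for row in imputed_dosages])
    for j in range(n_variants)
]
mean_accuracy = sum(per_variant_r) / n_variants

# Score-level accuracy: correlate polygenic scores from the two data sources.
prs_array = [polygenic_score(row, weights) for row in array_genotypes]
prs_imputed = [polygenic_score(row, weights) for row in imputed_dosages]
prs_r = pearson(prs_array, prs_imputed)

print(f"mean per-variant correlation: {mean_accuracy:.3f}")
print(f"polygenic score correlation:  {prs_r:.3f}")
```

Averaging per-variant correlations and correlating aggregate scores answer different questions, which is one plausible reason a score-level correlation can exceed the mean per-variant figure: errors at individual variants partly cancel when many are summed.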

Conclusions We conclude that targeted tumor sequencing can be leveraged to build rich germline research cohorts from existing data, and make our analysis pipeline publicly available to facilitate this effort.

Competing Interest Statement

The authors have declared no competing interest.

Funding Statement

N.Z. and K.T. were supported by NIH grants K25HL121295, U01HG009080, R01HG006399, R01CA227237, R01ES029929, R01HG011345, the DoD grant W81XWH-16-2-0018, and the Chan Zuckerberg Science Initiative. A.G. and S.G. were supported by R01CA227237, R01CA244569, and the Doris Duke Charitable Foundation. A.G. was supported by the Louis B. Mayer Foundation and the Claudia Adams Barr Foundation.

Author Declarations

I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.

Yes

The details of the IRB/oversight body that provided approval or exemption for the research described are given below:

PROFILE samples were selected and sequenced from patients who were consented under institutional review board (IRB) approved protocols 11-104 and 17-000 from the Dana-Farber/Partners Cancer Care Office for the Protection of Research Subjects. Written informed consent was obtained from participants prior to inclusion in this study. Secondary analyses of previously collected data were performed with approval from the Dana-Farber IRB (DFCI IRB protocols 19-033 and 19-025; waiver of HIPAA authorization approved for both protocols).

All necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived.

Yes

I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).

Yes

I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable.

Yes

Data Availability

The raw sequencing data are not publicly available because the research participant consent, privacy policy, and terms of service do not include authorization to share identifiable data. The full analysis workflow is available at https://github.com/gusevlab/panel-imp. A containerized version of the imputation pipeline is available at https://hub.docker.com/r/stefangroha/stitch_gcs.

Copyright 
The copyright holder for this preprint is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC 4.0 International license.
Posted April 13, 2021.

Subject Area

  • Genetic and Genomic Medicine