Integration of rare large-effect expression variants improves polygenic risk prediction

Craig Smail, Nicole M. Ferraro, Matthew G. Durrant, Abhiram S. Rao, Matthew Aguirre, Xin Li, Michael J. Gloudemans, Themistocles L. Assimes, Charles Kooperberg, Alexander P. Reiner, Qin Hui, Jie Huang, Christopher J. O’Donnell, Yan V. Sun, Million Veteran Program, Manuel A. Rivas, Stephen B. Montgomery
doi: https://doi.org/10.1101/2020.12.02.20242990
Craig Smail
1 Department of Biomedical Data Science, Stanford University School of Medicine, Stanford, CA, USA
2 Genomic Medicine Center, Children’s Mercy Research Institute and Children’s Mercy Kansas City, Kansas City, MO, USA
For correspondence: csmail{at}cmh.edu
Nicole M. Ferraro
1 Department of Biomedical Data Science, Stanford University School of Medicine, Stanford, CA, USA
Matthew G. Durrant
3 Department of Genetics, Stanford University School of Medicine, Stanford, CA, USA
Abhiram S. Rao
4 Department of Pathology, Stanford University School of Medicine, Stanford, CA, USA
5 Department of Bioengineering, Stanford University, Stanford, CA, USA
Matthew Aguirre
1 Department of Biomedical Data Science, Stanford University School of Medicine, Stanford, CA, USA
Xin Li
6 CAS Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology, Shanghai, China
Michael J. Gloudemans
1 Department of Biomedical Data Science, Stanford University School of Medicine, Stanford, CA, USA
Themistocles L. Assimes
7 Palo Alto VA Health Care System, Palo Alto, CA, USA
8 Division of Cardiology, Department of Medicine, Stanford University School of Medicine, Stanford, CA, USA
Charles Kooperberg
9 Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
Alexander P. Reiner
10 Department of Epidemiology, University of Washington, Seattle, WA, USA
Qin Hui
11 Atlanta VA Health Care System, Decatur, GA, USA
12 Department of Epidemiology, Emory University Rollins School of Public Health, Atlanta, GA, USA
Jie Huang
13 Department of Global Health, Peking University School of Public Health, Beijing, China
Christopher J. O’Donnell
14 Boston VA Health Care System, Boston, MA, USA
15 Division of Cardiology, Department of Medicine, Harvard Medical School, Boston, MA, USA
16 Division of Cardiology, Department of Medicine, Brigham and Women’s Hospital, Boston, MA, USA
Yan V. Sun
11 Atlanta VA Health Care System, Decatur, GA, USA
12 Department of Epidemiology, Emory University Rollins School of Public Health, Atlanta, GA, USA
Manuel A. Rivas
1 Department of Biomedical Data Science, Stanford University School of Medicine, Stanford, CA, USA
Stephen B. Montgomery
3 Department of Genetics, Stanford University School of Medicine, Stanford, CA, USA
4 Department of Pathology, Stanford University School of Medicine, Stanford, CA, USA
For correspondence: smontgom{at}stanford.edu

Summary

Polygenic risk scores (PRS) aim to quantify the contribution of multiple genetic loci to an individual’s likelihood of a complex trait or disease. However, existing PRS estimate genetic liability using common genetic variants, excluding the impact of rare variants. We identified rare, large-effect variants in individuals with outlier gene expression from the GTEx project and then assessed their impact on PRS predictions in the UK Biobank (UKB). We observed large deviations from the PRS-predicted phenotypes for carriers of multiple outlier rare variants; for example, individuals classified as “low-risk” but in the top 1% of outlier rare variant burden had a 6-fold higher rate of severe obesity. We replicated these findings using data from the NHLBI Trans-Omics for Precision Medicine (TOPMed) biobank and the Million Veteran Program, and demonstrated that PRS across multiple traits will significantly benefit from the inclusion of rare genetic variants.
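As a rough illustration of the two quantities the summary contrasts, the sketch below computes a conventional PRS as a weighted sum of common-variant allele dosages and, separately, picks out the individuals in the top 1% of a rare-variant burden count. It is not the authors' pipeline (their scripts are in the GitHub repository listed under Data Availability). The function names, the toy genotypes and effect sizes, and the use of a simple per-individual count as the "burden" are all assumptions made for illustration.

```python
def polygenic_risk_score(dosages, effect_sizes):
    """Weighted sum of allele dosages (0, 1 or 2 copies) for one individual.

    Both arguments are hypothetical inputs: one dosage and one per-allele
    effect size (e.g. a GWAS beta) per common variant, in the same order.
    """
    if len(dosages) != len(effect_sizes):
        raise ValueError("dosages and effect_sizes must have the same length")
    return sum(d * b for d, b in zip(dosages, effect_sizes))


def top_burden_carriers(burden_by_id, top_fraction=0.01):
    """Return the IDs with the highest rare-variant burden.

    burden_by_id maps an individual ID to a hypothetical burden value
    (here, a count of outlier-associated rare variants). The number of IDs
    returned is the rounded top_fraction of the cohort, with a minimum of one.
    Ties at the cut-off are broken by input order, which is a simplification.
    """
    n_top = max(1, round(len(burden_by_id) * top_fraction))
    ranked = sorted(burden_by_id, key=burden_by_id.get, reverse=True)
    return set(ranked[:n_top])


# Toy example (made-up numbers, three common variants, three individuals).
effect_sizes = [0.10, -0.05, 0.20]
genotypes = {"A": [0, 1, 2], "B": [2, 0, 1], "C": [1, 1, 0]}
prs = {iid: polygenic_risk_score(d, effect_sizes) for iid, d in genotypes.items()}

# Toy burden values for a cohort of 200 individuals.
burden = {f"id{i}": i for i in range(200)}
high_burden = top_burden_carriers(burden)
```

In an analysis of the kind described above, one would then compare observed phenotypes between high-burden and other individuals within the same PRS stratum; the sketch stops short of that step because it depends on cohort data not shown here.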

Competing Interest Statement

The authors have declared no competing interest.

Funding Statement

CS is supported by NIH grant T32LM012409. NMF is supported by a National Science Foundation Graduate Research Fellowship (grant number DGE 1656518) and a graduate fellowship from the Stanford Center for Computational, Evolutionary and Human Genomics. MGD is supported by a National Science Foundation Graduate Research Fellowship. MA is supported by the National Library of Medicine under training grant T15LM007033. XL is supported by the National Natural Science Foundation of China (grant number 31970554), National Key R&D Program of China (grant number 2019YFC1315804) and Shanghai Municipal Science and Technology Major Project (grant number 2017SHZDZX01). MJG is supported by a Stanford Graduate Fellowship. MAR is partially supported by Stanford University and a National Institutes of Health Center for Multi- and Trans-ethnic Mapping of Mendelian and Complex Diseases grant (5U01 HG009080) and partially supported by the National Human Genome Research Institute (NHGRI) of the National Institutes of Health (NIH) under award R01HG010140. SBM is supported by NIH grants U01HG009431, R01HL142015, R01HG008150, R01AG066490 and U01HG009080. The Women's Health Initiative (WHI) program is funded by the National Heart, Lung, and Blood Institute, National Institutes of Health, U.S. Department of Health and Human Services through contracts HHSN268201600018C, HHSN268201600001C, HHSN268201600002C, HHSN268201600003C, and HHSN268201600004C. Molecular data for the Trans-Omics for Precision Medicine (TOPMed) program was supported by the National Heart, Lung and Blood Institute (NHLBI). Core support, including centralized genomic read mapping and genotype calling along with variant quality metrics and filtering, was provided by the TOPMed Informatics Research Center (3R01HL-117626-02S1; contract HHSN268201800002I). Core support, including phenotype harmonization, data management, sample-identity QC, and general program coordination, was provided by the TOPMed Data Coordinating Center (R01HL-120393; U01HL-120393; contract HHSN268201800001I). This research is also supported by funding from the Department of Veterans Affairs Office of Research and Development, Million Veteran Program (MVP) grants I01-BX003340 and I01-BX003362.

Author Declarations

I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.

Yes

The details of the IRB/oversight body that provided approval or exemption for the research described are given below:

Based on the information provided in Protocol 44532 the Stanford University IRB has determined that the research does not involve human subjects as defined in 45 CFR 46.102(f) or 21 CFR 50.3(g).

All necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived.

Yes

I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).

Yes

I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable.

Yes

Data Availability

  • GTEx (v7) RNA-seq and WGS data is available from dbGaP (dbGaP Accession phs000424.v7.p2)
  • GTEx (v7) eQTL summary statistics were downloaded from the GTEx Portal, available at https://gtexportal.org/home/datasets
  • Data from the TOPMed Women's Health Initiative is available from dbGaP (dbGaP Accession phs001237)
  • UK Biobank (UKB) data was obtained under application number 24983 (PI: Dr. Manuel Rivas)
  • UKB Phase 1 GWAS summary statistics were downloaded from the Neale Lab server, available at http://www.nealelab.is/uk-biobank
  • Polygenic risk scores (PRS) for body mass index and type-2 diabetes were downloaded from the Cardiovascular Disease Knowledge Portal, available at http://kp4cd.org/dataset_downloads/mi
  • Gene annotation data was obtained from GENCODE (version 19), available at https://www.gencodegenes.org/human/release_19.html
  • Allele frequency data was obtained from gnomAD (version r2.0.2), available at https://console.cloud.google.com/storage/browser/gnomad-public/release/2.0.2/
  • hg19 coordinates were converted to hg38 using the chain file available at http://hgdownload.soe.ucsc.edu/goldenPath/hg19/liftOver/
  • Custom scripts to conduct all analyses not performed using existing software can be found at https://github.com/csmail/outlier_prs

Copyright 
The copyright holder for this preprint is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.
Posted December 11, 2020.

Subject Area

  • Genetic and Genomic Medicine