medRxiv

Evaluation of polygenic scoring methods in five biobanks reveals greater variability between biobanks than between methods and highlights benefits of ensemble learning

Remo Monti, Lisa Eick, Georgi Hudjashov, Kristi Läll, Stavroula Kanoni, Brooke N. Wolford, Benjamin Wingfield, Oliver Pain, Sophie Wharrie, Bradley Jermy, Aoife McMahon, Tuomo Hartonen, Henrike Heyne, Nina Mars, Genes & Health Research Team, Kristian Hveem, Michael Inouye, David A. van Heel, Reedik Mägi, Pekka Marttinen, Samuli Ripatti, Andrea Ganna, Christoph Lippert
doi: https://doi.org/10.1101/2023.11.20.23298215
Remo Monti
1 Hasso Plattner Institute, University of Potsdam, Digital Engineering Faculty, Potsdam, Germany
2 Max-Delbrück-Center for Molecular Medicine in the Helmholtz Association, Berlin Institute for Medical Systems Biology, Berlin, Germany
  • For correspondence: remomomonti{at}gmail.com
Lisa Eick
3 Institute for Molecular Medicine Finland, FIMM, HiLIFE, University of Helsinki, Helsinki, Finland
Georgi Hudjashov
4 Estonian Genome Centre, Institute of Genomics, University of Tartu, Tartu, Estonia
Kristi Läll
4 Estonian Genome Centre, Institute of Genomics, University of Tartu, Tartu, Estonia
Stavroula Kanoni
5 William Harvey Research Institute, Barts and the London School of Medicine and Dentistry, Queen Mary University of London, London, UK
Brooke N. Wolford
6 K. G. Jebsen Center for Genetic Epidemiology, Department of Public Health and Nursing, Faculty of Medicine and Health, Norwegian University of Science and Technology, Trondheim, Norway
Benjamin Wingfield
7 European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
Oliver Pain
8 Maurice Wohl Clinical Neuroscience Institute, Department of Basic and Clinical Neuroscience, Institute of Psychiatry, Psychology and Neuroscience, King’s College London, London, United Kingdom
Sophie Wharrie
9 Aalto University, Department of Computer Science, Espoo, Finland
Bradley Jermy
3 Institute for Molecular Medicine Finland, FIMM, HiLIFE, University of Helsinki, Helsinki, Finland
Aoife McMahon
7 European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
Tuomo Hartonen
3 Institute for Molecular Medicine Finland, FIMM, HiLIFE, University of Helsinki, Helsinki, Finland
Henrike Heyne
1 Hasso Plattner Institute, University of Potsdam, Digital Engineering Faculty, Potsdam, Germany
Nina Mars
3 Institute for Molecular Medicine Finland, FIMM, HiLIFE, University of Helsinki, Helsinki, Finland
10 Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
11 Stanley Center for Psychiatric Research and Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
Kristian Hveem
6 K. G. Jebsen Center for Genetic Epidemiology, Department of Public Health and Nursing, Faculty of Medicine and Health, Norwegian University of Science and Technology, Trondheim, Norway
Michael Inouye
12 Cambridge Baker Systems Genomics Initiative, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK
13 Cambridge Baker Systems Genomics Initiative, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia
14 British Heart Foundation Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK
15 Victor Phillip Dahdaleh Heart and Lung Research Institute, University of Cambridge, Cambridge, UK
16 British Heart Foundation Cambridge Centre of Research Excellence, School of Clinical Medicine, University of Cambridge, Cambridge, UK
17 Health Data Research UK Cambridge, Wellcome Genome Campus and University of Cambridge, Cambridge, UK
18 Blizard Institute, Queen Mary University of London, UK
David A. van Heel
Reedik Mägi
4 Estonian Genome Centre, Institute of Genomics, University of Tartu, Tartu, Estonia
Pekka Marttinen
9 Aalto University, Department of Computer Science, Espoo, Finland
Samuli Ripatti
3 Institute for Molecular Medicine Finland, FIMM, HiLIFE, University of Helsinki, Helsinki, Finland
19 Department of Public Health, University of Helsinki, Helsinki, Finland
20 Massachusetts General Hospital and Broad Institute of MIT and Harvard, Cambridge, MA, USA
Andrea Ganna
3 Institute for Molecular Medicine Finland, FIMM, HiLIFE, University of Helsinki, Helsinki, Finland
20 Massachusetts General Hospital and Broad Institute of MIT and Harvard, Cambridge, MA, USA
Christoph Lippert
1 Hasso Plattner Institute, University of Potsdam, Digital Engineering Faculty, Potsdam, Germany
21 Windreich Dept. of Artificial Intelligence & Human Health, Icahn School of Medicine at Mount Sinai, New York, NY, USA
22 Hasso Plattner Institute for Digital Health at Mount Sinai, Icahn School of Medicine at Mount Sinai, New York, NY, USA
23 Department of Diagnostic, Molecular, and Interventional Radiology, Icahn School of Medicine at Mount Sinai, New York, NY, USA

Abstract

Methods to estimate polygenic scores (PGS) from genome-wide association studies are increasingly used. However, independent evaluation of these methods is lacking, and method comparisons are often limited. Here, we evaluate polygenic scores derived using seven methods in five biobank studies (totaling about 1.2 million participants) across 16 diseases and quantitative traits, building on a reference-standardized framework. We conducted meta-analyses to quantify the effects of method choice, hyperparameter tuning, method ensembling and target biobank on PGS performance. We found that no single method consistently outperformed all others. PGS effect sizes were more variable between biobanks than between methods within biobanks when methods were well tuned. Differences between methods were largest for the two investigated autoimmune diseases, seropositive rheumatoid arthritis and type 1 diabetes. For most methods, cross-validation was more reliable for tuning hyperparameters than automatic tuning (without the use of target data). For a given target phenotype, elastic net models combining PGS across methods (ensemble PGS) tuned in the UK Biobank provided consistent, high, and cross-biobank transferable performance, increasing PGS effect sizes (β-coefficients) by a median of 5.0% relative to LDpred2 and MegaPRS (the two best performing single methods when tuned with cross-validation). Our interactively browsable online results (https://methodscomparison.intervenegeneticscores.org/) and open-source workflow prspipe (https://github.com/intervene-EU-H2020/prspipe) provide a rich resource and reference for the analysis of polygenic scoring methods across biobanks.
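
As a rough illustration of the ensemble idea described in the abstract, the sketch below fits an elastic net that combines standardized scores from several PGS methods into a single ensemble score for a quantitative trait. It is not the prspipe implementation: the simulated scores, the fixed penalty values `alpha` and `l1_ratio`, and every function name are assumptions made for this example. The study tuned its ensembles in the UK Biobank, whereas the penalties here are simply fixed, and a binary disease outcome would call for a logistic rather than a linear model.

```python
import random
import statistics


def standardize(column):
    """Center a list of numbers to mean 0 and scale it to unit variance."""
    mean = statistics.fmean(column)
    sd = statistics.pstdev(column)
    return [(v - mean) / sd for v in column]


def soft_threshold(value, threshold):
    """Soft-thresholding operator that handles the L1 part of the penalty."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def fit_elastic_net(columns, y, alpha=0.05, l1_ratio=0.5, n_iter=200):
    """Coordinate descent for the objective
    (1/2n)*||y - Xw||^2 + alpha*l1_ratio*||w||_1 + 0.5*alpha*(1-l1_ratio)*||w||^2,
    where `columns` holds one standardized score vector per PGS method."""
    n = len(y)
    weights = [0.0] * len(columns)
    residual = list(y)  # equals y - Xw while all weights are zero
    for _ in range(n_iter):
        for j, col in enumerate(columns):
            # Covariance of column j with the residual that excludes its own term.
            rho = sum(c * (r + c * weights[j]) for c, r in zip(col, residual)) / n
            z = sum(c * c for c in col) / n
            new_w = soft_threshold(rho, alpha * l1_ratio) / (z + alpha * (1 - l1_ratio))
            delta = new_w - weights[j]
            if delta:
                residual = [r - c * delta for c, r in zip(col, residual)]
                weights[j] = new_w
    return weights


def ensemble_score(columns, weights):
    """Weighted sum of the per-method scores for each individual."""
    n = len(columns[0])
    return [sum(w * col[i] for w, col in zip(weights, columns)) for i in range(n)]


def effect_size(score, phenotype):
    """Regression slope per SD of score; inputs must already be standardized."""
    return sum(s * p for s, p in zip(score, phenotype)) / len(score)


# Simulated (hypothetical) data: one shared signal seen through three noisy "methods".
rng = random.Random(0)
n_individuals = 500
signal = [rng.gauss(0, 1) for _ in range(n_individuals)]
pgs = [standardize([s + rng.gauss(0, noise) for s in signal]) for noise in (0.8, 1.0, 1.5)]
phenotype = standardize([s + rng.gauss(0, 1) for s in signal])

weights = fit_elastic_net(pgs, phenotype)
combined = standardize(ensemble_score(pgs, weights))

single_betas = [effect_size(col, phenotype) for col in pgs]
ensemble_beta = effect_size(combined, phenotype)
print("single-method betas:", [round(b, 3) for b in single_betas])
print("ensemble beta:", round(ensemble_beta, 3))
```

The effect sizes printed here are computed on the same simulated individuals used for fitting, so they say nothing about transferability; evaluating the fixed weights in a separate target dataset would be the analogue of the cross-biobank comparison reported above.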

Competing Interest Statement

M.I. is a trustee of the Public Health Genomics (PHG) Foundation, a member of the Scientific Advisory Board of Open Targets, and has a research collaboration with AstraZeneca PLC which is unrelated to this study. O.P. provides consultancy services for UCB pharma company.

Funding Statement

This project has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No 101016775, the Hasso Plattner Foundation (HPF) and EMBL-EBI Core Funds. M.I. is supported by core funding from the British Heart Foundation (RG/18/13/33946) and NIHR Cambridge Biomedical Research Centre (BRC-1215-20014; NIHR203312). Genes & Health is/has recently been core-funded by Wellcome (WT102627, WT210561), the Medical Research Council (UK) (M009017, MR/X009777/1, MR/X009920/1), Higher Education Funding Council for England Catalyst, Barts Charity (845/1796), Health Data Research UK (for London substantive site), and research delivery support from the NHS National Institute for Health Research Clinical Research Network (North Thames). Genes & Health is/has recently been funded by Alnylam Pharmaceuticals, Genomics PLC; and a Life Sciences Industry Consortium of AstraZeneca PLC, Bristol-Myers Squibb Company, GlaxoSmithKline Research and Development Limited, Maze Therapeutics Inc, Merck Sharp & Dohme LLC, Novo Nordisk A/S, Pfizer Inc, Takeda Development Centre Americas Inc.

Author Declarations

I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.

Yes

The details of the IRB/oversight body that provided approval or exemption for the research described are given below:

This research has been conducted using the UK Biobank Resource under Application Number 78537. Ethics approval for the UK Biobank study was obtained from the North West Centre for Research Ethics Committee (11/NW/0382). The genotyping in the Trøndelag Health Study (HUNT) and the work presented here were approved by the Regional Committee for Ethics in Medical Research, Central Norway (2014/144, 2018/1622, 2018/411492). The activities of the Estonian Biobank are regulated by the Human Genes Research Act, which was adopted in 2000 specifically for the operations of the Estonian Biobank. Individual-level data analysis in the Estonian Biobank was carried out under ethical approval 1.1-12/624 from the Estonian Committee on Bioethics and Human Research (Estonian Ministry of Social Affairs), using data according to release application S22, document number 6-7/GI/16259 from the Estonian Biobank. Patients and control subjects in FinnGen provided informed consent for biobank research, based on the Finnish Biobank Act. Alternatively, separate research cohorts, collected before the Finnish Biobank Act came into effect (in September 2013) and before the start of FinnGen (August 2017), were collected based on study-specific consents and later transferred to the Finnish biobanks after approval by Fimea (Finnish Medicines Agency), the National Supervisory Authority for Welfare and Health. Recruitment protocols followed the biobank protocols approved by Fimea. The Coordinating Ethics Committee of the Hospital District of Helsinki and Uusimaa (HUS) statement number for the FinnGen study is Nr HUS/990/2017. Ethics approval for Genes & Health was obtained from the London South East Research Ethics Committee (IRAS 146051).

I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.

Yes

I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).

Yes

I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.

Yes

Data Availability

The prspipe workflow used to generate polygenic score weights, perform polygenic scoring and ancestry matching is available on GitHub (https://github.com/intervene-EU-H2020/prspipe). Genotypes and linked healthcare records held in biobanks are controlled access data and are not publicly available. An application must be made to each biobank to gain access to the data. The 1000 Genomes processed genotype data (HapMap3-1KG) are available on figshare (10.6084/m9.figshare.20802700). Non-sensitive experimental data exported from the biobanks are permissively licensed and deposited in an open data repository (https://zenodo.org/doi/10.5281/zenodo.10012995). Processed summary statistics are permissively licensed, hosted on GitHub, and accessible through an R data package (https://github.com/intervene-EU-H2020/pgsCompaR). A website containing an interactive results browser is permissively licensed, available on GitHub (https://github.com/intervene-EU-H2020/pgs-method-compare), and hosted at https://methodscomparison.intervenegeneticscores.org/. Polygenic score weight files have been deposited in the PGS Catalog under publication ID PGP000517 (https://www.pgscatalog.org/publication/PGP000517/).


Copyright 
The copyright holder for this preprint is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.
Posted November 20, 2023.

Subject Area

  • Genetic and Genomic Medicine