Skip to main content
medRxiv
  • Home
  • About
  • Submit
  • ALERTS / RSS
Advanced Search

Improving reporting standards for polygenic scores in risk prediction studies

View ORCID ProfileHannah Wand, View ORCID ProfileSamuel A. Lambert, Cecelia Tamburro, Michael A. Iacocca, Jack W. O’Sullivan, Catherine Sillari, Iftikhar J. Kullo, Robb Rowley, Jacqueline S. Dron, Deanna Brockman, Eric Venner, Mark I. McCarthy, Antonis C. Antoniou, Douglas F. Easton, View ORCID ProfileRobert A. Hegele, View ORCID ProfileAmit V. Khera, Nilanjan Chatterjee, Charles Kooperberg, Karen Edwards, Katherine Vlessis, Kim Kinnear, John N. Danesh, Helen Parkinson, Erin M. Ramos, Megan C. Roberts, Kelly E. Ormond, Muin J. Khoury, A. Cecile J.W. Janssens, View ORCID ProfileKatrina A.B. Goddard, Peter Kraft, Jaqueline A. L. MacArthur, View ORCID ProfileMichael Inouye, View ORCID ProfileGenevieve Wojcik
doi: https://doi.org/10.1101/2020.04.23.20077099
Hannah Wand
1Stanford University School of Medicine, Stanford, CA, USA
2Stanford Center for Inherited Cardiovascular Disease, Stanford, CA, USA
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
  • ORCID record for Hannah Wand
Samuel A. Lambert
3Cambridge Baker Systems Genomic Initiative, University of Cambridge, Cambridge, UK
4Cambridge Baker Systems Genomic Initiative, Baker Heart and Diabetes Institute, Melbourne, AUS
5BHF Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK
6Health Data Research UK-Cambridge, Wellcome Genome Campus, Hinxton, UK
7European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, UK
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
  • ORCID record for Samuel A. Lambert
Cecelia Tamburro
8National Human Genome Research Institute, Bethesda, MD, USA
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
Michael A. Iacocca
1Stanford University School of Medicine, Stanford, CA, USA
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
Jack W. O’Sullivan
1Stanford University School of Medicine, Stanford, CA, USA
2Stanford Center for Inherited Cardiovascular Disease, Stanford, CA, USA
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
Catherine Sillari
8National Human Genome Research Institute, Bethesda, MD, USA
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
Iftikhar J. Kullo
9Department of Cardiovascular Medicine, Mayo Clinic, Rochester, MN, USA
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
Robb Rowley
8National Human Genome Research Institute, Bethesda, MD, USA
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
Jacqueline S. Dron
10Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
11Western University, London, ON, Canada
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
Deanna Brockman
10Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
Eric Venner
12Baylor College of Medicine, Houston, TX, USA
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
Mark I. McCarthy
13Department of Human Genetics, Genentech, South San Francisco, CA, USA
14Wellcome Centre for Human Genetics, Oxford, UK
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
Antonis C. Antoniou
15University of Cambridge, Cambridge, United Kingdom
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
Douglas F. Easton
15University of Cambridge, Cambridge, United Kingdom
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
Robert A. Hegele
11Western University, London, ON, Canada
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
  • ORCID record for Robert A. Hegele
Amit V. Khera
10Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
  • ORCID record for Amit V. Khera
Nilanjan Chatterjee
16Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
17Department of Oncology, Johns Hopkins School of Medicine, Baltimore, MD, USA
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
Charles Kooperberg
18Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
Karen Edwards
19Department of Epidemiology, University of California, Irvine, CA, USA
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
Katherine Vlessis
20Department of Genetics, Stanford University School of Medicine, Stanford, CA, USA
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
Kim Kinnear
20Department of Genetics, Stanford University School of Medicine, Stanford, CA, USA
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
John N. Danesh
5BHF Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK
6Health Data Research UK-Cambridge, Wellcome Genome Campus, Hinxton, UK
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
Helen Parkinson
6Health Data Research UK-Cambridge, Wellcome Genome Campus, Hinxton, UK
7European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, UK
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
Erin M. Ramos
8National Human Genome Research Institute, Bethesda, MD, USA
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
Megan C. Roberts
22Division of Pharmaceutical Outcomes and Policy, UNC Eshelman School of Pharmacy, Chapel Hill, NC, USA
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
Kelly E. Ormond
20Department of Genetics, Stanford University School of Medicine, Stanford, CA, USA
23Stanford Center for Biomedical Ethics, Stanford University School of Medicine, Stanford, CA, USA
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
Muin J. Khoury
24Centers for Disease Control and Prevention, Atlanta, GA, USA
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
A. Cecile J.W. Janssens
25Department of Epidemiology, Rollins School of Public Health, Emory University, Atlanta, GA, USA
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
Katrina A.B. Goddard
26Department of Translational and Applied Genomics, Kaiser Permanente Northwest, Portland, OR, USA
27Center for Health Research, Kaiser Permanente Northwest, Portland, OR, USA
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
  • ORCID record for Katrina A.B. Goddard
Peter Kraft
28Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
Jaqueline A. L. MacArthur
7European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, UK
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
Michael Inouye
3Cambridge Baker Systems Genomic Initiative, University of Cambridge, Cambridge, UK
4Cambridge Baker Systems Genomic Initiative, Baker Heart and Diabetes Institute, Melbourne, AUS
5BHF Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK
6Health Data Research UK-Cambridge, Wellcome Genome Campus, Hinxton, UK
21National Institute for Health Research Cambridge Biomedical Research Centre, University of Cambridge and Cambridge University Hospitals, Cambridge, UK
29The Alan Turing Institute, London, UK
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
  • ORCID record for Michael Inouye
Genevieve Wojcik
30Department of Epidemiology, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
  • ORCID record for Genevieve Wojcik
  • For correspondence: gwojcik1{at}jhu.edu
  • Abstract
  • Full Text
  • Info/History
  • Metrics
  • Supplementary material
  • Data/Code
  • Preview PDF
Loading

Abstract

Polygenic risk scores (PRS), often aggregating the results from genome-wide association studies, can bridge the gap between the initial discovery efforts and clinical applications for disease risk estimation. However, there is remarkable heterogeneity in the reporting of these risk scores. This lack of adherence to reporting standards hinders the translation of PRS into clinical care. The ClinGen Complex Disease Working Group, in a collaboration with the Polygenic Score (PGS) Catalog, have updated the Genetic Risk Prediction (GRIPS) Reporting Statement to the current state of the field and to enable downstream utility. Drawing upon experts in epidemiology, statistics, disease-specific applications, implementation, and policy, this 22-item reporting framework defines the minimal information needed to interpret and evaluate a PRS, especially with respect to any downstream clinical applications. Items span detailed descriptions of the study population (recruitment method, key demographic and clinical characteristics, inclusion/exclusion criteria, and outcome definition), statistical methods for both PRS development and validation, and considerations for potential limitations of the published risk score and downstream clinical utility. Additionally, emphasis has been placed on data availability and transparency to facilitate reproducibility and benchmarking against other PRS, such as deposition in the publicly available PGS Catalog. By providing these criteria in a structured format that builds upon existing standards and ontologies, the use of this framework in publishing PRS will facilitate translation of PRS into clinical care and progress towards defining best practices.

Summary In recent years, polygenic risk scores (PRS) have increasingly been used to capture the genome-wide liability underlying many human traits and diseases, hoping to better inform an individual’s genetic risk. However, a lack of adherence to existing reporting standards has hindered the translation of this important tool into clinical and public health practice; in particular, details necessary for benchmarking and reproducibility are underreported. To address this gap, the ClinGen Complex Disease Working Group and Polygenic Score (PGS) Catalog have updated the Genetic Risk Prediction (GRIPS) Reporting Statement into the 22-item Polygenic Risk Score Reporting Statement (PRS-RS). This framework provides the minimal information expected of authors to promote the validity, transparency, and reproducibility of PRS by encouraging authors to detail the study population, statistical methods, and potential clinical utility of a published score. The widespread adoption of this framework will encourage rigorous methodological consideration and facilitate benchmarking to ensure high quality scores are translated into the clinic.

Main Text

The predisposition to common diseases and traits arises from a complex interaction between genetic and nongenetic factors. In the past decade there has been enormous success at discovering the landscape of disease-associated genetic variants as a result of the many collaborative consortia and large cohorts of well-phenotyped individuals and matched genetic information.1–5 In particular, genome-wide association studies (GWAS) have emerged as a powerful approach to identify disease- or trait-associated genetic variants and yielded summary statistics that describe the magnitude (effect size) and statistical significance of association between an allele and the outcome of interest.4,6 GWAS have been applied to a wide range of complex human traits and diseases, including height, blood pressure, cardiovascular disease, cancer, obesity, and Alzheimer’s disease.

The associations identified via GWAS can be combined to quantify genetic predisposition to a heritable trait, and this information can be used to conduct disease risk stratification or predict prognostic outcomes and response to therapy.7,8 Typically, information across many variants is used to form a weighted sum of allele counts, where the weights reflect the relative magnitude of association between variant alleles and the trait or disease. These weighted sums can include up to millions of variants, and are frequently referred to as polygenic risk score(s) (PRS), or genetic or genomic risk score(s) (GRS), if they refer to risk estimates of disease outcomes; or, more generally, polygenic score(s) (PGS) when referring to any outcome (see Box 1 for further discussion of nomenclature). While there is active development of algorithms to decide how many and which variants to include and how to weigh them so as to maximize the proportion of variance explained or disease discrimination, there is an emerging consensus that the inclusion of variants beyond those meeting stringent GWAS significance levels can boost predictive performance.9,10 Methodological research has also established theoretical limits of PGS/PRS performance based on the trait’s genetic architecture and heritability.11–15

In the last decade, the landscape of genetic prediction studies has transformed. There have been over 900 publications mentioning PGS/PRS with significant developments in how PGS are constructed and evaluated, as well as many new proposed uses. The data available in the current era of biomedical research is larger and more consolidated than ever before. Biobanks and large-scale consortia have become dominant, yet frequently researchers have limited access to individual-level data. Since individual data is unavailable, most PRS risk predictions are developed from summary-level data (e.g. GWAS summary statistics) in secondary datasets, each of which come with their own specific methodological considerations.16–18 At the same time, there has been a push towards open data sharing as outlined in the FAIR (Findable, Accessible, Interoperable and Reusable) Data Principles 3,19, with an emphasis on ensuring that research is reproducible by all.

Box 1.

Definitions of relevant genetic risk prediction terms

Polygenic Score(s) (PGS)

a single value that quantifies an individual’s genetic predisposition to a trait. Typically calculated by summing the number of trait-associated alleles in an individual weighted by per-allele effect sizes from a discovery GWAS, and normalized using a relevant population distribution. Sometimes referred to as a genetic score.

Polygenic Risk Score(s) (PRS)

a subset of PGS which is used to estimate risk of disease or other clinically relevant outcomes (binary or discrete). Sometimes referred to as a genetic or genomic risk score (GRS). See categories of PRS below.

Integrated Risk Model

a risk model combining PRS with other established risk factors for the outcome of interest, such as demographics (often age and sex), anthropometrics, biomarkers, and clinical measurements to estimate a specific disease risk.

Categories of use for PRS and/or integrated risk models

The addition of PRS to existing risk models has several potential applications, summarized below. In each, the aim of PRS integration is to improve individual or subgroup classification to the extent that there is meaningful clinical benefit.

Disease Risk Prediction – used to estimate an individual’s risk of developing a disease, based on the presence of certain genetic and/or clinical variables.

Disease Diagnosis – used to classify whether an individual has a disease, or a disease subtype, linked to a certain etiology based on the presence of certain genetic and/or clinical variables.9,20

Disease Prognosis – used to estimate the risk of further adverse outcome(s) subsequent to diagnosis of disease.21

Therapeutic – used to predict a patient or subgroup’s response to a particular treatment.22

The capacity of PRS to quantify genetic predisposition for many clinically relevant traits and diseases has begun to be established, with multiple potential clinical uses in settings related to disease risk stratification as well as proposed prognostic uses (e.g., predicting response to intervention/treatment). The readiness of PRS for implementation varies among outcomes, with only a few diseases, like coronary heart disease (CHD) and breast cancer, having mature PRS with potential clinical utility (Boxes 2 and 3, respectively). There has also been a rapid rise of direct-to-consumer assays and for-profit companies (23andMe, Color, MyHeritage, etc.) providing PGS/PRS results to customers outside of the traditional patient-provider framework. These concomitant advances have resulted in healthcare systems developing new infrastructures to deliver genetic risk information.23 These advances, individually and combined, have raised significant challenges for PRS reporting standards; from the very basic (e.g. reporting performance metrics on an external validation dataset) to the complicated (e.g. making the raw variant and weight information for a PRS available), necessitating the updating of existing standard for reporting genetic risk prediction studies to convey the increased scope of PRS and complexity for their clinical applications.

Box 2:

Current CHD PRS and their potential uses

Many PRS have been developed for CHD, varying in the computational methods used, the number of variants included (50–6,000,000), and the GWAS and cohorts used for PRS training. For example, the latest and currently most predictive CHD PRS use GWAS summary statistics from the CardiogramPlusC4D study 24, and mainly differ by the computational methods used to select and weight individual variants (including LDpred 25,26, lassosum 27, and meta-scoring approaches 28, and how they are combined into risk models with conventional risk factors. These PRS may provide useful information for predicting risk of CHD that is largely orthogonal to conventional risk factors (age, sex, hypertension, cholesterol, BMI, diabetes, smoking) as well as family history. Clinical applications may include:

  • Improved prediction of risk for future adverse cardiovascular events when added to conventional risk models (such as the Framingham Risk Score 29, Pooled Cohort Equations 27,28, QRISK 27).

  • Reclassification of risk categories often leading to recommendations related to risk-reducing treatments like statins. 29–31

While the data strongly suggest CHD PRS, by refining risk estimates, may improve patient outcomes, clinical utility through randomized clinical trials has yet to be conclusively established. We anticipate this is the future direction of PRS studies, and a number of clinical trials are underway. 32

Box 3:

Current breast cancer PRS and their potential uses

Many of the most recent and most predictive PRS for breast cancer include a smaller number of variants (usually in the 100-1,000s), possibly owing to a less polygenic architecture and more low-frequency variants having greater effect; however, scores composed of millions of variants also exist. Most of the latest PRS have been developed using GWAS summary statistics and data from the Breast Cancer Association Consortium (BCAC), using variants passing genome-wide significance (lead SNPs) or methods including stepwise or penalized regressions on individual-level genotypes 33, Pruning + Thresholding 25, and Ldpred 26. In contrast to CHD, genetics is commonly used to measure and understand breast cancer risk vis-a-vis BRCA1/2 mutation testing; however, routine screening for breast cancer is often performed in older women using mammography and non-genetic risk prediction tools already exist. As such there has been active research into the value of capturing the polygenic risk of common variation on breast cancer, with multiple potential clinical uses and considerations:

  • Multiple PRS exist to predict risk for specific subtypes of breast cancer (e.g. ER positive/negative, luminal, and triple negative33,34), which could be used to stratify patients for more beneficial treatments, and may have prognostic implications.

  • PRS can be combined with existing non-genetic risk prediction models (combining risk factors such as age, family history, mammographic density, hormone replacement therapy) and improve the prediction of incident breast cancer risk. 35–42

  • Breast cancer has been linked to several rare high-or moderate-risk penetrant genes (e.g. BRCA1, BRCA2, PALB2, CHEK2, ATM) and screening for pathogenic variants in these genes is part of clinical practice.43 It has been shown that PRS can provide important stratification of risk among carriers of such pathogenic variants and thus could be further useful for clinical decision making. 26,40,44–48

Indeed, the BOADICEA breast cancer risk prediction model includes the effects of common variants (PRS313 33) as well as of other rare pathogenic genetic variants 40 and has been implemented in the CanRisk Tool (www.canrisk.org) that has been approved for use by healthcare professionals in the European Economic Area. PRS information has been studied via simulation49 and is being tested in risk-based breast cancer screening trials in the US 50 and Europe (MyPeBS; https://mypebs.eu).

Poorly designed or described studies call into question the validity of some PRS to predict their target outcome 51,52, and relatively few studies have benchmarked multiple scores performance externally. At present, there are no uniformly agreed best practices for developing PRS, nor widely adopted standards or regulations sufficiently tailored to assess the eventual clinical readiness of a PRS. There are rapidly emerging applications of PRS that further compound the heterogeneity in reporting, e.g. using PRS as tools for testing gene x environment interactions or shared etiology between diseases. 53–56 This in itself is not an issue, but the rapid evolution in both the methodological development and applications of PRS make it challenging to compare or reproduce claims about the predictive performance of a PRS for a specific outcome when studies are not properly documented. These deficiencies are barriers to PRS being interpreted, compared, and reproduced, and must be addressed to enable the application of PRS to improve clinical practice and public health.

Frameworks have been developed to establish standards around the transparent, standardized, accurate, complete, and meaningful reporting of scientific studies. In 2011, an international working group published the Genetic Risk Prediction Studies (GRIPS) Statement, a reporting guideline for the study of risk prediction models that include genetic variants, from genetic mutations to gene scores.57 This guideline is analogous to guidelines developed for observational epidemiological studies (STROBE58) and genome-wide association studies (STREGA59), and is in line with the reporting guideline for multivariate prediction models (TRIPOD 60). Adherence to reporting statements has been low, and the same holds for GRIPS. One of the reasons might be that researchers feel that GRIPS inadequately addresses PRS, where there has been rapid evolution in score development methodology in recent years. Researchers are frequently uncertain as to what precisely should be reported for a PRS study to be assessed as rigorous, reproducible, and ultimately translatable, especially with the increased push for data availability and transparency. Most PRS studies follow a prototypical process to PRS development evaluation (Figure 1) that can be used as a template for standardizing reporting and benchmarking in the field.

Figure 1:
  • Download figure
  • Open in new tab
Figure 1: Prototype of PRS development and validation process.

Figure 1 displays prototypical steps for PRS construction, risk model development, and validation of performance with select aspects of the PRS-RS guideline labeled in bold text throughout. In PRS development, variants associated with an outcome of interest, typically identified from a GWAS, are combined as a weighted sum of allele counts across variants. Methods for optimizing variant selection for a PRS (PRS construction & estimation) are not shown. The PRS is tested in a risk model predicting the outcome of interest and may be combined with other non-genetic variables (e.g. age, sex, ancestry, clinical variables); collectively these are referred to as risk model variables. After fitting procedures to select the best risk model, this model is validated in an independent sample. The PRS distribution should be described, and the performance of the risk model demonstrated in terms of its discrimination, predictive ability, and calibration. Though not displayed in the figure, these same results should also be reported for the training sample for comparison to the validation sample. In both training and validation cohorts, the outcome of interest criteria, demographics, genotyping, and non-genetic variables should be reported (Table 1).

Here, the Clinical Genome Resource (ClinGen) Complex Disease Working Group and the Polygenic Score (PGS) Catalog (Supplemental Note 1) jointly present the Polygenic Risk Score Reporting Standards (PRS-RS), a synthesis of an expanded reporting standard for PRS that addresses current research environments with advanced methodological developments to inform clinically meaningful reporting on PRS development and validation in the literature with an emphasis on reproducibility and transparency throughout the development process. Additional methods are detailed in Supplemental Note 2.

View this table:
  • View inline
  • View popup
Table 1.

Polygenic Risk Score Reporting Statement (PRS-RS)

The ClinGen-PGS Catalog Polygenic Risk Score Reporting Standards (PRS-RS)

The PRS-RS is a set of standard items specifying the minimal criteria that need to be described in a manuscript in order to accurately interpret a PRS and reproduce results throughout the PRS development process, briefly illustrated in Figure 1. 61 It applies to both PRS development and validation studies that aim to predict disease onset and prognosis, as well as response to therapies; however, other research uses of PRS have overlapping steps that should be reported similarly. Table 1 presents the full PRS-RS, in which criteria are organized into key components along the developmental pipeline of PRS for clear interpretation and to encourage their documentation from the inception of the study well before publication.

Reporting on risk score background

The development and validation of a PRS tests a specific hypothesis with a defined outcome and study population. Therefore, authors should define a priori the study type (e.g. development and/or validation), purpose (e.g. risk prediction vs. prognosis) and predicted end outcome (e.g. CHD) in enough detail to understand why the study population and risk model selected are relevant (e.g. the value for CHD risk stratification and primary prevention is highest in younger individuals compared to those over 80 who have accumulated risk over a lifetime). As the PRS-RS is focused on clinical validity and implementation, authors must outline the study and appropriate outcomes to understand what risk is measured, what the purpose of measuring risk would be, and why this purpose may be of clinical relevance. In order to establish the internal validity of a study, authors should use the appropriate data needed to address the intended purpose (e.g. prediction of incident disease vs. prognosis), with adequate documentation of dataset characteristics to understand nuances in measured risk.

Reporting on study populations

The applicability of any risk prediction to an external target population (the “who, where, and when”) depends on its similarity to the original study populations used to derive the risk model. Therefore, authors need to define and characterize the details of their study population (e.g. study design and recruitment), and describe participant demographics for key variables (such as age and sex) and ancestry. Importantly, there are often inconsistent definitions and levels of detail associated with ancestry, and the transferability of genetic findings between different racial/ethnic groups can be limited. 1,9,62 It is therefore essential for authors to provide a detailed description of participants’ genetic ancestry - including how ancestry was determined – using a common controlled vocabulary where possible (e.g. tables 1 and 2 of the standardized framework developed by the NHGRI-EBI GWAS Catalog1). While age, sex and ancestry are the most universally relevant characteristics, authors should provide a sufficient level of detailed criteria for defining all the relevant factors used in the risk model (non-genetic variable(s)) and the outcome of interest. This is particularly important if they are included in the final risk model and should accompany information about how the population was genotyped (genetic data).

Reporting on risk model development

There are currently several methods that are commonly used to select variants and fine-tune weights that constitute the PRS. 7,16–18,61 For methods using individual-level genomic data, the original source study should be cited or the assays and quality control described in detail. Methods using GWAS summary statistics should clearly cite the relevant GWAS(s) (preferably using unique and persistent study identifiers from the GWAS Catalog (GCSTs; 63). As the performance and limitations of the combined risk model are dependent on methodological considerations, authors must provide complete details including the method used and how variants are selected and combined into a single PRS (PRS construction & estimation). Apart from the genetic data in the model, authors should also describe the defining criteria for other demographic and non-genetic predictors (non-genetic variables) included in the model. Often authors will iterate through numerous models to find the optimal fit. In addition to the estimation methods, it is important to detail the integrated risk model fitting procedure, including the measures used for final model selection. Translating the continuous PRS distribution to a risk estimate, whether absolute or relative, is highly dependent on assumptions and limitations inherent to the specific data set utilized. When describing the risk model type, authors should detail the time scale employed for prediction or the study period/follow-up time for a relative hazard model. Additionally, if relative risk is estimated, the reference group should be well described. These details should be described for the training set, as well as validation and sub-group analyses.

Reporting on model parameters

Authors should report estimates for all evaluated models, not only the methods behind decision-making, equipping readers with the information necessary to gauge the relative value of an increase in performance against other trade-offs (data transparency and availability). The underlying PRS (variant alleles and derived weights) should be made publicly available, preferably through direct submission to an indexed repository such as the PGS Catalog, to enable others to reuse existing models (with known validity) and facilitate direct benchmarking between different PRS for the same trait. The current mathematical form of most PRS—a linear combination of allele counts—facilitates clear model description and reproducibility. Future genomic risk models may have more complex forms, e.g. allowing for explicit non-linear epistatic and gene-environment interactions, or deep neural networks of lesser clarity. It will be important to describe these models in sufficient detail to allow their implementation and evaluation by other researchers and clinical groups.

Reporting on risk model evaluation

We recommend that authors provide summary information of the risk score distribution to aid in model interpretation. The risk model’s predictive ability, calibration, and discrimination should also be assessed and described with common descriptions including the risk score effect size, variance explained (R2), reclassification indices, and metrics like sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV). The risk model calibration and discrimination should be described for all analyses, although their estimation and interpretation are most relevant for the PRS validation sample. It is imperative for the PRS and expanded risk models to be evaluated on a population that is external (e.g. independent, non-overlapping) to the individuals in the study population. The ability of the risk model to classify individuals of interest (risk model discrimination) can commonly described and presented in terms of the AUROC, AUPRC or C-index. Any differences in variable definitions or performance discrepancies between the training and validation sets should be described.

Reporting on interpretation

By explicitly describing the risk model’s interpretation and outlining potential limitations to the generalizability of their model, authors will empower readers and the wider community to better understand the risk score and its relative merits. Authors should justify the clinical relevance and risk model intended uses, such as how the performance of their PRS compares to other commonly used risk models, or previously published PRS. This may also include comparisons to other genetic predictors of disease (e.g. mutations in high/moderate risk genes associated with Mendelian forms of the disease), family history, simple demographic models, or conventional risk calculators (see Box 2 and 3 for disease-specific examples). What indicates a “good” prediction can differ between outcomes and intended uses, but should be reported with similar metrics to what is described in the evaluation section.

Supplemental Note 3 provides reporting considerations in addition to the minimal reporting framework in Table 1. Authors intending downstream clinical implementation should aim for the level of transparent and comprehensive reporting covered in both Table 1 and Supplemental Notes, especially those related to discussing the interpretation, limitations, and generalizability of results. The proper reporting of PRS development and performance can also have implications for seeking regulatory approval of the PRS as a clinical test. Though not a comprehensive list of regulatory requirements, we highlight aspects of PRS-RS that would be considered evidence of analytical and clinical validity from the College of American Pathologists (CAP) and the Clinical Laboratory Improvement Amendments (CLIA) perspective (Supplemental Table 1). CAP and CLIA approvals are additional incentives for reporting adherence of researchers wishing to translate their work, as well as a caution for researchers wishing to avoid unintended uses of their findings. Lastly, we reiterate the need for both methodological and data transparency and encourage deposition of PRS (variant-level information necessary to recalculate the genetic portion of the score) in the PGS Catalog (www.PGSCatalog.org; 64), which provides an invaluable resource for widespread adoption and distribution of a published PRS. The PGS Catalog provides access to PGS and related metadata to support the FAIR principles of data stewardship 19, enabling subsequent applications and assessments of PGS performance and best practices (see Supplemental Table 2 for a description of the metadata captured in the Catalog and it’s overlap with the PRS-RS).

Using the PRS-RS and PGS Catalog to improve PRS research and translation: transparency and open data

We surveyed 30 publications (selecting for a diversity of disease domains, risk score categories, and populations) to understand how the information in the PRS-RS is presented and displayed as part of the larger iterative process to clarify and improve field definitions. For 10 of these publications, we provide detailed annotations using the final field definitions (Supplemental Table 3) and use these annotations to illustrate the detail necessary for each PRS-RS item (further described in Supplemental Note 3). The heterogeneity in the PRS reporting we observed in this pilot highlights a series of challenges. Critical aspects of PRS studies, including ancestry, predictive ability, and transparency/availability of information needed to reproduce PRS, were frequently absent or reported in insufficient detail. This underscores the need for PRS-RS to clearly and specifically define meaningful aspects of PRS development, testing, and intended clinical use. However, these deficits in reporting are not unique to PRS; previous reports of underreporting have found that 77% of GWAS publications in 2017 did not share summary statistics 65 and 4% of GWAS do not report any relevant ancestry information 1. In line with the push towards a culture of reproducibility and open data in genomics, we as the ClinGen Complex Disease Working Group and PGS Catalog joined to create a set of reporting standards (Table 1) specifically tailored to PRS research based on multidisciplinary and international expert opinion for tailoring previous standards.

Researchers using PRS-RS may identify fringe cases that are inadequately captured by these reporting items, as we have modeled our guideline on prototypical steps for PRS development (Figure 1) for flexibility. While we anticipate the field may further change as novel methods and technologies are generated, the PRS-RS items can be further expanded and adapted to encompass novel considerations. By updating previous standards, drawing upon current leaders in the field, and tailoring the framework to common barriers observed in recent literature, we aim to provide a comprehensive and pragmatic perspective on the topic. In line with previous standards, PRS-RS includes elements related to understanding the clinical validity of PRS and consequent risk models. Items such as predicted end outcome and intended use bookend our guideline with the intended clinical framing of PRS reporting. In addition, we have modeled the guideline by steps in experimental design, from hypothesis to interpretation, to more clearly emphasize the significance of the intended use case in defining what needs reported and inform documentation throughout the process. As a reference, we have also included a guide to where PRS-RS items should be reported in a manuscript in Supplemental Table 4). These expansions will further facilitate the curation and expert annotation of published PRS as we move towards widespread clinical use.

While the scope of our work encompasses clinical validity, it does not address the additional requirements needed to establish the clinical or public health utility of a PRS, such as randomized trials with clinically meaningful outcomes, health economic evaluations, or feasibility studies. 66 In addition, the translation of structured data elements into useful clinical parameters may not be direct. A relevant example is that the disease case definitions utilized in training or validation in any particular PRS study may deviate (sometimes substantially) from those utilized in any specific health system, for example CHD symptoms commonly include angina (chest pain), whereas PRS are frequently trained on stricter definitions excluding angina. In addition, the definitions used for race/ancestry as outlined in the PGS and GWAS Catalog1 may also differ from structured terms used to document ancestry information in clinical care, in which case consistent mappings, and potentially parallel analyses, may be necessary to translate from genetically determined ancestries to those routinely used in clinical care. Such translation issues potentially limit generalizability to target populations and warrant further discussion, and we reiterate the need for authors to be mindful of their intended purpose and target audience when discussing their findings. Nevertheless, we have aided authors’ understanding of potential translational barriers by considering current CAP/CLIA analytical and clinical validity evidence requirements of peer-reviewed literature to ensure PRS-RS has value in informing later steps of the clinical translation spectrum, including clinical utility (Supplemental Table 1). Finally, while the principles of this work are clear, its scope does not include the complex commercial restrictions, such as intellectual property, that may be placed on published studies regarding the reporting or distribution of PGS, or the underlying data thereof. We hope this work will inform downstream regulation and transparency standards for PRS as a commercial clinical tool.

The coordinated efforts of the ClinGen Complex Disease Working Group and PGS Catalog provide a set of compatible resources for researchers to deposit PGS/PRS information. The PGS Catalog (www.PGSCatalog.org) provides an informatics platform with data integration and harmonization to other PGS as well as the source GWAS study through its sister platform, the GWAS Catalog. 64 In addition, it provides a structured database of scores (variants and effect weights) that can be reused, along with metadata requested in the PRS-RS. With these tools, PRS-RS can be mandated by leading peer-reviewed journals and, consequently, the quality and rigor of PRS research will be elevated to a level which facilitates clinical implementation. We encourage readers to visit the ClinGen complex disease website (https://clinicalgenome.org/working-groups/complex-disease/) for any future changes or amendments to the reporting guideline.

While we have provided explicit recommendations on how to acknowledge study design limitations and their impact on the interpretation and generalizability of a PRS, future research should attempt to establish best practices to guide the field. Moving forward, supplemental frameworks should be developed for the reporting of new methods, such as deep learning, as well as requirements for clinical utility and readiness. Taken together, PRS-RS facilitates the rapid emergence of polygenic risk scores as potentially powerful tools for the translation of genomic discoveries into clinical and public health benefits, and provides a framework for PRS to transform multiple areas of human genetic research.

Data Availability

N/A

ClinGen is primarily funded by the National Human Genome Research Institute (NHGRI), through the following three grants: U41HG006834, U41HG009649, U41HG009650. ClinGen also receives support for content curation from the Eunice Kennedy Shriver National Institute of Child Health and Human Development (NICHD), through the following three grants: U24HD093483, U24HD093486, U24HD093487. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health. Additionally, the views expressed in this article are those of the author(s) and not necessarily those of the NHS, the NIHR, or the Department of Health. Research reported in this publication was supported by the National Human Genome Research Institute of the National Institutes of Health under Award Number U41HG007823 (EBI-NHGRI GWAS Catalog, PGS Catalog). In addition, we acknowledge funding from the European Molecular Biology Laboratory. Individuals were funded from the following sources: MIM was a Wellcome Investigator and an NIHR Senior Investigator with funding from NIDDK (U01-DK105535); Wellcome (090532, 098381, 106130, 203141, 212259). MI, SAL, and JD were supported by core funding from: the UK Medical Research Council (MR/L003120/1), the British Heart Foundation (RG/13/13/30194; RG/18/13/33946) and the National Institute for Health Research (Cambridge Biomedical Research Centre at the Cambridge University Hospitals NHS Foundation Trust). SAL is supported by a Canadian Institutes of Health Research postdoctoral fellowship (MFE-171279). JD holds a British Heart Foundation Personal Chair and a National Institute for Health Research Senior Investigator Award. This work was also supported by Health Data Research UK, which is funded by the UK Medical Research Council, Engineering and Physical Sciences Research Council, Economic and Social Research Council, Department of Health and Social Care (England), Chief Scientist Office of the Scottish Government Health and Social Care Directorates, Health and Social Care Research and Development Division (Welsh Government), Public Health Agency (Northern Ireland), British Heart Foundation and Wellcome.

Author Contributions

HW, CT, MAI, IJK, CS, RR, KV, KK, KABG, PK, and GLW conducted the literature review of reporting practices. HW, SAL, CT, MIA, JWO, IJK, RR, DB, EV, MIM, ACA, DFE, RAH, AVK, NC, CK, KE, EMR, MCR, KEO, MJK, ACJWJ, KABG, PK, JALM, MI, and GLW provided feedback on the reporting framework. HW, SAL, CT, MAI, JWO, IJK, AVK, JND, HP, EMR, MR, ACJWJ, KABG, PK, JALM, MI, and GLW wrote and edited the manuscript text and items.

Competing Interests

MIM is on the advisory panels Pfizer, Novo Nordisk, and Zoe Global; Honoraria: Merck, Pfizer, Novo Nordisk, and Eli Lilly; Research funding: Abbvie, Astra Zeneca, Boehringer Ingelheim, Eli Lilly, Janssen, Merck, Novo Nordisk, Pfizer, Roche, Sanofi Aventis, Servier & Takeda. As of June 2019, he is an employee of Genentech with stock and stock options in Roche. No other authors have competing interests to declare.

Acknowledgements

We would like to acknowledge the input of the ClinGen Complex Disease Working Group members, including Carlos D. Bustamante, Michelle Meyer, Frank Harrell, David Kent, Peter Visscher, Tim Assimes, Sharon Plon, and Jonathan Berg. We would also like to thank Diane Durham for her editorial support and Angela Paolucci for her administrative support in the preparation and submission of this manuscript.

Footnotes

  • ↵# Shared first authorship

  • ↵$ Shared senior authorship

  • PRS-RS has been revised and streamlined; author affiliations updated; Supplemental Materials added and updated; Figures updated.

Bibliography

  1. 1.↵
    Morales, J. et al. A standardized framework for representation of ancestry data in genomics studies, with application to the NHGRI-EBI GWAS Catalog. Genome Biol. 19, 21 (2018).
    OpenUrl
  2. 2.↵
    MacArthur, J. et al. The new NHGRI-EBI Catalog of published genome-wide association studies (GWAS Catalog). Nucleic Acids Res. 45, D896–D901 (2017).
    OpenUrlCrossRefPubMed
  3. 3.↵
    Claussnitzer, M. et al. A brief history of human disease genetics. Nature (2020).
  4. 4.↵
    Visscher, P. M., Brown, M. A., McCarthy, M. I. & Yang, J. Five years of GWAS discovery. Am. J. Hum. Genet. 90, 7–24 (2012).
    OpenUrlCrossRefPubMed
  5. 5.↵
    Visscher, P. M. et al. 10 years of GWAS discovery: biology, function, and translation. Am. J. Hum. Genet. 101, 5–22 (2017).
    OpenUrlCrossRefPubMed
  6. 6.↵
    Pasaniuc, B. & Price, A. L. Dissecting the genetics of complex traits using summary association statistics. Nat. Rev. Genet. 18, 117–127 (2017).
    OpenUrlCrossRefPubMed
  7. 7.↵
    Lambert, S. A., Abraham, G. & Inouye, M. Towards clinical utility of polygenic risk scores. Hum. Mol. Genet. 28, R133–R142 (2019).
    OpenUrlCrossRefPubMed
  8. 8.↵
    Torkamani, A., Wineinger, N. E. & Topol, E. J. The personal and clinical utility of polygenic risk scores. Nat. Rev. Genet. 19, 581–590 (2018).
    OpenUrlCrossRefPubMed
  9. 9.↵
    Martin, A. R. et al. Clinical use of current polygenic risk scores may exacerbate health disparities. Nat. Genet. 51, 584–591 (2019).
    OpenUrlCrossRefPubMed
  10. 10.↵
    Chatterjee, N., Shi, J. & García-Closas, M. Developing and evaluating polygenic risk prediction models for stratified disease prevention. Nat. Rev. Genet. 17, 392–406 (2016).
    OpenUrlCrossRefPubMed
  11. 11.↵
    Wray, N. R., Yang, J., Goddard, M. E. & Visscher, P. M. The genetic interpretation of area under the ROC curve in genomic profiling. PLoS Genet. 6, e1000864 (2010).
    OpenUrlCrossRefPubMed
  12. 12.
    Dudbridge, F. Power and predictive accuracy of polygenic risk scores. PLoS Genet. 9, e1003348 (2013).
    OpenUrlCrossRefPubMed
  13. 13.
    Pharoah, P. D. P. et al. Polygenic susceptibility to breast cancer and implications for prevention. Nat. Genet. 31, 33–36 (2002).
    OpenUrlCrossRefPubMedWeb of Science
  14. 14.
    Zhang, Y., Qi, G., Park, J.-H. & Chatterjee, N. Estimation of complex effect-size distributions using summary-level statistics from genome-wide association studies across 32 complex traits. Nat. Genet. 50, 1318–1326 (2018).
    OpenUrlCrossRefPubMed
  15. 15.↵
    Chatterjee, N. et al. Projecting the performance of risk prediction based on polygenic analyses of genome-wide association studies. Nat. Genet. 45, 400–5, 405e1 (2013).
    OpenUrlCrossRefPubMed
  16. 16.↵
    Mak, T. S. H., Porsch, R. M., Choi, S. W., Zhou, X. & Sham, P. C. Polygenic scores via penalized regression on summary statistics. Genet. Epidemiol. 41, 469–480 (2017).
    OpenUrl
  17. 17.
    Choi, S. W. & O’Reilly, P. F. PRSice-2: Polygenic Risk Score software for biobank-scale data. Gigascience 8, (2019).
  18. 18.↵
    Vilhjálmsson, B. J. et al. Modeling linkage disequilibrium increases accuracy of polygenic risk scores. Am. J. Hum. Genet. 97, 576–592 (2015).
    OpenUrlCrossRefPubMed
  19. 19.↵
    Wilkinson, M. D. et al. The FAIR Guiding Principles for scientific data management and stewardship. Sci. Data 3, 160018 (2016).
    OpenUrl
  20. 20.↵
    Huo, D. et al. Comparison of breast cancer molecular features and survival by african and european ancestry in the cancer genome atlas. JAMA Oncol. 3, 1654–1662 (2017).
    OpenUrl
  21. 21.↵
    Choi, J. et al. The associations between immunity-related genes and breast cancer prognosis in Korean women. PLoS One 9, e103593 (2014).
    OpenUrl
  22. 22.↵
    Marston, N. A. et al. Predicting benefit from evolocumab therapy in patients with atherosclerotic disease using a genetic risk score: results from the FOURIER trial. Circulation 141, 616–623 (2020).
    OpenUrl
  23. 23.↵
    Ganguly, P. & National Human Genome Research Institute. NIH funds centers to improve the role of genomics in assessing and managing disease risk. (2020). at <https://www.genome.gov/news/news-release/NIH-funds-centers-to-improve-role-of-genomics-in-assessing-and-managing-disease-risk>
  24. 24.↵
    Nikpay, M. et al. A comprehensive 1,000 Genomes-based genome-wide association meta-analysis of coronary artery disease. Nat. Genet. 47, 1121–1130 (2015).
    OpenUrlCrossRefPubMed
  25. 25.↵
    Khera, A. V. et al. Genome-wide polygenic scores for common diseases identify individuals with risk equivalent to monogenic mutations. Nat. Genet. 50, 1219–1224 (2018).
    OpenUrlCrossRefPubMed
  26. 26.↵
    Mars, N. et al. Polygenic and clinical risk scores and their impact on age at onset and prediction of cardiometabolic diseases and common cancers. Nat. Med. 26, 549–557 (2020).
    OpenUrlPubMed
  27. 27.↵
    Elliott, J. et al. Predictive Accuracy of a Polygenic Risk Score-Enhanced Prediction Model vs a Clinical Risk Score for Coronary Artery Disease. JAMA 323, 636–645 (2020).
    OpenUrlPubMed
  28. 28.↵
    Inouye, M. et al. Genomic risk prediction of coronary artery disease in 480,000 adults: implications for primary prevention. J. Am. Coll. Cardiol. 72, 1883–1893 (2018).
    OpenUrlFREE Full Text
  29. 29.↵
    Abraham, G. et al. Genomic prediction of coronary heart disease. Eur. Heart J. 37, 3267–3278 (2016).
    OpenUrlCrossRefPubMed
  30. 30.
    Ganna, A. et al. Multilocus genetic risk scores for coronary heart disease prediction. Arterioscler. Thromb. Vasc. Biol. 33, 2267–2272 (2013).
    OpenUrlAbstract/FREE Full Text
  31. 31.↵
    Kullo, I. J. et al. Incorporating a Genetic Risk Score Into Coronary Heart Disease Risk Estimates: Effect on Low-Density Lipoprotein Cholesterol Levels (the MI-GENES Clinical Trial). Circulation 133, 1181–1188 (2016).
    OpenUrlAbstract/FREE Full Text
  32. 32.↵
    U.S. National Library of Medicine. Home - ClinicalTrials.gov. at <https://clinicaltrials.gov/>
  33. 33.↵
    Mavaddat, N. et al. Polygenic risk scores for prediction of breast cancer and breast cancer subtypes. Am. J. Hum. Genet. 104, 21–34 (2019).
    OpenUrlCrossRefPubMed
  34. 34.↵
    Zhang, H. et al. Genome-wide association study identifies 32 novel breast cancer susceptibility loci from overall and subtype-specific analyses. Nat. Genet. 52, 572–581 (2020).
    OpenUrl
  35. 35.↵
    Zhang, X. et al. Addition of a polygenic risk score, mammographic density, and endogenous hormones to existing breast cancer risk prediction models: A nested case-control study. PLoS Med. 15, e1002644 (2018).
    OpenUrl
  36. 36.
    Lakeman, I. M. M. et al. Addition of a 161-SNP polygenic risk score to family history-based risk prediction: impact on clinical management in non-BRCA1/2 breast cancer families. J. Med. Genet. 56, 581–589 (2019).
    OpenUrlAbstract/FREE Full Text
  37. 37.
    Wilcox, A. N. et al. Prospective Evaluation of a Breast Cancer Risk Model Integrating Classical Risk Factors and Polygenic Risk in 15 Cohorts from Six Countries. medRxiv (2019). doi:10.1101/19011171
    OpenUrlAbstract/FREE Full Text
  38. 38.
    Pal Choudhury, P. et al. Comparative validation of the BOADICEA and Tyrer-Cuzick breast cancer risk models incorporating classical risk factors and polygenic risk in a population-based prospective cohort. medRxiv (2020). doi:10.1101/2020.04.27.20081265
    OpenUrlAbstract/FREE Full Text
  39. 39.
    Maas, P. et al. Breast cancer risk from modifiable and nonmodifiable risk factors among white women in the united states. JAMA Oncol. 2, 1295–1302 (2016).
    OpenUrl
  40. 40.↵
    Lee, A. et al. BOADICEA: a comprehensive breast cancer risk prediction model incorporating genetic and nongenetic risk factors. Genet. Med. 21, 1708–1718 (2019).
    OpenUrlCrossRefPubMed
  41. 41.
    Garcia-Closas, M., Gunsoy, N. B. & Chatterjee, N. Combined associations of genetic and environmental risk factors: implications for prevention of breast cancer. J. Natl. Cancer Inst. 106, (2014).
  42. 42.↵
    Kapoor, P. M. et al. Combined associations of a polygenic risk score and classical risk factors with breast cancer risk. J. Natl. Cancer Inst. (2020). doi:10.1093/jnci/djaa056
    OpenUrlCrossRef
  43. 43.↵
    Easton, D. F. et al. Gene-panel sequencing and the prediction of breast-cancer risk. N. Engl. J. Med. 372, 2243–2257 (2015).
    OpenUrlCrossRefPubMed
  44. 44.
    Muranen, T. A. et al. Genetic modifiers of CHEK2*1100delC-associated breast cancer risk. Genet. Med. 19, 599–603 (2017).
    OpenUrlPubMed
  45. 45.
    Kuchenbaecker, K. B. et al. Evaluation of polygenic risk scores for breast and ovarian cancer risk prediction in BRCA1 and BRCA2 mutation carriers. J. Natl. Cancer Inst. 109, (2017).
  46. 46.
    Mars, N. et al. Polygenic risk, susceptibility genes, and breast cancer over the life course. medRxiv (2020). doi:10.1101/2020.04.17.20069229
    OpenUrlAbstract/FREE Full Text
  47. 47.
    Fahed, A. C. et al. Polygenic background modifies penetrance of monogenic variants conferring risk for coronary artery disease, breast cancer, or colorectal cancer. medRxiv (2019). doi:10.1101/19013086
    OpenUrlAbstract/FREE Full Text
  48. 48.
    Barnes, D. R. et al. Polygenic risk scores and breast and epithelial ovarian cancer risks for carriers of BRCA1 and BRCA2 pathogenic variants. Genet. Med. (2020). doi:10.1038/s41436-020-0862-x
    OpenUrlCrossRef
  49. 49.↵
    van den Broek, J. J. et al. Personalizing breast cancer screening based on polygenic risk and family history. J. Natl. Cancer Inst. (2020). doi:10.1093/jnci/djaa127
    OpenUrlCrossRef
  50. 50.↵
    Esserman, L. J. & WISDOM Study and Athena Investigators. The WISDOM Study: breaking the deadlock in the breast cancer screening debate. NPJ Breast Cancer 3, 34 (2017).
    OpenUrl
  51. 51.↵
    Martens, F. K., Tonk, E. C. M. & Janssens, A. C. J. W. Evaluation of polygenic risk models using multiple performance measures: a critical assessment of discordant results. Genet. Med. 21, 391–397 (2019).
    OpenUrl
  52. 52.↵
    Janssens, A. C. J. W. Validity of polygenic risk scores: are we measuring what we think we are? Hum. Mol. Genet. 28, R143–R150 (2019).
    OpenUrl
  53. 53.↵
    Warrier, V. et al. Polygenic scores for intelligence, educational attainment and schizophrenia are differentially associated with core autism features, IQ, and adaptive behaviour in autistic individuals. medRxiv (2020). doi:10.1101/2020.07.21.20159228
    OpenUrlAbstract/FREE Full Text
  54. 54.
    Matoba, N., Love, M. I. & Stein, J. L. Evaluating brain structure traits as endophenotypes using polygenicity and discoverability. BioRxiv (2020). doi:10.1101/2020.07.17.208843
    OpenUrlAbstract/FREE Full Text
  55. 55.
    Coleman, J. R. I. et al. Genome-wide gene-environment analyses of major depressive disorder and reported lifetime traumatic experiences in UK Biobank. Mol. Psychiatry 25, 1430–1446 (2020).
    OpenUrl
  56. 56.↵
    Kraft, P. & Aschard, H. Finding the missing gene-environment interactions. Eur J Epidemiol 30, 353–355 (2015).
    OpenUrlCrossRefPubMed
  57. 57.↵
    Janssens, A. C. J. W. et al. Strengthening the reporting of genetic risk prediction studies (GRIPS): explanation and elaboration. Eur. J. Hum. Genet. 19, 18 p preceding 494 (2011).
    OpenUrlCrossRefPubMed
  58. 58.↵
    von Elm, E. et al. Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) statement: guidelines for reporting observational studies. BMJ 335, 806–808 (2007).
    OpenUrlFREE Full Text
  59. 59.↵
    Little, J. et al. STrengthening the REporting of Genetic Association Studies (STREGA): an extension of the STROBE statement. PLoS Med. 6, e22 (2009).
    OpenUrlCrossRefPubMed
  60. 60.↵
    Moons, K. G. M. et al. Transparent Reporting of a multivariable prediction model for Individual Prognosis or Diagnosis (TRIPOD): explanation and elaboration. Ann. Intern. Med. 162, W1–73 (2015).
    OpenUrlCrossRefPubMed
  61. 61.↵
    Choi, S. W., Mak, T.S.-H. & O’Reilly, P. F. Tutorial: a guide to performing polygenic risk score analyses. Nat. Protoc. 15, 2759–2772 (2020).
    OpenUrlPubMed
  62. 62.↵
    Wojcik, G. L. et al. Genetic analyses of diverse populations improves discovery for complex traits. Nature 570, 514–518 (2019).
    OpenUrlCrossRef
  63. 63.↵
    Buniello, A. et al. The NHGRI-EBI GWAS Catalog of published genome-wide association studies, targeted arrays and summary statistics 2019. Nucleic Acids Res. 47, D1005–D1012 (2019).
    OpenUrlCrossRefPubMed
  64. 64.↵
    Lambert, S. A. et al. The Polygenic Score Catalog: an open database for reproducibility and systematic evaluation. medRxiv (2020). doi:10.1101/2020.05.20.20108217
    OpenUrlAbstract/FREE Full Text
  65. 65.↵
    Thelwall, M. et al. Is useful research data usually shared? An investigation of genome-wide association study summary statistics. PLoS One 15, e0229578 (2020).
    OpenUrl
  66. 66.↵
    Burke, W. & Zimmern, R. Moving Beyond ACCE: An Expanded Framework for Genetic Test Evaluation. (PHG Foundation, 2007).
Back to top
PreviousNext
Posted November 24, 2020.
Download PDF

Supplementary Material

Data/Code
Email

Thank you for your interest in spreading the word about medRxiv.

NOTE: Your email address is requested solely to identify you as the sender of this article.

Enter multiple addresses on separate lines or separate them with commas.
Improving reporting standards for polygenic scores in risk prediction studies
(Your Name) has forwarded a page to you from medRxiv
(Your Name) thought you would like to see this page from the medRxiv website.
CAPTCHA
This question is for testing whether or not you are a human visitor and to prevent automated spam submissions.
Share
Improving reporting standards for polygenic scores in risk prediction studies
Hannah Wand, Samuel A. Lambert, Cecelia Tamburro, Michael A. Iacocca, Jack W. O’Sullivan, Catherine Sillari, Iftikhar J. Kullo, Robb Rowley, Jacqueline S. Dron, Deanna Brockman, Eric Venner, Mark I. McCarthy, Antonis C. Antoniou, Douglas F. Easton, Robert A. Hegele, Amit V. Khera, Nilanjan Chatterjee, Charles Kooperberg, Karen Edwards, Katherine Vlessis, Kim Kinnear, John N. Danesh, Helen Parkinson, Erin M. Ramos, Megan C. Roberts, Kelly E. Ormond, Muin J. Khoury, A. Cecile J.W. Janssens, Katrina A.B. Goddard, Peter Kraft, Jaqueline A. L. MacArthur, Michael Inouye, Genevieve Wojcik
medRxiv 2020.04.23.20077099; doi: https://doi.org/10.1101/2020.04.23.20077099
Twitter logo Facebook logo LinkedIn logo Mendeley logo
Citation Tools
Improving reporting standards for polygenic scores in risk prediction studies
Hannah Wand, Samuel A. Lambert, Cecelia Tamburro, Michael A. Iacocca, Jack W. O’Sullivan, Catherine Sillari, Iftikhar J. Kullo, Robb Rowley, Jacqueline S. Dron, Deanna Brockman, Eric Venner, Mark I. McCarthy, Antonis C. Antoniou, Douglas F. Easton, Robert A. Hegele, Amit V. Khera, Nilanjan Chatterjee, Charles Kooperberg, Karen Edwards, Katherine Vlessis, Kim Kinnear, John N. Danesh, Helen Parkinson, Erin M. Ramos, Megan C. Roberts, Kelly E. Ormond, Muin J. Khoury, A. Cecile J.W. Janssens, Katrina A.B. Goddard, Peter Kraft, Jaqueline A. L. MacArthur, Michael Inouye, Genevieve Wojcik
medRxiv 2020.04.23.20077099; doi: https://doi.org/10.1101/2020.04.23.20077099

Citation Manager Formats

  • BibTeX
  • Bookends
  • EasyBib
  • EndNote (tagged)
  • EndNote 8 (xml)
  • Medlars
  • Mendeley
  • Papers
  • RefWorks Tagged
  • Ref Manager
  • RIS
  • Zotero
  • Tweet Widget
  • Facebook Like
  • Google Plus One

Subject Area

  • Genetic and Genomic Medicine
Subject Areas
All Articles
  • Addiction Medicine (349)
  • Allergy and Immunology (668)
  • Allergy and Immunology (668)
  • Anesthesia (181)
  • Cardiovascular Medicine (2648)
  • Dentistry and Oral Medicine (316)
  • Dermatology (223)
  • Emergency Medicine (399)
  • Endocrinology (including Diabetes Mellitus and Metabolic Disease) (942)
  • Epidemiology (12228)
  • Forensic Medicine (10)
  • Gastroenterology (759)
  • Genetic and Genomic Medicine (4103)
  • Geriatric Medicine (387)
  • Health Economics (680)
  • Health Informatics (2657)
  • Health Policy (1005)
  • Health Systems and Quality Improvement (985)
  • Hematology (363)
  • HIV/AIDS (851)
  • Infectious Diseases (except HIV/AIDS) (13695)
  • Intensive Care and Critical Care Medicine (797)
  • Medical Education (399)
  • Medical Ethics (109)
  • Nephrology (436)
  • Neurology (3882)
  • Nursing (209)
  • Nutrition (577)
  • Obstetrics and Gynecology (739)
  • Occupational and Environmental Health (695)
  • Oncology (2030)
  • Ophthalmology (585)
  • Orthopedics (240)
  • Otolaryngology (306)
  • Pain Medicine (250)
  • Palliative Medicine (75)
  • Pathology (473)
  • Pediatrics (1115)
  • Pharmacology and Therapeutics (466)
  • Primary Care Research (452)
  • Psychiatry and Clinical Psychology (3432)
  • Public and Global Health (6527)
  • Radiology and Imaging (1403)
  • Rehabilitation Medicine and Physical Therapy (814)
  • Respiratory Medicine (871)
  • Rheumatology (409)
  • Sexual and Reproductive Health (410)
  • Sports Medicine (342)
  • Surgery (448)
  • Toxicology (53)
  • Transplantation (185)
  • Urology (165)