Instability of high polygenic risk classification and mitigation by integrative scoring
=======================================================================================

* Anika Misra
* Buu Truong
* Sarah M. Urbut
* Yang Sui
* Akl C. Fahed
* Jordan W. Smoller
* Aniruddh P. Patel
* Pradeep Natarajan

## Abstract

Polygenic risk scores (PRS) continue to improve with novel methods and expanding genome-wide association studies. Healthcare and third-party laboratories are increasingly deploying PRS reports to patients. Although new PRS show improving strengths of association with traits, it is unknown how the classification of high polygenic risk changes across individual PRS for the same trait. Here, we determined classification of high genetic risk from all cataloged PRS for three complex traits. While each PRS for each trait demonstrated generally consistent population-level strengths of associations, classification of individuals in the top 10% of each PRS distribution varied widely. Using the PRSMix framework, which incorporates information across several PRS to improve prediction, we generated sequential add-one-in (AOI) PRSMix_AOI scores based on order of publication. PRSMix_AOIn led to improved PRS performance and more consistent high-risk classification compared with the corresponding PRSn. As new PRS continue to be generated, the PRSMix_AOI approach provides more stable and reliable classification of high risk, a step toward PRS standardization.

Polygenic risk scores (PRS), which quantify the genetic risk for traits from common variants, have improved in their predictive performances over the past decade.1 Building on classical approaches of pruning and thresholding, methods incorporating Bayesian approaches using prior knowledge about genetic architecture, relatedness of individuals, linkage disequilibrium patterns, and genetic effects across populations have improved assignment of variant weights and PRS performance.2–5 In parallel, genome-wide association studies (GWAS) have continued to expand in size, with the most recent iteration of the GWASs for human height and coronary artery disease reaching 5.4 million and 1.3 million participants, respectively.6,7 Incorporation of multi-ancestry and multi-trait GWAS data has further improved PRS prediction in diverse ancestral groups.8 In attempting to quantify total inherited risk, each of these PRS iterations for a given trait captures unique information contingent on source GWAS, method, and training dataset.

PRS are increasingly being delivered to patients. Third-party genetic testing companies and healthcare system laboratories are already delivering polygenic scores for coronary artery disease, diabetes, cancers, and other diseases.9,10 The eMERGE consortium has developed PRS reports for 10 diseases to return to participants within healthcare systems as part of a larger effort to study genomic risk assessment and management.11–13 Furthermore, researchers recently developed clinically valid assays, clinical workflows, and patient- and physician-oriented information materials to accompany PRS reports delivered within the Mass General Brigham Biobank and the Genomic Medicine at Veterans Affairs (GenoVA) Study.14,15 New clinical trials are incorporating PRS into medical decision-making ([NCT05819814](http://medrxiv.org/lookup/external-ref?link_type=CLINTRIALGOV&access_num=NCT05819814&atom=%2Fmedrxiv%2Fearly%2F2024%2F07%2F24%2F2024.07.24.24310897.atom), [NCT05850091](http://medrxiv.org/lookup/external-ref?link_type=CLINTRIALGOV&access_num=NCT05850091&atom=%2Fmedrxiv%2Fearly%2F2024%2F07%2F24%2F2024.07.24.24310897.atom)), and medical societies have begun to release initial statements on their utility.16,17

Despite ongoing progress toward clinical implementation, the variability of individual-level classification of ‘high genetic risk’ using different PRS for a given trait remains largely untested, and consensus PRS do not currently exist for any trait. Previous efforts have focused on population-level prediction metrics rather than on the consistency of the high-risk classification now presented in individual clinical reports as part of clinical implementation workflows.8 The scarcity of large, diverse holdout datasets has previously limited individual-level benchmarking of agreement in classification between PRS. Furthermore, there is a need to aggregate and incorporate orthogonal data from available PRS while overcoming correlation between scores and maximizing predictive performance.

Using the large and ancestrally-diverse *All of Us* (AOU) cohort18, we set out to compare the classification of individuals with high genetic risk based on published polygenic scores for three common, complex diseases: coronary artery disease (CAD), type 2 diabetes (T2DM), and major depressive disorder (MDD). We also test the effect of using PRSMix19 — a tool that agnostically incorporates information across several PRS for a given trait to improve prediction accuracy for a target population — in influencing high genetic risk classification over iterations of polygenic scores.

We determined the associations of published PRS for complex traits from the Polygenic Score Catalog20 in AOU, which has aggregated genotype data and extensive phenotypic information on 236,393 participants (average enrollment age: 51.8 years; 60.6% female; genetically inferred ancestry of 54.7% EUR, 22.9% AFR, 18.9% AMR, 2.3% EAS and 1.1% SAS).21 Specifically, we calculated 57 scores for CAD, 129 scores for T2DM, and 18 scores for MDD. We tested the associations of these scores with corresponding outcomes. In each group of trait-specific scores, we chose the score most strongly associated with the target disease from each unique publication to form the subset of scores called PRSn, with n referring to the chronological order of publication among the chosen trait-specific PRSn; this resulted in 40 CAD PRSn, 39 T2DM PRSn, and 7 MDD PRSn. Then, for each of the three trait-specific groups of chronologically ordered PRSn, we generated a parallel group of sequential add-one-in (AOI) PRSMix_AOIn scores wherein, using PRSMix, each PRSn was combined with all other PRSn in its group published before it to produce its corresponding PRSMix_AOIn score. Finally, we generated three trait-specific PRSMixall scores, each describing the PRSMix of all published PRS for the given trait. PRSMixall was used to establish a benchmark definition and classification of high genetic risk, allowing consistent comparison between individual PRSn and their paired PRSMix_AOIn (Figure 1).

![Figure 1:](http://medrxiv.org/http://medrxiv.stage.highwire.org/content/medrxiv/early/2024/07/24/2024.07.24.24310897/F1.medium.gif)

Figure 1: Overview of analysis
A) Polygenic scores that were not developed using PRSMix and do not include *All of Us* participants in their development were downloaded from the PGS catalog and calculated in the *All of Us* cohort for three traits of interest, including 57 scores for coronary artery disease (CAD), 129 scores for type 2 diabetes (T2DM), and 18 scores for major depressive disorder (MDD). B) PRSMix, a tool that incorporates information across several PRS for a given trait to improve prediction accuracy for a target population, was used to generate a PRSMixall for each trait incorporating all available scores in a training dataset of 189,114 individuals. Being in the top 10% of these trait-specific PRSMixall score distributions was used as the benchmark for classifying high genetic risk. C) For publications that had shared multiple scores in the PGS Catalog for a trait, the score with the strongest association with the trait of interest was identified and chronologically ordered as PRSn. D) PRSMix was used to generate a set of chronological add-one-in (AOI) PRSMix_AOIn scores corresponding to each set of PRSn, wherein we combined each PRSn with all prior published PRSn scores for the same trait. These PRSn, corresponding PRSMix_AOIn, and the single PRSMixall scores for each trait were calculated and further analyzed in a holdout population of 47,279 participants in *All of Us*.

First, we assessed the association of high polygenic risk, defined as the top 10% of the score distribution, with the corresponding trait, and the stability of that classification across scores. With contemporary PRS for common diseases, being in the top 10% of the score distribution has been associated with approximately two-fold greater risk of disease in prior studies.14 The ORs associated with top 10% risk classification ranged from 1.25-2.50 for all CAD PRSn with CAD, 1.30-2.58 for all T2DM PRSn with T2DM, and 0.99-1.47 for all MDD PRSn with MDD (Extended Figure 1). While each PRSn demonstrated consistent strengths of association, classification of individuals in the top 10% of each PRS distribution varied widely. The Jaccard index, a metric of similarity, is the number of observations shared by two sets divided by the total number of distinct observations in either set. The median [interquartile range] Jaccard index for top 10% classification by CAD PRSn was 0.17 [0.13-0.22], indicating poor agreement across scores. Similarly, the median Jaccard index for the top 10% classification was 0.18 [0.15-0.22] for T2DM PRSn and 0.11 [0.08-0.17] for MDD PRSn (Figure 2). This trend was invariant to threshold choice (Supplementary Table 1). When restricting to polygenic scores from the five most recent publications, we continued to observe poor agreement of high-risk classifications, with median [IQR] Jaccard index estimates of 0.15 [0.13-0.22] for CAD PRSn, 0.26 [0.18-0.35] for T2DM PRSn, and 0.18 [0.07-0.23] for MDD PRSn.
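The pairwise agreement described above can be illustrated with a minimal sketch. It is not the analysis code used in this study; the function names, the toy participants, and the three toy scores are hypothetical, and the top 10% is taken by simple ranking.

```python
from itertools import combinations
from statistics import median


def top_fraction_ids(scores, fraction=0.10):
    """Return the IDs of participants in the top `fraction` of one score."""
    n_top = max(1, round(len(scores) * fraction))
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:n_top])


def jaccard(a, b):
    """Size of the intersection divided by the size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def pairwise_jaccard(score_table, fraction=0.10):
    """Jaccard index for every pair of scores in {score_name: {id: value}}."""
    high = {name: top_fraction_ids(s, fraction) for name, s in score_table.items()}
    return {(x, y): jaccard(high[x], high[y]) for x, y in combinations(high, 2)}


# Toy data (hypothetical): 20 participants and three scores.
ids = [f"p{i:02d}" for i in range(20)]
toy = {
    "PRS_a": {pid: i for i, pid in enumerate(ids)},
    "PRS_b": {pid: (i * 7) % 20 for i, pid in enumerate(ids)},  # different ranking
    "PRS_c": {pid: i + 0.5 for i, pid in enumerate(ids)},       # same ranking as PRS_a
}
result = pairwise_jaccard(toy)
summary = median(result.values())  # median agreement across all score pairs
```

In this toy example, two scores that rank participants identically share the same high-risk set, whereas a score with a different ranking shares none of it, which is the kind of disagreement the heatmaps in Figure 2 summarize.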

![Extended Figure 1:](http://medrxiv.org/http://medrxiv.stage.highwire.org/content/medrxiv/early/2024/07/24/2024.07.24.24310897/F7.medium.gif)

Extended Figure 1: Odds ratios for high genetic risk using PRSn vs. PRSMix_AOIn with three traits
Odds ratios are based on logistic regression models predicting disease of interest, with variables of high genetic risk, age, sex, and first ten principal components of genetic ancestry. High genetic risk is defined as the top 10% of each respective polygenic score. PRSn indicates a specific subset of individual trait-specific PRS ordered chronologically based on date of publication. PRSMix_AOIn indicates polygenic scores generated using the PRSMix framework wherein each PRSn was combined with all individual PRSn published before it. CAD: coronary artery disease; T2DM: type 2 diabetes; MDD: major depressive disorder.

![Figure 2:](http://medrxiv.org/http://medrxiv.stage.highwire.org/content/medrxiv/early/2024/07/24/2024.07.24.24310897/F2.medium.gif)

Figure 2: Jaccard index heatmaps for high genetic risk across trait-specific polygenic score (PRSn) pairs
Jaccard indices of similarity calculated for pairs of classification of top 10% risk determined by polygenic scores (PRSn) for coronary artery disease, type 2 diabetes mellitus, and major depressive disorder. Scores are ordered chronologically based on date of publication, advancing left to right and bottom to top. The median [interquartile range] Jaccard index for top 10% classification was: A) 0.17 [0.13-0.22] for coronary artery disease; B) 0.18 [0.15-0.22] for type 2 diabetes mellitus; and C) 0.11 [0.08-0.17] for major depressive disorder.

Chronologically adding each PRSn in order of publication into PRSMix_AOIn led to progressively stronger associations with the outcome. This is most evident for PRSn from the past five publications, where the C-statistics associated with the top 10% risk classification using PRSMix_AOIn vs. PRSn ranged from 0.545-0.546 vs. 0.515-0.536 for CAD, 0.543-0.547 vs. 0.520-0.546 for T2DM, and 0.521-0.522 vs. 0.500-0.521 for MDD, respectively (Figure 3). The ORs were significantly stronger for the top 10% of individuals using PRSMix_AOIn vs. PRSn, respectively, with median [IQR] ORs for CAD (1.99 [1.77-2.27] vs. 1.68 [1.48-1.87], P*heterogeneity*<0.001), T2DM (2.46 [2.45-2.50] vs. 1.86 [1.60-1.96], P*heterogeneity*<0.001), and MDD (1.47 [1.37-1.50] vs. 1.30 [1.17-1.40], P*heterogeneity*<0.001) (Figure 4A). The Nagelkerke’s pseudo-*R2* (median [IQR]) associated with top 10% risk classification for PRSMix_AOIn vs. PRSn was higher for CAD (0.008 [0.005-0.012] vs. 0.004 [0.002-0.006], P*heterogeneity*<0.001), T2DM (0.017 [0.017-0.018] vs. 0.008 [0.004-0.009], P*heterogeneity*<0.001) and MDD (0.0033 [0.0022-0.0038] vs. 0.0014 [0.0005-0.0023], P*heterogeneity*<0.001) (Figure 4B).

![Figure 3:](http://medrxiv.org/http://medrxiv.stage.highwire.org/content/medrxiv/early/2024/07/24/2024.07.24.24310897/F3.medium.gif)

Figure 3: Chronologic trends in C-statistic estimates for PRSn vs. PRSMix_AOIn across three traits
C-statistics are based on logistic regression models predicting disease of interest, with high genetic risk as the only variable. High genetic risk is defined as the top 10% of each respective polygenic score. PRSn indicates a chosen subset of individual trait-specific PRS ordered based on date of publication. PRSMix_AOIn indicates polygenic scores generated using the PRSMix framework, wherein PRSn is combined with all individual PRSn published before it.

![Figure 4:](http://medrxiv.org/http://medrxiv.stage.highwire.org/content/medrxiv/early/2024/07/24/2024.07.24.24310897/F4.medium.gif)

Figure 4: Association strength and variance explained for PRSn vs. PRSMix_AOIn across three traits
A) Median and interquartile range for odds ratios across scores are based on logistic regression models predicting disease of interest, with variables of high genetic risk, age, sex, and first ten principal components of genetic ancestry. High genetic risk is defined as the top 10% of each respective polygenic score. PRSn indicates individual trait-specific PRS ordered based on date of publication. PRSMix_AOIn indicates polygenic scores generated using the PRSMix framework, wherein each PRSn was combined with all individual PRSn published before it. B) Proportion of phenotypic variance explained by the high polygenic risk classification of each polygenic score for each respective disease: median and interquartile range across scores for the incremental Nagelkerke’s pseudo-*R2* metric, calculated as the *R2* of the full model (the polygenic score plus age, sex, and the first ten principal components of ancestry) minus the *R2* for the covariates alone. CAD: coronary artery disease. T2DM: type 2 diabetes. MDD: major depressive disorder. * designates P*heterogeneity*<0.001.

More importantly, using PRSMix_AOIn resulted in more congruent high-risk classification over time than the respective PRSn itself. With PRSMix_AOIn, the median [IQR] Jaccard index improved to 0.39 [0.24-0.53] for CAD PRSMix_AOIn, 0.64 [0.45-0.85] for T2DM PRSMix_AOIn, and 0.44 [0.15-0.80] for MDD PRSMix_AOIn (Figure 5). When restricting to PRSn from the five most recent publications, we observed significantly higher congruence in high-risk classifications between PRSMix_AOIn scores, with median [IQR] Jaccard index estimates of 0.92 [0.91-0.93] for CAD PRSMix_AOIn, 0.78 [0.78-0.78] for T2DM PRSMix_AOIn, and 0.80 [0.69-0.95] for MDD PRSMix_AOIn (Extended Figure 3). These findings of improved congruence for high genetic risk of CAD, T2DM, and MDD generalize across ancestral subgroups (Extended Figure 4).

![Extended Figure 2:](http://medrxiv.org/http://medrxiv.stage.highwire.org/content/medrxiv/early/2024/07/24/2024.07.24.24310897/F8.medium.gif)

Extended Figure 2: Chronologic trends in association strength and variance explained estimates for PRSn vs. PRSMix_AOIn across three traits
A) Odds ratios are based on logistic regression models predicting disease of interest, with variables of high genetic risk, age, sex, and first ten principal components of genetic ancestry. High genetic risk is defined as the top 10% of each respective polygenic score. PRSn indicates individual trait-specific PRS ordered chronologically based on date of publication. PRSMix_AOIn indicates polygenic scores generated using the PRSMix framework, wherein each PRSn was combined with all individual PRSn published before it. B) Proportion of phenotypic variance explained by the high polygenic risk classification of each polygenic score for each respective disease, measured by Nagelkerke’s pseudo-*R2* metric as the *R2* of the full model (the polygenic score plus age, sex, and the first ten principal components of ancestry) minus the *R2* for the covariates alone. CAD: coronary artery disease; T2DM: type 2 diabetes; MDD: major depressive disorder.

![Extended Figure 3:](http://medrxiv.org/http://medrxiv.stage.highwire.org/content/medrxiv/early/2024/07/24/2024.07.24.24310897/F9.medium.gif)

Extended Figure 3: Jaccard index distributions for high genetic risk determined by trait-specific PRSn vs. PRSMix_AOIn from five most recent publications
Jaccard indices of similarity calculated for pairs of classification of top 10% risk determined by trait-specific polygenic scores from the last five respective publications. PRSn indicates individual trait-specific PRS ordered based on date of publication. PRSMix_AOIn indicates polygenic scores generated using the PRSMix framework, wherein each PRSn was combined with all individual PRSn published before it.

![Extended Figure 4:](http://medrxiv.org/http://medrxiv.stage.highwire.org/content/medrxiv/early/2024/07/24/2024.07.24.24310897/F10.medium.gif)

Extended Figure 4: Jaccard index distributions for high genetic risk determined by trait-specific PRSn vs. PRSMix_AOIn from five most recent publications across genetically predicted ancestry groups
Jaccard indices of similarity calculated for pairs of classification of top 10% of risk determined by trait-specific polygenic scores from the last five respective publications. PRSn indicates individual trait-specific PRS ordered based on date of publication. PRSMix_AOIn indicates polygenic scores generated using the PRSMix framework, wherein each PRSn was combined with all individual PRSn published before it. Genetically inferred ancestry based on k-nearest neighbor approach as European (EUR), African (AFR), admixed American (AMR), and East Asian (EAS).

![Figure 5:](http://medrxiv.org/http://medrxiv.stage.highwire.org/content/medrxiv/early/2024/07/24/2024.07.24.24310897/F5.medium.gif)

Figure 5: Jaccard index heatmaps for high genetic risk across trait-specific PRSMix_AOIn pairs
Jaccard indices of similarity calculated for pairs of classification of top 10% risk determined by PRSMix_AOIn - polygenic scores generated by chronologically adding in PRSn using PRSMix framework for coronary artery disease, type 2 diabetes mellitus, and major depressive disorder. Scores are ordered chronologically based on date of publication, advancing left to right and bottom to top. The median [interquartile range] Jaccard index for top 10% classification was: A) 0.39 [0.24-0.53] for coronary artery disease; B) 0.64 [0.45-0.85] for type 2 diabetes mellitus; and C) 0.44 [0.15-0.80] for major depressive disorder.

As a result of the greater congruence in high-risk classification, using PRSMix_AOIn led to more consistent estimation of genetic risk percentile for individual participants. Individuals classified in the top 10% risk by the PRSMixall for CAD have median [IQR] percentiles across PRSn of 86 [79-91.5] vs. 93 [87-97] for PRSMix_AOIn (Figure 6A). Similarly, individuals classified in the top 10% risk by the PRSMixall for T2DM have median [IQR] percentiles across PRSn of 87 [81-93] vs. 95 [91-98] for PRSMix_AOIn (Figure 6B). Additionally, individuals classified in the top 10% risk by the PRSMixall for MDD have median [IQR] percentiles across PRSn of 86 [79-92] vs. 95 [92-98] for PRSMix_AOIn (Figure 6C). These findings of improved individual-level classification for high genetic risk of CAD, T2DM, and MDD generalize across ancestral subgroups (Extended Figure 5). For any given PRSn, high-risk individuals fall to a lower, wider range of percentiles across all PRSn when compared with their paired PRSMix_AOIn scores. For the PRSn among the last five published PRSn for each trait that had the most inconsistent high-risk classification relative to PRSMixall, the median [IQR] percentiles of individuals classified in top 10% risk by PRSn vs. PRSMix_AOIn were 78 [55-92] vs. 95 [92-98] for CAD, 87 [71-95] vs. 95 [93-98] for T2DM and 59 [33-81] vs. 95 [93-98] for MDD (Extended Figure 6).
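The per-individual summary used here can be sketched as follows: for each participant in the benchmark high-risk group, take the median of that participant's percentiles across a family of scores. This is an illustrative sketch only; the function name, participant IDs, and percentile values are hypothetical.

```python
from statistics import median


def median_percentile_per_person(percentiles_by_score, benchmark_high_risk):
    """Median percentile across scores for each benchmark high-risk person.

    percentiles_by_score: {score_name: {participant_id: percentile}}
    benchmark_high_risk:  IDs in the top 10% of the benchmark score
    """
    return {
        pid: median(p[pid] for p in percentiles_by_score.values())
        for pid in benchmark_high_risk
    }


# Toy data (hypothetical): participant "a" is high risk by the benchmark.
percentiles = {
    "PRS_1": {"a": 95, "b": 60},
    "PRS_2": {"a": 80, "b": 99},
    "PRS_3": {"a": 91, "b": 40},
}
per_person = median_percentile_per_person(percentiles, {"a"})
```

The distribution of these per-person medians, computed once across the PRSn and once across the PRSMix_AOIn, is what the comparison in this paragraph summarizes.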

![Extended Figure 5:](http://medrxiv.org/http://medrxiv.stage.highwire.org/content/medrxiv/early/2024/07/24/2024.07.24.24310897/F11.medium.gif)

Extended Figure 5: Median percentile of PRSn vs. PRSMix_AOIn across three traits for individuals identified as having high genetic risk using PRSMixall, across genetically predicted ancestry groups
Distributions of the median percentile per polygenic score type for individuals classified in top 10% risk by the PRSMixall for each disease of interest. PRSn indicates individual trait-specific PRS ordered based on date of publication. PRSMix_AOIn indicates polygenic scores generated using the PRSMix framework, wherein each PRSn was combined with all individual PRSn published before it. Genetically inferred ancestry based on k-nearest neighbor approach as European (EUR), African (AFR), admixed American (AMR), and East Asian (EAS).

![Extended Figure 6:](http://medrxiv.org/http://medrxiv.stage.highwire.org/content/medrxiv/early/2024/07/24/2024.07.24.24310897/F12.medium.gif)

Extended Figure 6: Score-specific percentile distributions of PRSn vs. PRSMix_AOIn across three traits for individuals identified as having high genetic risk using PRSMixall
Median percentile and interquartile range per polygenic score type for individuals classified in top 10% risk by the PRSMixall for each disease of interest. PRSn indicates individual trait-specific PRS ordered based on date of publication. PRSMix_AOIn indicates polygenic scores generated using the PRSMix framework, wherein each PRSn was combined with all individual PRSn published before it.

![Figure 6:](http://medrxiv.org/http://medrxiv.stage.highwire.org/content/medrxiv/early/2024/07/24/2024.07.24.24310897/F6.medium.gif)

Figure 6: Median percentile distributions of PRSn vs. PRSMix_AOIn across three traits for individuals identified as having high genetic risk using PRSMixall
Distributions of the median percentile per polygenic score type for individuals classified in top 10% risk by the PRSMixall for each disease of interest. PRSn indicates individual trait-specific PRS ordered based on date of publication. PRSMix_AOIn indicates polygenic scores generated using the PRSMix framework, wherein each PRSn was combined with all individual PRSn published before it. A) Individuals classified in top 10% risk by the PRSMixall for CAD have median percentiles across PRSn of 86 [79-91.5] vs. 93 [87-97] for PRSMix_AOIn. B) Similarly, individuals classified in the top 10% risk by the PRSMixall for T2DM have median percentiles across PRSn of 87 [81-93] vs. 95 [91-98] for PRSMix_AOIn. C) Additionally, individuals classified in the top 10% risk by the PRSMixall for MDD have median percentiles across PRSn of 86 [79-92] vs. 95 [92-98] for PRSMix_AOIn.

In a comprehensive assessment of published PRS for three complex diseases, high PRS classification of individual participants was highly inconsistent across individual PRS. Because the strengths of association of individual PRS were significant at the population level, each score captures complementary predictive information as a by-product of its training process. This can be a function of the size of its GWAS, the composition of the training dataset, the exact phenotype definition, and assumptions of the polygenic score method, among other factors. We observed instability in risk estimates even between scores generated from the same GWAS or trained in the same cohort study. This instability builds on the inherent large variances in individual PRS estimates due to propagation of error as a function of GWAS sample size, number of causal SNPs and SNP-heritability – collectively resulting in instability of prediction across scores.22

Discrepancy in classification of high genetic risk is problematic in current clinical implementation efforts based on single PRS. As GWAS continue to grow and PRS methods continue to improve, newer scores will be published. While we observe a high degree of correlation of scores across population-level metrics, we show that each subsequent score has poor agreement on high-risk status (defined as being in the top 10%). As single PRS continue to demonstrate iterative improvement, adopting each new single PRS empirically will lead to continual reassignment of high-risk status, which will yield confusion and lack of confidence. In addition to reporting population-based association, prediction, and calibration metrics, benchmarking individual scores in hold-out datasets to understand these implications is critical prior to implementation.

Use of PRSMix_AOI to sequentially incorporate new PRS is a framework that can be deployed by biobanks to provide a more robust and stable classification of high risk as well as improved overall PRS population-based metrics. This method factors out highly correlated scores and utilizes complementary information of available PRS to help predict risk more accurately in the target population. When a new score is published, it can be incorporated into the PRSMix_AOI model, yielding an improved PRS with less marked variability in updating the classification of high risk for individuals. Importantly, stability is accompanied by more accurate high-risk classification. Very significant gains in predictive performance from an additional PRS would cause more substantial, and likely appropriate, classification changes with this approach. Calculating the input scores required for PRSMix_AOI demands only nominal additional computational effort. Moreover, this method’s stable and improved prediction generalizes across ancestral groups, helping mitigate previously characterized disparities in PRS performance.23 The better accuracy and inter-PRSMix_AOIn consistency make it an ideal framework for incorporation into clinical workflows for PRS reporting and updating.

This study has several limitations. To mirror the development of the majority of published PRS, and given that AOU is a recent cohort with a middle-aged baseline, we performed association analyses with prevalent disease and still found significant instability in prediction. Future efforts on instability of incident disease prediction with follow-up beginning early in life will be additionally informative. We also used hospitalization and procedural codes to classify disease instead of clinically adjudicated outcomes. In this manuscript, we focus on common complex diseases for well-powered analyses. While some reports use OR for high-risk classification, we show wide variability in population-level metrics and percentile classification, leading to wide variability in individual-level OR estimation by PRSn. Thus, the proposed PRSMix_AOI approach provides stability for both percentile and OR-based reporting.

In conclusion, in a comprehensive assessment of published PRS for three complex diseases, high PRS classification was highly inconsistent. Using PRSMix_AOI to sequentially incorporate new PRS enables standardization toward more stable and reliable classification of high risk.

## Online Methods

### Study population

The *All of Us* (AOU) Research Program is a cohort study focused on recruiting individuals traditionally underrepresented in biomedical research. Since 2018, AOU has enrolled people aged 18 and older from over 730 sites across the United States. The program has consented more than 800,000 participants, of whom over 560,000 have completed the basics of enrollment including collection of health questionnaires and biospecimens. For these participants, there is ongoing linkage to electronic health record (EHR) data, including ICD-9/ICD-10, SNOMED, and CPT codes. Genetic data comprise array samples from 315,000 participants and whole-genome sequencing (WGS) from 245,394 participants. This study used WGS data from the Controlled Tier Dataset version 7 release.

### Outcome ascertainment

CAD was defined based on self-report, at least two diagnosis codes for myocardial infarction, or a single procedure code for coronary revascularization. T2DM was defined based on diagnosis codes, laboratory results and medication prescriptions, as previously described by the eMERGE consortium.24 MDD was defined based on the presence of at least two diagnosis codes for major depressive disorder. All phenotype definitions are detailed in Supplementary Tables 2-4.

### Genotyping and quality control

Participants in AOU were genotyped using the Illumina Global Diversity Array at AOU genome centers. Central quality control measures included filtering for sex concordance, a cross-individual contamination rate below 3%, and a call rate above 98%. Further quality control performed by AOU included filtering for variants with population-specific allele frequency greater than 1% or a population-specific allele count greater than 100 in any AOU-computed ancestry subpopulations. Ancestry was inferred based on genetic similarity with projections of 20 principal components of genetic ancestry using 1000 Genomes as a reference panel. Inferred genetic ancestry in AOU was estimated in high-quality WGS samples that were restricted to bi-allelic sites, a minor allele frequency above 0.1%, a call rate above 99%, and a linkage disequilibrium pruning threshold of r² = 0.1.

### Polygenic score calculation

The variant effect sizes for 61 scores for CAD, 133 scores for T2DM and 18 scores for MDD were downloaded from Polygenic Score Catalog on May 28, 2024. Of these, we excluded 4 CAD scores and 4 T2DM scores that were already developed using PRSMix to prevent potential overfitting. This left 57 CAD scores, 129 T2DM scores, and 18 MDD scores for use in further analysis, none of which were previously trained in AOU or developed using AOU GWAS data.

For score accession numbers, see Supplementary Tables 5-7. All scores were harmonized to the AOU reference dataset using PRSMix. Scores with an OR/SD < 1 in the entire AOU cohort were considered anti-correlated and were thus inverted in all further analyses. All scoring in AOU was done using PLINK2 software. All polygenic scores were adjusted for enrollment age, sex, and the first ten principal components of genetic ancestry and then standardized to have a mean of 0 and a standard deviation of 1. Individuals were next binned into 100 groups according to score percentile, and individuals in the top 10% of the score distribution were deemed to have high genetic risk.
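The post-scoring steps in this paragraph (inversion of anti-correlated scores, standardization, percentile binning, and high-risk flagging) can be sketched as below. This is a simplified illustration rather than the study pipeline: it omits the covariate adjustment, takes the OR/SD as a given input, and uses hypothetical function names and toy values.

```python
from statistics import mean, pstdev


def orient(values, or_per_sd):
    """Flip the sign of a score whose OR per SD is below 1."""
    return [-v for v in values] if or_per_sd < 1 else list(values)


def standardize(values):
    """Rescale scores to a mean of 0 and a standard deviation of 1."""
    mu, sd = mean(values), pstdev(values)
    return [(v - mu) / sd for v in values]


def percentile_bins(values):
    """Assign each value to one of 100 rank-based bins (1 = lowest)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    bins = [0] * n
    for rank, i in enumerate(order):
        bins[i] = rank * 100 // n + 1
    return bins


def high_risk_flags(values, top_percent=10):
    """True for individuals whose bin falls in the top `top_percent` bins."""
    return [b > 100 - top_percent for b in percentile_bins(values)]


# Toy data (hypothetical): 200 raw scores from an anti-correlated score.
raw = [float(i) for i in range(200)]
z = standardize(orient(raw, or_per_sd=0.8))  # OR/SD < 1, so the score is inverted
flags = high_risk_flags(z)
```

Because the toy score is inverted before binning, the individuals with the lowest raw values end up in the high-risk group.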

### PRSMix and PRSMix_AOI

PRSMix is a framework that evaluates and leverages the data from a group of PRS for a target trait to generate a new score with improved prediction accuracy. PRSMix uses an elastic net model to produce a weighted linear combination of all the input PRS. The PRSMix framework was used first to harmonize all scores from the PGS Catalog to the AOU reference dataset via the *harmonize_snpeffect_toALT* function. In addition, PRSMix was used to combine PRS via the *combine_PRS* function. PRSMix was run using all default parameters. This includes the use of age, sex, and the first 10 principal components as covariates, as well as an 80% vs. 20% split of the cohort into the training and testing cohorts, respectively. For each trait, we generated a PRSMixall score defined as the PRSMix of all published PRS for a given trait.
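To make the idea of a weighted linear combination concrete, the sketch below splits a cohort 80%/20% and applies a set of mixing weights to standardized scores. It does not call PRSMix and does not fit an elastic net; the weights are hypothetical stand-ins for coefficients that such a model would estimate in the training split, and all names are illustrative.

```python
import random


def split_ids(ids, train_fraction=0.8, seed=0):
    """Shuffle participant IDs and split them into training and testing sets."""
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def mix_scores(standardized_scores, weights):
    """Weighted sum of standardized scores for each participant.

    standardized_scores: {score_name: {participant_id: z_score}}
    weights:             {score_name: weight}; a weight of 0 drops the score
    """
    mixed = {}
    for name, weight in weights.items():
        if weight == 0:
            continue  # a penalized model can shrink a redundant score to zero
        for pid, z in standardized_scores[name].items():
            mixed[pid] = mixed.get(pid, 0.0) + weight * z
    return mixed


# Toy data (hypothetical scores and weights).
train_ids, test_ids = split_ids(range(100))
scores = {
    "PRS_1": {"a": 1.0, "b": -1.0},
    "PRS_2": {"a": 0.5, "b": 0.5},
    "PRS_3": {"a": 2.0, "b": 2.0},
}
weights = {"PRS_1": 0.5, "PRS_2": 0.25, "PRS_3": 0.0}
mixed = mix_scores(scores, weights)
```

In this framing, weights would be estimated in the training split and the resulting combined score evaluated in the held-out split.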

Within each trait of interest, only a subset of the trait-specific scores was selected for use as PRSn for the subsequent paired PRSMix_AOIn. For each unique PGS Catalog Publication (PGP) ID found within the group of all trait-specific scores, the score with the highest OR/SD was selected as the representative PRSn for that publication. This selection mechanism resulted in 40 PRSn out of 57 total CAD scores, 39 PRSn out of 129 total T2DM scores, and 7 PRSn out of 18 total MDD scores. Each group of trait-specific PRSn was then ordered by publication date, with the number of variants included in the score as the tiebreaker (scores with fewer variants came first). For each of the three trait-specific groups of ordered PRSn, we generated a parallel group of sequential add-one-in (AOI) PRSMix_AOIn scores. For each PRSn, we used PRSMix to combine the PRSn with all PRSn published before it in its group to produce its corresponding PRSMix_AOIn score.
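The selection and ordering logic above can be sketched as follows. The record fields (`score_id`, `pgp_id`, `published`, `n_variants`, `or_per_sd`) and all values are hypothetical and do not correspond to real PGS Catalog entries; the sketch only builds the list of inputs for each add-one-in step and leaves the actual combination to PRSMix.

```python
from datetime import date

# Hypothetical score metadata for one trait.
scores = [
    {"score_id": "A", "pgp_id": "P1", "published": date(2018, 5, 1),
     "n_variants": 100, "or_per_sd": 1.30},
    {"score_id": "B", "pgp_id": "P1", "published": date(2018, 5, 1),
     "n_variants": 5000, "or_per_sd": 1.45},
    {"score_id": "C", "pgp_id": "P2", "published": date(2020, 2, 10),
     "n_variants": 1000000, "or_per_sd": 1.60},
    {"score_id": "D", "pgp_id": "P3", "published": date(2018, 5, 1),
     "n_variants": 200, "or_per_sd": 1.25},
]


def select_representatives(records):
    """Keep, for each publication, the score with the highest OR/SD."""
    best = {}
    for rec in records:
        current = best.get(rec["pgp_id"])
        if current is None or rec["or_per_sd"] > current["or_per_sd"]:
            best[rec["pgp_id"]] = rec
    return list(best.values())


def order_by_publication(representatives):
    """Sort by publication date, then by variant count (fewer first)."""
    return sorted(representatives,
                  key=lambda r: (r["published"], r["n_variants"]))


def add_one_in_sets(ordered):
    """For the n-th score, list it together with all earlier scores."""
    return [[r["score_id"] for r in ordered[: i + 1]]
            for i in range(len(ordered))]


ordered = order_by_publication(select_representatives(scores))
aoi_inputs = add_one_in_sets(ordered)
# aoi_inputs == [["D"], ["D", "B"], ["D", "B", "C"]]
```

Each inner list names the PRS that would be passed to PRSMix to produce the corresponding PRSMix_AOIn; the first list contains a single score, so the first add-one-in step has nothing earlier to combine with.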

### Statistical analysis

The association of each polygenic score with the outcome of interest was assessed using logistic regression with covariates of enrollment age, sex, and the first 10 principal components of genetic ancestry. The discrimination of each polygenic score was assessed using Harrell’s C-statistic. The proportion of phenotypic variance explained by the polygenic score on the observed scale was determined using Nagelkerke’s pseudo-*R2* metric via the rcompanion R package. This calculation involved finding the *R2* for the complete model, which included the variable of interest and baseline model covariates, and then subtracting the *R2* for the baseline covariates alone. The significance of differences between the distributions of PRSn and PRSMix_AOIn was determined through subgroup heterogeneity meta-analysis. Standard errors for Nagelkerke’s pseudo-*R2* were determined via a standard bootstrapping procedure. Congruence of risk estimates was assessed with the Jaccard similarity index, which measures the overlap between two sample sets.25 All analyses were two-sided. Statistical analyses were performed using R version 4.3.1.
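Two of the metrics above lend themselves to a short worked sketch: the Jaccard index between two high-risk sets, and the incremental Nagelkerke pseudo-*R2* computed from model log-likelihoods. This is an illustration in Python rather than the rcompanion-based R code used in the study; the function names are hypothetical, and the log-likelihoods are assumed to come from logistic models fitted elsewhere.

```python
import math


def jaccard_index(set_a, set_b):
    """Size of the intersection divided by size of the union."""
    a, b = set(set_a), set(set_b)
    union = a | b
    if not union:
        return 1.0  # convention chosen for this sketch: two empty sets match
    return len(a & b) / len(union)


def nagelkerke_r2(ll_null, ll_model, n):
    """Nagelkerke pseudo-R2 from null and fitted model log-likelihoods."""
    cox_snell = 1.0 - math.exp((2.0 / n) * (ll_null - ll_model))
    max_cox_snell = 1.0 - math.exp((2.0 / n) * ll_null)
    return cox_snell / max_cox_snell


def incremental_r2(ll_null, ll_baseline, ll_full, n):
    """R2 of the full model (score plus covariates) minus R2 of covariates alone."""
    return nagelkerke_r2(ll_null, ll_full, n) - nagelkerke_r2(ll_null, ll_baseline, n)


# Toy usage: overlap between the top-10% sets of two hypothetical scores.
overlap = jaccard_index({1, 2, 3}, {2, 3, 4})  # 2 shared of 4 total

# Toy log-likelihoods for n = 100 with a balanced binary outcome.
n = 100
ll_null = n * math.log(0.5)
r2_gain = incremental_r2(ll_null, ll_baseline=-65.0, ll_full=-60.0, n=n)
```

A Jaccard index of 1 means two scores flag exactly the same individuals as high risk, while 0 means no individual is flagged by both.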

## Supporting information

Supplemental Tables 1-7 (supplements/310897_file14.xlsx)

## Data availability

All data are made available from the *All of Us* Research Program to researchers from universities and other institutions with genuine research inquiries following institutional review board and *All of Us* approval. This research was approved by the Mass General Brigham institutional review board. The weights of the polygenic scores analyzed in this study are publicly available for download from the Polygenic Score Catalog.20 The mixing weights of the PRSMix_AOIn are available in Supplemental Tables 5-7.

## Software and Code Availability

No custom software or algorithms were developed for data collection. We used the following software for handling and scoring the genetic data: PLINK 2.0, R 4.3.1 (including package PRSmix), and Python 3.11.5 (including packages pandas 2.1.4 and NumPy 1.6.2). For data analysis and visualization, we used R 4.3.1 (including packages PRSmix, ggplot2, meta, class, rcompanion, survminer) and Python 3.11.5 (including packages pandas 2.1.4, NumPy 1.6.2, and Matplotlib 3.7.2). Analyses were performed on the AOU Researcher Workbench. Results are reported in compliance with the AOU Data and Statistics Dissemination Policy.

## Disclosures

A.C.F. reports being co-founder of Goodpath, serving as scientific advisor to MyOme and HeartFlow, and receiving a research grant from Foresite Labs. J.W.S. is a member of the Scientific Advisory Board of Sensorium Therapeutics (with options), has received grant support from Biogen, Inc. and is PI of a collaborative study of the genetics of depression and bipolar disorder sponsored by 23andMe for which 23andMe provides analysis time as in-kind support but no payments. P.N. reports research grants from Allelica, Amgen, Apple, Boston Scientific, Genentech / Roche, and Novartis, personal fees from Allelica, Apple, AstraZeneca, Blackstone Life Sciences, Creative Education Concepts, CRISPR Therapeutics, Eli Lilly & Co, Esperion Therapeutics, Foresite Labs, Genentech / Roche, GV, HeartFlow, Magnet Biomedicine, Merck, Novartis, TenSixteen Bio, and Tourmaline Bio, equity in Bolt, Candela, Mercury, MyOme, Parameter Health, Preciseli, and TenSixteen Bio, and spousal employment at Vertex Pharmaceuticals, all unrelated to the present work. The remaining authors declare no competing interests.

## Acknowledgements

This work was supported by grants K08HL161448 (to A.C.F.), R01HL164629 (to A.C.F.), K08HL168238 (to A.P.P.), R01HL169015 (to A.P.P.), R01HL1427 (to P.N.), R01HL148565 (to P.N.), R01HL148050 (to P.N.) from the National Heart, Lung, and Blood Institute and grants T32HG010464 (to S.M.U.), R01HG012354 (to A.P.P.) and U01HG011719 (to A.P.P. and P.N.) from the National Human Genome Research Institute. This research has been conducted using the *All of Us* cohort study. We gratefully acknowledge *All of Us* participants for their contributions, without whom this research would not have been possible. We also thank the National Institutes of Health’s *All of Us* Research Program for making available the participant data examined in this study.

## Footnotes

*   * These authors jointly supervised this work

*   Received July 24, 2024.
*   Revision received July 24, 2024.
*   Accepted July 24, 2024.


*   © 2024, Posted by Cold Spring Harbor Laboratory

The copyright holder for this pre-print is the author. All rights reserved. The material may not be redistributed, re-used or adapted without the author's permission.

## References

1.  Patel, A. P. & Khera, A. V. Advances and Applications of Polygenic Scores for Coronary Artery Disease. Annu. Rev. Med. 74, 141–154 (2023).

2.  Privé, F., Arbel, J. & Vilhjálmsson, B. J. LDpred2: better, faster, stronger. Bioinforma. Oxf. Engl. btaa1029 (2020) doi:10.1093/bioinformatics/btaa1029.

3.  Loh, P.-R. et al. Efficient Bayesian mixed-model analysis increases association power in large cohorts. Nat. Genet. 47, 284–290 (2015).

4.  Ruan, Y. et al. Improving polygenic prediction in ancestrally diverse populations. Nat. Genet. 54, 573–580 (2022).

5.  Patel, A. P. & Fahed, A. C. Pragmatic Approach to Applying Polygenic Risk Scores to Diverse Populations. Curr. Protoc. 3, e911 (2023).

6.  Yengo, L. et al. A saturated map of common genetic variants associated with human height. Nature 610, 704–712 (2022).

7.  Aragam, K. G. et al. Discovery and systematic characterization of risk variants and genes for coronary artery disease in over a million participants. Nat. Genet. 54, 1803–1815 (2022).

8.  Patel, A. P. et al. A multi-ancestry polygenic risk score improves risk prediction for coronary artery disease. Nat. Med. (2023) doi:10.1038/s41591-023-02429-x.

9.  Multhaup, M. L. et al. The science behind 23andMe’s Type 2 Diabetes report: Estimating the likelihood of developing type 2 diabetes with polygenic models. (2019).

10. Busby, G., Bolli, A., Di Domenico, P. & Botta, G. Development and Validation of a Polygenic Risk Score for Coronary Artery Disease.

11. Lennon, N. J. et al. Selection, optimization and validation of ten chronic disease polygenic risk scores for clinical implementation in diverse US populations. Nat. Med. 30, 480–487 (2024).

12. Linder, J. E. et al. Returning integrated genomic risk and clinical recommendations: The eMERGE study. Genet. Med. Off. J. Am. Coll. Med. Genet. 25, 100006 (2023).

13. McCarty, C. A. et al. The eMERGE Network: a consortium of biorepositories linked to electronic medical records data for conducting genomic studies. BMC Med. Genomics 4, 13 (2011).

14. Hao, L. et al. Development of a clinical polygenic risk score assay and reporting workflow. Nat. Med. 28, 1006–1013 (2022).

15. Karlson, E. W., Boutin, N. T., Hoffnagle, A. G. & Allen, N. L. Building the Partners HealthCare Biobank at Partners Personalized Medicine: Informed Consent, Return of Research Results, Recruitment Lessons and Operational Considerations. J. Pers. Med. 6, (2016).

16. O’Sullivan, J. W. et al. Polygenic Risk Scores for Cardiovascular Disease: A Scientific Statement From the American Heart Association. Circulation 146, e93–e118 (2022).

17. Kullo, I. J. et al. Incorporating a Genetic Risk Score Into Coronary Heart Disease Risk Estimates: Effect on Low-Density Lipoprotein Cholesterol Levels (the MI-GENES Clinical Trial). Circulation 133, 1181–1188 (2016).

18. All of Us Research Program Investigators et al. The ‘All of Us’ Research Program. N. Engl. J. Med. 381, 668–676 (2019).

19. Truong, B. et al. Integrative polygenic risk score improves the prediction accuracy of complex traits and diseases. Cell Genomics 4, 100523 (2024).

20. Lambert, S. A. et al. The Polygenic Score Catalog as an open database for reproducibility and systematic evaluation. Nat. Genet. 53, 420–425 (2021).

21. All of Us Research Program Genomics Investigators. Genomic data in the All of Us Research Program. Nature 627, 340–346 (2024).

22. Ding, Y. et al. Large uncertainty in individual polygenic risk score estimation impacts PRS-based risk stratification. Nat. Genet. 54, 30–39 (2022).

23. Martin, A. R. et al. Current clinical use of polygenic scores will risk exacerbating health disparities. Nat. Genet. 51, 584–591 (2019).

24. Kho, A. N. et al. Electronic medical records for genetic research: results of the eMERGE consortium. Sci. Transl. Med. 3, 79re1 (2011).

25. Tan, P.-N., Steinbach, M. & Kumar, V. Introduction to Data Mining (First Edition). (Addison-Wesley Longman Publishing Co., Inc., USA, 2005).