ABSTRACT
Background Polygenic risk scores (PRS) estimate an individual’s susceptibility to polygenic diseases from their genetic profile. A considerable proportion of people carry a high genetic risk but evade the disease, whereas some individuals with a low genetic risk eventually develop it. We hypothesized that unknown counterfactors might be involved in reversing the PRS prediction, and that identifying them might provide new insights into the pathogenesis, prevention, and early intervention of diseases.
Methods We built a novel computational framework, GRPa-PRS, to identify genetically-regulated pathways (GRPas) using PRS-based stratification for each cohort. We curated two Alzheimer’s disease (AD) cohorts with genotyping data: the discovery (disc) and replication (rep) datasets, which include 2722 and 2854 individuals, respectively. First, we calculated the optimized PRS model for each cohort based on three recent AD GWAS summary statistics. Then, we stratified the individuals by their PRS and clinical diagnosis into six biologically meaningful PRS strata, such as AD cases with low/high risk and cognitively normal (CN) individuals with low/high risk. Lastly, we imputed individual genetically-regulated expression (GReX) and identified differential GReX and GRPas between risk strata using gene-set enrichment and variational analyses in two models, with and without APOE effects. An orthogonality test was further conducted to verify that those GRPas are independent of PRS risk. To assess generalizability to other polygenic diseases, we further applied a default model of GRPa-PRS to schizophrenia (SCZ).
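The stratification step can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the dictionary layout, the 25% tail cutoffs, and the reading of the six strata as top/bottom PRS tails for all individuals, AD cases, and controls are all assumptions made for illustration.

```python
# Hypothetical sketch of PRS-based stratification.
# Names, cutoffs, and data layout are assumptions, not the GRPa-PRS code.

def percentile_cutoffs(scores, bottom_frac=0.25, top_frac=0.25):
    """Return (low_cut, high_cut) marking the bottom and top PRS tails."""
    ordered = sorted(scores)
    n = len(ordered)
    low_idx = max(int(n * bottom_frac) - 1, 0)
    high_idx = min(n - int(n * top_frac), n - 1)
    return ordered[low_idx], ordered[high_idx]


def assign_strata(individuals, bottom_frac=0.25, top_frac=0.25):
    """Group individual IDs by PRS tail (T = top, B = bottom) and diagnosis.

    `individuals` is a list of dicts with keys 'id', 'prs', and
    'diagnosis' ('AD' or 'CN'). Individuals between the two cutoffs
    are left out of every stratum.
    """
    low_cut, high_cut = percentile_cutoffs(
        [p["prs"] for p in individuals], bottom_frac, top_frac
    )
    strata = {
        "T_all": [], "B_all": [],
        "T_AD": [], "B_AD": [],
        "T_Ctr": [], "B_Ctr": [],
    }
    for p in individuals:
        if p["prs"] >= high_cut:
            tail = "T"
        elif p["prs"] <= low_cut:
            tail = "B"
        else:
            continue  # middle of the PRS distribution
        group = "AD" if p["diagnosis"] == "AD" else "Ctr"
        strata[f"{tail}_all"].append(p["id"])
        strata[f"{tail}_{group}"].append(p["id"])
    return strata


# Toy cohort: eight individuals with PRS 0..7, alternating CN and AD.
cohort = [
    {"id": f"s{i}", "prs": float(i), "diagnosis": "AD" if i % 2 else "CN"}
    for i in range(8)
]
strata = assign_strata(cohort)
print(strata)
```

In this toy setting, a contrast between the `T_Ctr` and `B_Ctr` groups would correspond to comparing high-PRS and low-PRS controls, the kind of comparison in which resilience-related signals could be sought; the real framework performs such comparisons on imputed GReX rather than on sample IDs.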
Results For each stratum, we conducted the same procedures in both the disc and rep datasets for comparison. In AD, we identified several well-known AD-related pathways, including amyloid-beta clearance, tau protein binding, and astrocyte response to oxidative stress. Additionally, we discovered resilience-related GRPas that are orthogonal to AD PRS, such as the calcium signaling pathway and divalent inorganic cation homeostasis. In SCZ, pathways related to mitochondrial function and muscle development were highlighted. Finally, our GRPa-PRS method identified more consistent differential pathways than a variant-based pathway PRS method.
Conclusions We developed a framework, GRPa-PRS, to systematically explore the differential GReX and GRPas among individuals stratified by their estimated PRS. The GReX-level comparison among those strata unveiled new insights into the pathways associated with disease risk and resilience. Our framework is extendable to other polygenic complex diseases.
Competing Interest Statement
The authors have declared no competing interest.
Funding Statement
This research was partially supported by National Institutes of Health grants awarded to Y.D. and Z.Z. (R21AG087299), and to Z.Z. (U01AG079847, R03AG077191, R01LM012806, R01DE030122, and R01DE029818). We thank the Cancer Prevention and Research Institute of Texas (CPRIT RP180734) for resource support. A.L. is supported by a training fellowship from the Gulf Coast Consortia on Training in Precision Environmental Health Sciences (TPEHS) Training Grant (T32ES027801).
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/dataset.cgi?study_id=phs000168.v2.p2&pht=710 https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/analysis.cgi?study_id=phs000219.v1.p1&pha=2879#:~:text=GenADA%20is%20a%20multi-site%20collaborative%20study%2C%20involving%20GlaxoSmithKline,variations%20in%20candidate%20genes%20with%20Alzheimer%E2%80%99s%20disease%20phenotypes https://www.synapse.org/#!Synapse:syn5550382 https://www.synapse.org/#!Synapse:syn3157325 https://adni.loni.usc.edu https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs000167.v1.p1
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.
Yes
Footnotes
Based on the reviewer comments, we made the following major revisions: 1. To validate the generalizability of our GRPa-PRS framework, we included two independent schizophrenia cohorts as a second trait in this revision; 2. We adopted the latest EraSOR1 method to eliminate inflation caused by sample overlap in polygenic score analyses for both AD and SCZ; 3. We now use MAGMA, which conducts GSEA while accounting for the correlation between genes, in place of the GSEA used in the previous version; 4. We conducted a power analysis and visualized it in supplementary figures.
DATA AVAILABILITY
All data generated or analyzed in this study are available from the authors upon reasonable request. The overall framework can be downloaded from https://github.com/davidroad/GRPa-PRS.
https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/dataset.cgi?study_id=phs000168.v2.p2&pht=710 https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/analysis.cgi?study_id=phs000219.v1.p1&pha=2879#:~:text=GenADA%20is%20a%20multi-site%20collaborative%20study%2C%20involving%20GlaxoSmithKline,variations%20in%20candidate%20genes%20with%20Alzheimer%E2%80%99s%20disease%20phenotypes https://www.synapse.org/#!Synapse:syn5550382 https://www.synapse.org/#!Synapse:syn3157325 https://adni.loni.usc.edu
Abbreviations
- AD: Alzheimer’s disease;
- AUC: area under the receiver operating curve;
- BP: Biological Process;
- CC: Cellular Component;
- CI: confidence interval;
- CN: cognitively normal;
- CDR: clinical dementia rating;
- disc: discovery;
- FDR: false discovery rate;
- eQTL: expression quantitative trait loci;
- GO: gene ontology;
- GRPa: genetically-regulated pathway;
- GRPa-MAGMA: genetically-regulated pathway-gene-set enrichment analysis;
- GRPa-GSVA: genetically-regulated pathway-gene-set variational analysis;
- GReX: genetically-regulated expression;
- GSEA: gene-set enrichment analysis;
- GSVA: gene-set variational analysis;
- GWAS: genome-wide association study;
- GWAX: genome-wide association study based on proxy phenotype;
- h2: heritability;
- KEGG: Kyoto Encyclopedia of Genes and Genomes;
- LOAD: late-onset Alzheimer’s Disease;
- LogisticLRT: logistic likelihood ratio test;
- MF: Molecular Function;
- OR: odds ratio;
- PRS: polygenic risk scores;
- rep: replication;
- TWAS: transcriptome-wide association study;
- WGS: Whole Genome Sequencing;
Risk strata comparison:
- T: top;
- B: bottom;
- TB: top vs bottom;
- Ctr: control;
- TBall: top vs bottom for all individuals;
- TBAD: top vs bottom for only AD patients;
- TBCtr: top vs bottom for only controls;
Parallel subgroups:
- K: stratify by PRS based on Kunkle et al.’s AD GWAS;
- S: stratify by PRS based on Schwartzentruber et al.’s AD GWAS;
- W: stratify by PRS based on Wightman et al.’s AD GWAS;
- T: stratify by PRS based on Trubetskoy et al.’s SCZ GWAS;