Skip to main content
medRxiv
  • Home
  • About
  • Submit
  • ALERTS / RSS
Advanced Search

Scalable Bayesian functional GWAS method accounting for multivariate quantitative functional annotations with applications to studying Alzheimer’s disease

Junyu Chen, Lei Wang, View ORCID ProfilePhilip L. De Jager, David A. Bennett, Aron S. Buchman, View ORCID ProfileJingjing Yang
doi: https://doi.org/10.1101/2022.08.12.22278704
Junyu Chen
1Department of Epidemiology, Emory University School of Public Health, Atlanta, GA, 30322, USA
2Center for Computational and Quantitative Genetics, Department of Human Genetics, Emory University School of Medicine, Atlanta, GA, 30322, USA
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
Lei Wang
2Center for Computational and Quantitative Genetics, Department of Human Genetics, Emory University School of Medicine, Atlanta, GA, 30322, USA
3Department of Biostatistics and Informatics, Colorado School of Public Health, University of Colorado Anschutz Medical Campus, Aurora, CO, 80045, USA
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
Philip L. De Jager
4Center for Translational and Computational Neuroimmunology, Department of Neurology and Taub Institute for Research on Alzheimer’s Disease and the Aging Brain, Columbia University Irving Medical Center, New York, NY 10032, USA
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
  • ORCID record for Philip L. De Jager
David A. Bennett
5Rush Alzheimer’s Disease Center, Rush University Medical Center, Chicago, IL, 60612, USA
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
Aron S. Buchman
5Rush Alzheimer’s Disease Center, Rush University Medical Center, Chicago, IL, 60612, USA
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
Jingjing Yang
2Center for Computational and Quantitative Genetics, Department of Human Genetics, Emory University School of Medicine, Atlanta, GA, 30322, USA
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
  • ORCID record for Jingjing Yang
  • For correspondence: jingjing.yang{at}emory.edu
  • Abstract
  • Full Text
  • Info/History
  • Metrics
  • Supplementary material
  • Data/Code
  • Preview PDF
Loading

Abstract

Existing methods for integrating functional annotations in GWAS to fine-map and prioritize potential causal variants are either limited to using non-overlapped categorical annotations, or limited by the computation burden of modeling genome-wide variants. To overcome these limitations, we propose a scalable Bayesian functional GWAS method to account for multivariate quantitative functional annotations (BFGWAS_QUANT), accompanied by a scalable computation algorithm enabling joint modeling of genome-wide variants. Simulation studies validated the performance of BFGWAS_QUANT for accurately quantifying annotation enrichment and improving GWAS power. Applying BFGWAS_QUANT to study five Alzheimer’s disease (AD) related phenotypes using individual-level GWAS data (n=∼1K), we found that histone modification annotations have higher enrichment than eQTL annotations for all considered phenotypes, with the highest enrichment in H3K27me3 (polycomb regression). We also found that cis-eQTL in microglia had higher enrichment than eQTL of bulk brain frontal cortex tissue for all considered phenotypes. A similar enrichment pattern was also identified using the IGAP summary-level GWAS data of AD (n=∼54K). The strongest known APOE E4 risk allele was identified for all five phenotypes and the APOE locus was validated using the IGAP data. BFGWAS_QUANT fine-mapped 32 significant variants from 1073 genome-wide significant variants in the IGAP data. We further demonstrated that the polygenic risk scores (PRS) using effect size estimates by BFGWAS_QUANT had similar prediction accuracy as other methods assuming a sparse causal model. Overall, BFGWAS_QUANT provides a useful GWAS tool for quantifying annotation enrichment and prioritizing potential causal variants.

Introduction

Although thousands of significant associations have been identified by single variant genome-wide association studies (GWAS) for complex traits and diseases, majority GWAS signals reside in the noncoding genome regions and have unknown biological meaning1-3. Existing GWAS results based on single variant tests are still difficult to interpret with respect to the underlying biological mechanisms4; 5. Promisingly, advancements in sequencing technology have made plenteous multi-omics data available which provide functional annotations of genetic variants available to the scientific community –– Combined Annotation-Dependent Depletion (CADD) score6, Roadmap Epigenomics Mapping Consortium7 providing DNA methylation, histone modification, and chromatin accessibility information of various human tissues, the Encyclopedia of DNA Elements (ENCODE)8 providing functional information of human and mouse genomes, and the Genotype-Tissue Expression (GTEx) project9 providing expression quantitative trait loci (eQTL) information of 54 human tissues. In particular, molecular quantitative trait loci (QTL) mapped from profiles of molecular phenotypes (e.g., gene expression from GTEx9, chromatin marks from Roadmap7; 10; 11, and protein abundances12) and corresponding genomes (genotype data) have been shown to be enriched with GWAS signals and help interpretate the underlying biology for studying complex traits and diseases13-15. Especially, these molecular QTL have been leveraged to prioritize GWAS associations of complex traits and diseases16-20.

An intuitive but widely used ad-hoc approach is to fine-map and prioritize potential causal GWAS signals that are also molecular QTL21, or located in a region with histone modifications. Recently, advanced statistical methods have been proposed to integrate non-overlapped categorical functional annotation (assigning one function label per variant) with GWAS data to fine-map GWAS results22-26. Additionally, PAINTOR was proposed to integrate multivariate quantitative functional annotations with GWAS summary statistics to fine-map GWAS loci with thousands of variants27; 28; FunSPU29 and STAAR30 were proposed to incorporate multiple biological annotations for rare variant association tests. These existing methods have shown the feasibility and promising results of integrating multivariate quantitative functional annotations with GWAS data to fine-map GWAS results and prioritize potential causal variants. However, these methods were not developed for joint modeling millions of genome-wide variants with multivariate quantitative annotations, which would lead to less accurate quantification of annotation enrichment and reduced power of fine-mapping.

Here, we propose a scalable Bayesian Functional GWAS method for integrating multivariate QUANTitative functional annotations with GWAS data by a Bayesian hierarchical variable selection regression model, referred to as BFGWAS_QUANT. BFGWAS_QUANT assumes a hierarchical logistic prior for the causal probabilities of genetic variants in the standard Bayesian Variable Selection Regression (BVSR) based GWAS method31 to jointly model millions of genome-wide genetic variants. BFGWAS_QUANT adapts the scalable Expectation Maximization Markov Chain Monte Carlo (EM-MCMC) algorithm developed by the previous Bayesian functional GWAS (BFGWAS) method that only models non-overlapped categorical annotations (referred to as BFGWAS_CAT in this paper)22. BFGWAS_QUANT further improves the computation efficiency by pre-calculating linkage disequilibrium (LD) correlation matrix and single variant test Zscore statistics that are used in the MCMC algorithm, or using reference LD correlation matrix and summary-level GWAS data. Bayesian Causal Posterior Probability (CPP) and genetic effect size estimates will be generated by BFGWAS_QUANT, along with enrichment quantification of considered multivariate quantitative functional annotations. Furthermore, Bayesian estimates of genetic effect sizes can be used to derive polygenic risk scores (PRS) that account for functional annotations.

By simulation studies, we showed that our Bayesian estimates of functional enrichment converged and GWAS power was improved over the standard BVSR method without accounting for functional annotations31. We then applied BFGWAS_QUANT to real GWAS data for studying Alzheimer’s disease (AD) related phenotypes32; 33, accounting for multivariate quantitative annotations with respect to Roadmap histone modifications7 of brain mid-frontal gyrus and eQTL of brain frontal cortex tissue32; 34 and microglia cell type35; 36. We showed that BFGWAS_QUANT identified interesting enrichment pattern and generated fine-mapped GWAS results using both individual-level and summary-level GWAS data. We also showed that, PRS derived from BFGWAS_QUANT effect size estimates led to similar accurate AD risk prediction as other PRS methods assuming a sparse causal model.

In the following Material and Methods section, we describe an overview of the BFGWAS_QUANT method, Roadmap histone modification and eQTL based quantitative functional annotations, simulation study design, and application studies of AD. In the Results section, we describe the results of simulation and application studies. We then end with a Discussion about the advantages and limitations of BFGWAS_QUANT.

Material and Methods

Bayesian hierarchical variable selection regression model

BFGWAS_QUANT assumes a Bayesian hierarchical variable selection regression (BVSR)31 model for genome-wide variants, Embedded Image where yn×1 is a vector of standardized phenotype with n subjects; Xn×p is the standardized genotype matrix with p genome-wide genetic variants and βp×1 is the vector of the genetic effect sizes. Spike-and-slab variable selection prior is assumed per effect size βi. That is, βi has probability πi to be non-zero and follows a normal distribution centered at zero and probability (1 − πi) to be zero with a point-mass density function at 0, where πi denotes the “casual” probability of the ith variant.

We assume a hierarchical logistic model for the causal probabilities of genetic variants to account for multivariate quantitative functional annotations, Embedded Image where πi denotes the casual probability of the ith variant as in Equation 1, Ai = (1, Ai1, …, AiJ)′ denotes the augmented annotation vector for the ith variant with an intercept term as the first element, and coefficient vector α = (α0, α1, …, αJ)′ denotes the intercept term α0 and enrichment quantification with respect to functional annotation j = 1, …, J.

Further, we assume a fixed value in the domain of (0, 1] for τβ, a fixed value for α0 in the domain of (−13.8, -9), and a standard normal prior for enrichment parameters (αj∼N(0, 1); j = 0, 1, … J). In particular, τβ = 1 would assume the prior variance of effect sizes is the same as the marginal effect size estimates in a single variant regression model, and smaller τβ values would inflate the magnitude of Bayesian effect size estimates. The lower bound value of α0 = −13.8 would assume the prior causal probability is 10−6 when αj = 0, j = 1, …, J (see Supplementary Text for model details).

Adapted EM-MCMC algorithm

To overcome heavy computational burden and poor mixing rate of posterior samplings by the standard MCMC algorithm31, we adapt the scalable EM-MCMC algorithm developed for the original BFGWAS method22. Specifically, we first segment genome-wide variants into approximately independent genome blocks with about 5,000∼10,000 variants based on LD structure22; 37. Second, conditioning on given hyper parameters (αj, j = 1, …, J), we conduct standard MCMC algorithm within each genome block to obtain Bayesian posterior estimates for genetic effect size (βi) and causal probability (πi) per variant (Expectation (E) step). Third, conditioning on the Bayesian estimates of (βi, πi, i = 1, …, p), we will update the values of hyper parameters (αj, i = 1, …, J) by maximizing their conditional posterior likelihood (Maximization (M) step). Computational optimization algorithm Broyden-Fletcher-Goldfarb-Shanno (BFGS)38 is used to obtain the maximum a posteriori (MAP) estimates for αj, j = 1, …, J. The EM steps will be iterated (∼5 iterations) until the estimates of hyper parameters converge.

The Bayesian estimate for the causal probability (πi) per variant is referred as CPP, and the Bayesian estimates for annotation coefficients (αj, j = 1, …, J) are referred as quantified enrichment of multivariate functional annotations. SNPs with CPP > 0.1068 will be considered as significantly associated with the phenotype of interest, where the significance threshold was shown equivalent to p-value < 5 × 10−8 by previously BFGWAS_CAT paper22. Because a multivariate regression model is fitted per genome-block, the LD among all variants per genome-block is accounted for during the Bayesian inference of CPP and effect size β. The GWAS results obtained by BFGWAS_QUANT will be fine-mapped and variants with enriched annotations will be prioritized.

In particular, implementing the MCMC algorithm per genome-block can greatly reduce the search space (from genome-wide to a genome-block) and facilitate parallel computing (one core per genome-block), thus leading to efficient convergence rate and improved mixing rate. Computation efficiency is further improved by implementing the MCMC algorithm using pre-calculated LD correlation matrix per genome-block and single variant test Zscore statistics, or reference LD correlation matrix and GWAS Z-score statistics, which will save up to 90% computation time compared to using individual-level GWAS data37. With 32 computation cores in one node, BFGWAS_QUANT can complete analyzing ∼10M SNPs in ∼4hours for 5 EM iterations.

eQTL based functional annotations

In this paper, we considered 5 real eQTL based quantitative functional annotations. Three of these annotations (Allcis-eQTL, 95%CredibleSet, MaxCPP) were constructed based on standard cis-eQTL data of brain frontal cortex tissue from GTEx data9; 34: i) Binary annotation Allcis-eQTL was constructed by taking all SNPs that were identified as a significant cis-eQTL (false discovery rate (FDR) <5%; 1MB from transcription start site (TSS)) for at least one expression quantitative trait (one gene) as 1 and otherwise as 0; ii) For each gene expression trait that has at least one significant (FDR<5%) cis-eQTL, CAVIAR16 was used to calculate the causal posterior probability (cis-CPP) of each cis-SNP and identify a 95% credible set. SNPs that do not belong to any 95% Credible Set were taken as 0 or otherwise 1 for the annotation of 95%CredibleSet. iii) Maximum cis-CPP per SNP across all genes were taken as quantitative values of MaxCPP. In addition, we took the maximum Bayesian Genome-Wide CPP of being either cis- or trans-eQTL across all genes in brain frontal cortex tissue from the Religious Orders Study and Rush Memory and Aging Project (ROS/MAP)21; 33 as the fourth annotation BGW_MaxCPP, where cis- and trans-CPPs were estimated by Bayesian genome-wide TWAS (BGW-TWAS) method32; 37. Lastly, we derived a fifth Microglia-eQTL annotation from two datasets of recent Microglia cell-type specific eQTL summary statistics35; 39, where 1 indicates being identified as a cis-eQTL to any gene in either Microglia dataset or otherwise 0.

Histone modifications based functional annotation

We constructed 5 histone modification based functional annotations using the epigenomics data of core histone modifications in the brain mid frontal gyrus region from the ROADMAP Epigenomics database7 –– H3K4me1 (primed enhancers), H3K4me3 (promoters), H3K36me3 (gene bodies), H3K27me3 (polycomb regression) and H3K9me3 (heterochromatin). For each histone modification, peak regions from replicates of the same sample were first merged together and then overlapped with peak regions of other samples by bedtools (v 2.27.0)40. If a genetic variant resides in the overlapped peak regions of a histone modification, 1 would be assigned to the function annotation of such modification for this variant or 0 would be assigned otherwise.

Simulation study design

We conducted simulation studies to validate the performance of BFGWAS_QUANT. Continuous phenotypes were simulated using the real whole genome sequence (WGS) data of Chromosomes 19 (122,745 SNPs with minor allele frequency (MAF) >0.01) for n=1,893 samples from the ROS/MAP cohort33; 41 and Mount Sinai Brain Bank (MSBB) study42. Phenotypes were simulated based on the multivariate linear additive model (Equation 1) with true genetic effect sizes βp×1 generated based on the hierarchical logistic model with multivariate quantitative functional annotations (Equation 2). Scenarios with various numbers of true causals and heritability were considered.

Besides real cis-eQTL based functional annotations of Allcis-QTL, 95%CredibleSet, MaxCPP, we also considered a fourth artificial annotation randomly generated from N(0, 1) as a negative control. With chosen annotation enrichment parameters (α0 = (−10.5, −9.5), α1 = 4, α2 = 1.5, α3 = 0.5, α4 = 0), we first calculated casual probabilities (πi) for all considered p=122,745 SNPs by Equation 2, where α0 was chosen to ensure the total number of true causal SNPs fall in (5, 10) with α0 = −10.5 or (15, 30) with α0 = −9.5. Second, a vector of binary indicator (γi) of true causal SNPs was generated from the corresponding Bernoulli distribution with probability (πi) for i = 1, …, p. Third, genetic effect sizes were taken as 0 for SNPs with γi = 0 and generated from a normal distribution for SNPs with γi = 1. Finally, phenotypes were generated from Equation 1 with simulated genetic effect sizes and random errors ϵ∼N(0, (1 − h2)I), to ensure a target total heritability h2 = (0.25, 0.5) equally explained by all true causal SNPs. A total of four scenarios were considered, including one with relatively sparse true causals in the range of (5, 10) and one with the number of true causals in the range of (15, 30), with respect to two different heritability values (0.25, 0.5).

In addition, we considered a null enrichment scenario where none of the annotations were enriched. In this scenario, we randomly selected 10 true causal SNPs and assigned them genetic effect sizes generated from a normal distribution. We simulated the phenotype as described above with a targeted h2=0.5.

We repeated 100 simulations per scenario to evaluate our Bayesian estimates for annotation enrichment, total heritability, and true causal SNPs with respect to sensitivity (power) and positive predictive values (PPV)43. Sensitivity (power) is defined as the proportion of true positive findings among all true casual variants, and PPV is defined as the proportion of true positive findings among all identified significant associations. We took simulated true casual SNPs and those having R2 > 0.3 with true casual SNPs with Embedded Image as true positive findings, following the significance rule used by the BFGWAS_CAT method22. The sensitivity and PPV are given by Embedded Image

We compared with the standard BVSR method44 that does not account for functional annotations. We estimated the total heritability by the squared correlation between the simulated phenotypes and the PRSs based on Bayesian estimates of genetic effect sizes Embedded Image of SNPs with Embedded Image, Embedded Image

Applications to ROS/MAP individual-level GWAS data

ROS/MAP are two prospective longitudinal community-based cohort studies, which recruit older adults without known dementia at baseline and follow participants up annually until the time of death33; 41; 45. Participants all agreed to annual clinical evaluation and brain autopsy at the time of death, signing an informed consent and Anatomic Gift Act. All participants in this study also signed a repository consent to allow their data to be re-purposed. WGS data were profiled for 1200 samples by using the KAPA Hyper Library Preparation Kit and Illumina HiSeq X sequencer.

We applied BFGWAS_QUANT to account for five eQTL based and five histone modifications based quantitative annotations as described above to study five AD related phenotypes9; 32-34, including the binary clinical diagnosis of late-onsite Alzheimer’s dementia (n=1087), three quantified postmortem pathology indices of AD (i.e., PHFtau tangle density with n=1105, β-amyloid load with n=1113, and a global measurement of AD pathology burden with n=1123), and a quantitative measurement of cognition decline rate with n=104933; 41. The cognition decline rate was constructed as the random slope per sample from a linear mixed model of annual longitudinal measurements of cognition function. Details about ROS/MAP and phenotypic traits were measured can be found in previously published paper46. We also adjusted for the covariates of age, sex, smoking status, study index (ROS or MAP), and first 3 genotype principal components, by regressing these covariates out from the phenotypes and taking the corresponding regression residuals as the outcome in the BFGWAS_QUANT method.

Applications to IGAP summary-level GWAS data

We applied BFGWAS_QUANT to the stage 1 summary-level GWAS data of the International Genomics of Alzheimer’s Project (IGAP)47, along with above 10 eQTL and histone modifications based functional annotations and reference LD generated from ROS/MAP. The stage 1 IGAP summary-level GWAS data were generated by meta-analyses consisting of 17,008 AD cases and 37,154 controls of European ancestry (n=∼54K).

AD risk prediction by PRS

To show the usefulness of PRS for risk prediction, we evaluated two sets of PRS that were respectively derived from BFGWAS_QUANT and BVSR summary statistics using ROS/MAP GWAS data. We used the independent test samples from Mayo Clinic Alzheimer’s Disease Genetics Studies (MCADGS)48; 49. MCADGS contain 2,099 European-decent samples (844 AD cases and 1,255 controls) with microarray genotype data profiled that were further imputed to the 1000 Genome Project Phase50.

We compared PRS using Bayesian effect size estimates with three commonly used PRS methods, the standard method using informed LD-Pruning and P-value thresholding (P+T)51; 52, LDpred253, and PRS-CS54. P-value thresholds of (10−2, 10−3, 10−5, 5 × 10−8, 10−8) and LD thresholds of (0.1, 0.3, 0.5, 0.7,0.9) were considered by the P+T method. Reference LD derived from 1000 Genome data were used by LDpred and PRS-CS methods. AD risk prediction accuracy was evaluated using the area under the Receiver Operating Characteristic (ROC)55 curve (AUC) for MCADGC test samples.

Ethics statement

ROS/MAP and MCADGS data analyzed in this study were generated under the improvement by the Institutional Review Board (IRB) of Rush University Medical Center, Chicago, IL and Mayo Clinic, respectively. All samples analyzed in this study were de-identified and all analyses were approved by the IRB of Emory University School of Medicine.

Results

Simulation results

For all considered simulation scenarios, our Bayesian estimates of functional annotation enrichment achieved convergence with 4 EM iterations, as shown in boxplots of 100 simulation replicates (Figure 1 (A, D); and Supplemental Figure 1). Although BFGWAS_QUANT overestimated α1 and α2, the Bayesian estimates still reflect the correct enrichment pattern among all considered annotations. By taking the 2.5th and 97.5th quantiles of these 100 Bayesian estimates to estimate the corresponding 2.5th and 97.5th quantiles of the estimator distributions, the true enrichment values (α1 = 4, α2 = 1.5, α3 = 0.5, α4 = 0) indeed fell within this range. For example, in the scenario with α0 = −10.5 and h2 = 0.25 (Figure 1A), the estimated 2.5th and 97.5th quantiles are (1.62, 5.50) for α1, (0.98, 4.57) for α2, (0.02, 1.20) for α3, and (0.00, 0.38) for α4. We observed precise estimates of 0 enrichment for the artificial annotation (Figure 1; Supplemental Figure 1) and all annotations in the scenario with null enrichment (Supplement Figure 3A), which demonstrated the ability of BFGWAS_QUANT to identify null enrichment.

Figure 1.
  • Download figure
  • Open in new tab
Figure 1. Bayesian enrichment estimates, heritability estimates, and sensitivities of simulation studies.

Panels (A, B, C) are for simulations with α0 = −10.5 and panels (D, E, F) are for simulations with α0 = −9.5. Bayesian estimates of annotation enrichment (α1, α2, α3, α4) out of 100 simulations with true heritability h2 = 0.25 are shown in the respective box plot (panels A, D), where red dots denote true enrichment values. Comparable heritability estimates (panels B, E) and higher sensitivities (panels C, F) were obtained by BFGWAS_QUANT (red) vs. BVSR (blue).

By taking PRS as estimated phenotypes and taking the squared correlation between RPS and simulated phenotypes as the estimate of phenotype heritability, we obtained similar heritability estimates by BFGWAS_QUANT and BVSR, which are close to the true heritability (Figure 1 (B, E); Supplemental Figure 3B). For scenarios with true enrichment, BFGWAS_QUANT obtained substantially higher sensitivity (power) while similar positive predictive values (PPV) compared to BVSR for all scenarios (Figure 1 (C, F); Supplemental Figure 2; Supplementary Figures 4-5). For the scenario with null enrichment, both BFGWAS_QUANT and BVSR performed comparably (Supplementary Figure 3 (B, C, D)).

These simulation studies validated BFGWAS_QUANT for quantifying multivariate functional annotations, estimating phenotype heritability, and identifying true causal SNPs. By accounting for multivariate quantitative annotations, BFGWAS_QUANT showed improved performance than the standard BVSR method, especially with higher power for identifying true causal SNPs and accurate enrichment estimation.

Application GWAS results for studying AD

Applying BFGWAS_QUANT to the individual-level ROS/MAP and summary-level IGAP GWAS data, we obtained consistent patterns for Bayesian enrichment estimates of 5 eQTL based and 5 histone modifications based functional annotations (Figure 2; Supplement Figures 6-7). In particular, the histone modifications based functional annotations had higher enrichment than eQTL based annotations when studying the individual-level ROS/MAP GWAS data, with the highest enrichment for H3K27me3 and second highest for H3K4me1. Interestingly, BFGWAS_QUANT estimated the second highest enrichment for the Microglia_eQTL annotation when the IGAP GWAS data with a large sample size (n=∼54K). Even with a small sample size in the individual-level ROS/MAP GWAS data, the Microglia_eQTL annotation was still identified with higher enrichment than other annotations based on eQTL of the bulk brain frontal cortex tissue. There is mounting evidence showing that Microglia (composing <10% cells in the bulk brain frontal cortex tissues) plays important roles in the development and progression of AD pathology56, and cell-type specific differential expression of GWAS risk genes of AD are only present in Microglia57.

Figure 2.
  • Download figure
  • Open in new tab
Figure 2. Bayesian estimates of functional annotation enrichment for Alzheimer’s dementia by using ROS/MAP individual-level (A) and IGAP summary-level (B) GWAS data.

Histone modification H3K27me3 (polycomb regression) and Microglia cis-eQTL annotations were found to be most enriched for association signals of AD.

Using the ROSMAP individual-level GWAS data, four significant SNPs with Bayesian CPP > 0.1068 were identified for AD (rs429358, rs10414043, rs769449 and rs7256200) by BFGWAS_QUANT (Table 1; Figure 3A; Supplemental Figures 8-9). In particular, SNP rs429358 (CPP = 0.144, p-value = 7.72 × 10−13, missense variant) is the famous known APOE E4 risk allele of AD58 and have significant Bayesian CPP > 0.1068 for all 5 AD related phenotypes. SNPs rs10414043 (CPP = 0.111, p-value = 2.71 × 10−12) and rs7256200 (CPP = 0.315, p-value = 2.71 × 10−12, regulatory variant) are upstream of the known risk gene APOC1 of AD and blood protein traits37; 59; 60. Besides the missense APOE E4 risk alleles rs429358 and rs7412, one additional significant SNP rs1065853 is identified upstream of APOC1 for global AD pathology, which is a known GWAS signal for blood protein traits such as low-density lipoprotein60; 61. Out of 11 significant SNPs identified for at least one AD related phenotype (Table 1), 2 of which are intergenic and the other 9 SNPs are intron, regulatory, missense, and stop gained variants.

View this table:
  • View inline
  • View popup
  • Download powerpoint
Table 1. Significant SNPs with Bayesian CPP > 0. 1068 by BFGWAS_QUANT for studying AD related phenotypes using the ROS/MAP individual-level GWAS data.

SNPs with single variant test P-value <5 × 10−8 were shaded in gray.

Figure 3.
  • Download figure
  • Open in new tab
Figure 3. Manhattan plots of BFGWAS_QUANT results for studying Alzheimer’s dementia using ROS/MAP individual-level (A) and IGAP summary-level (B) GWAS data.

Single variant test p-values were plotted in -log10 scale in the y-axis. The dashed horizontal line denotes the genome-wide significant threshold 5 × 10−8. SNPs with Bayesian CPP > 0.1068 were colored according to the color scale of their Bayesian CPP values by BFGWAS_QUANT. SNPs with Bayesian CPP >0.5 were plotted in solid triangles.

Using the IGAP summary-level GWAS data, BFGWAS_QUANT fine-mapped 32 significant SNPs with Bayesian CPP > 0.1068 associated with AD (Table 2; Figure 3B). Multiple SNPs located in genes PVRL2, APOE/TOMM40, APOC1 in chromosome 19 were found to be associated with AD with CPP = 1. Interestingly, 10 out of 32 significant SNPs are cis-eQTL of Microglia, and all significant SNPs except one intergenic SNP (rs7110631) are intron, regulatory, downstream, upstream, 3’ UTR, and missense variants (Table 2). All significant SNPs except rs78959900 are located in the peak regions of histone modification H3K27me3 (polycomb regression) that has the highest enrichment. Further, several SNPs that did not pass the genome-wide significance threshold (P-value < 5 × 10−8) were identified by integrating 10 functional annotations in BFGWAS_QUANT. These SNPs are located in genes that were previously found to be genetically linked to AD, such as HLA-DRB1, NUP160, SLC24A4 and CD3347; 58; 62-65.

View this table:
  • View inline
  • View popup
  • Download powerpoint
Table 2. Significant SNPs with Bayesian CPP > 0. 1068 by BFGWAS_QUANT for studying AD using the IGAP summary-level GWAS data.

SNPs with single variant test P-value <5 × 10−8 were shaded in gray.

Additionally, the summation of genome-wide Bayesian CPP of SNPs with CPP >0.01 can be used to estimate the number of total causal SNPs for the phenotype of interest. The threshold of CPP>0.01 is used to exclude adding CPP from random MCMC selections. Although the power is limited for analyzing the ROS/MAP individual-level GWAS data with a small sample size, BFGWAS_QUANT estimated a total of 54 potential causal SNPs for AD using the IGAP summary-level GWAS data (Table 3).

View this table:
  • View inline
  • View popup
  • Download powerpoint
Table 3. Estimates of total causal SNPs.

The summations of the Bayesian CPP estimates of SNPs with CPP>0.01 estimate the total number of causal SNPs.

AD risk prediction by PRS in MCADGS

To show the usefulness of BFGWAS_QUANT for studying complex traits and diseases, we derived PRS using the Bayesian effect size estimates by BFGWAS_QUANT for an independent GWAS cohort MCADGS (n = 2099), and compared the risk prediction accuracy with the PRS using effect size estimates by BVSR, P+T, LDpred2, and PRS-CS. When the ROS/MAP individual-level GWAS data were used as training data, comparable AUC was obtained by using Bayesian effect size estimates byBFGWAS_QUANT (0.69) and BVSR (0.68), which was similar to the one obtained by LDpred2-auto (0.68) but significantly higher than the ones obtained by P+T (0.55), LDpred2-inf (0.53), and PRS-CS (0.54) (Figure 4A). This showed the advantage of deriving PRS using Bayesian effect size estimates by BFGWAS_QUANT when individual-level GWAS data are available, especially when the sample size is small. When the IGAP summary-level GWAS data were used as training data, the AUC by BFGWAS_QUANT (0.75) was the same as by LDpred2-auto (0.75), but much lower than P + T (0.88 with p-value threshold 10−3), LDpred2-inf (0.94), and PRS-CS (0.93) (Figure 4B). These results show that an infinitesimal model66 as assumed by PRS-CS and LDpred2-inf is more suitable for PRS development than the sparse model assumed by BFGWAS_QUANT and LDpred2-auto.

Figure 4.
  • Download figure
  • Open in new tab
Figure 4. ROC plots comparing prediction accuracy of Alzheimer’s dementia in the test data of MCDGC by PRSs derived using the ROS/MAP individual-level (A) and IGAP summary-level (B) GWAS data as training data.

PRS derived using Bayesian effect size estimates by BFGWAS_QUANT has comparable prediction accuracy as the PRSs derived by BVSR and LDpred2 auto, for all assuming a sparse causal model. PRSs derived by PRS-CS and LDpred2-inf using IGAP summary-level GWAS data as training data have the highest prediction accuracy for assuming an infinitesimal causal model.

Discussion

We developed a scalable Bayesian functional GWAS method to account for multivariate quantitative functional annotations (BFGWAS_QUANT) for studying complex traits and diseases, which is based on a Bayesian hierarchical variable selection regression model and accompanied with a scalable EM-MCMC computation algorithm. BFGWAS_QUANT has the advantages of quantifying enrichment of functional annotations, as well as modeling LD to generate fine-mapped GWAS results that are also prioritized based on their functional annotations. BFGWAS_QUANT can be applied to both individual-level and summary-level GWAS data. In particular, the Bayesian effect size estimates can be used to derive PRS that accounts for functional annotations.

Our simulation studies validated the performance of BFGWAS_QUANT with respect to annotation enrichment quantification, GWAS association identification, and heritability estimation. Compared to BVSR, BFGWAS_QUANT method had higher sensitivities (i.e., power), while comparable PPV and accuracy of phenotype heritability estimation.

In the real studies of AD related phenotypes using both ROS/MAP individual-level and IGAP summary-level GWAS data, we showed that interesting enrichment patterns were identified, fine-mapped GWAS signals were identified, and predictive PRSs were derived. In particular, we found the histone modification H3K27me3 (polycomb regression) and Microglia cis-eQTL annotations were most enriched for association signals of AD. We also showed that SNPs with single variant test p-values < 5 × 10−8 could be identified for being prioritized due to their functional annotations.

Despite the above advantages, BFGWAS_QUANT does have its limitations. First, BFGWAS_QUANT model is developed for quantitative traits. However, following previous studies22; 67, GWAS analysis can still be done for dichotomous traits by quantifying cases as 1’s and controls as 0’s, which will have similar performance as a probit model if samples are independent and population structure can be addressed by top genotype principal components. Extending the BFGWAS_QUANT method for studying dichotomous traits is also part of our ongoing research. Second, BFGWAS_QUANT assumes the summary-level GWAS data and reference LD were derived from populations of the same ancestry. One need to apply BFGWAS_QUANT per ancestry first and then meta-analyze the results for studying GWAS cohorts with multiple ancestries. Third, BFGWAS_QUANT assumes a sparse causal genetic architecture which is suitable for generating fine-mapped GWAS results but might lack of power for deriving PRS.

In conclusion, our work demonstrated the usefulness of integrating multivariate quantitative functional annotations in GWAS for quantifying the enrichment of multiple functional annotations and generating fine-mapped GWAS results with higher power. Specifically, accurate quantification of annotation enrichment would help prioritize GWAS signals (fine-mapping) and then help illustrate the underlying genomic etiology of complex traits and diseases. Especially, as publicly available molecular QTL based and epigenomic features functional annotations continue to grow, BFGWAS_QUANT provides a convenient tool for integrative multi-omics studies.

Data Availability

All data produced in the present study are available upon reasonable request to the authors

https://www.radc.rush.edu/

Supplemental Data

Supplemental Note includes 9 Supplemental Figures and Supplemental Methods with details of the model derivations.

Declaration of Interests

The authors declare no competing interests. David A. Bennett receives consulting fee from B4X inc for advising on neurodegenerative disease.

Web Resources

BFGWAS_QUANT Github directory, https://github.com/yanglab-emory/BFGWAS_QUANT

RADC Research Resource Sharing Hub, http://www.radc.rush.edu/

ROS/MAP data, https://www.synapse.org/#!Synapse:syn3219045

MCADGC data, https://www.synapse.org/#!Synapse:syn2910256

LDpred, https://github.com/bvilhjal/ldpred

PRS-CS, https://github.com/getian107/PRScs

Data and Code Availability

ROS/MAP data can be requested through Rush Alzheimer’s Disease Center (http://www.radc.rush.edu/) and synapse (https://www.synapse.org/#!Synapse:syn3219045). MCADGS data can be requested through synapse (https://www.synapse.org/#!Synapse:syn2910256). IGAP summary statistics are available from http://web.pasteur-lille.fr/en/recherche/u744/igap/igap_download.php. Source code of BFGWAS_QUANT is available through Github, https://github.com/yanglab-emory/BFGWAS_QUANT.

Acknowledgements

J.Y. is supported by NIH/NIGMS grant R35GM138313 and NIH/NIA grant R21AG070659. ROS/MAP study data were provided by the Rush Alzheimer’s Disease Center, Rush University Medical Center, Chicago, IL. Data collection was supported through funding by NIA grants P30AG10161, R01AG15819, R01AG17917, R01AG30146, R01AG36836, U01AG32984, U01AG46152, U01AG61356, the Illinois Department of Public Health, and the Translational Genomics Research Institute. The MCADGC led by Dr. Nilüfer Ertekin-Taner and Dr. Steven G. Younkin, Mayo Clinic, Jacksonville, FL uses samples from the Mayo Clinic Study of Aging, the Mayo Clinic Alzheimer’s Disease Research Center, and the Mayo Clinic Brain Bank. MCADGC data collection was supported through funding by NIA grants P50 AG016574, R01 AG032990, U01 AG046139, R01 AG018023, U01 AG006576, U01 AG006786, R01 AG025711, R01 AG017216, R01 AG003949, NINDS grant R01 NS080820, CurePSP Foundation, and support from Mayo Foundation.

Reference

  1. 1.↵
    Visscher, P.M., Wray, N.R., Zhang, Q., Sklar, P., McCarthy, M.I., Brown, M.A., and Yang, J. (2017). 10 Years of GWAS Discovery: Biology, Function, and Translation. Am J Hum Genet 101, 5–22.
    OpenUrlCrossRefPubMed
  2. 2.↵
    Welter, D., MacArthur, J., Morales, J., Burdett, T., Hall, P., Junkins, H., Klemm, A., Flicek, P., Manolio, T., Hindorff, L., et al. (2014). The NHGRI GWAS Catalog, a curated resource of SNP-trait associations. Nucleic Acids Res 42, D1001–1006.
    OpenUrlCrossRefPubMedWeb of Science
  3. 3.↵
    McCarthy, M.I., Abecasis, G.R., Cardon, L.R., Goldstein, D.B., Little, J., Ioannidis, J.P.A., and Hirschhorn, J.N. (2008). Genome-wide association studies for complex traits: consensus, uncertainty and challenges. Nature Reviews Genetics 9, 356–369.
    OpenUrlCrossRefPubMedWeb of Science
  4. 4.↵
    Hindorff, L.A., Sethupathy, P., Junkins, H.A., Ramos, E.M., Mehta, J.P., Collins, F.S., and Manolio, T.A. (2009). Potential etiologic and functional implications of genome-wide association loci for human diseases and traits. Proceedings of the National Academy of Sciences 106, 9362.
    OpenUrlAbstract/FREE Full Text
  5. 5.↵
    Gallagher, M.D., and Chen-Plotkin, A.S. (2018). The Post-GWAS Era: From Association to Function. Am J Hum Genet 102, 717–730.
    OpenUrlCrossRefPubMed
  6. 6.↵
    Rentzsch, P., Witten, D., Cooper, G.M., Shendure, J., and Kircher, M. (2019). CADD: predicting the deleteriousness of variants throughout the human genome. Nucleic Acids Res 47, D886–D894.
    OpenUrlCrossRefPubMed
  7. 7.↵
    Roadmap Epigenomics, C., Kundaje, A., Meuleman, W., Ernst, J., Bilenky, M., Yen, A., Heravi-Moussavi, A., Kheradpour, P., Zhang, Z., Wang, J., et al. (2015). Integrative analysis of 111 reference human epigenomes. Nature 518, 317–330.
    OpenUrlCrossRefPubMed
  8. 8.↵
    Consortium, E.P. (2011). A user’s guide to the encyclopedia of DNA elements (ENCODE). PLoS Biol 9, e1001046.
    OpenUrlCrossRefPubMed
  9. 9.↵
    Consortium, G. (2020). The GTEx Consortium atlas of genetic regulatory effects across human tissues. Science 369, 1318–1330.
    OpenUrlAbstract/FREE Full Text
  10. 10.↵
    McVicker, G., van de Geijn, B., Degner, J.F., Cain, C.E., Banovich, N.E., Raj, A., Lewellen, N., Myrthil, M., Gilad, Y., and Pritchard, J.K. (2013). Identification of genetic variants that affect histone modifications in human cells. Science 342, 747–749.
    OpenUrlAbstract/FREE Full Text
  11. 11.↵
    Ernst, J., and Kellis, M. (2010). Discovery and characterization of chromatin states for systematic annotation of the human genome. Nature biotechnology 28, 817–825.
    OpenUrlCrossRefPubMedWeb of Science
  12. 12.↵
    Robins, C., Liu, Y., Fan, W., Duong, D.M., Meigs, J., Harerimana, N.V., Gerasimov, E.S., Dammer, E.B., Cutler, D.J., Beach, T.G., et al. (2021). Genetic control of the human brain proteome. American journal of human genetics 108, 400–410.
    OpenUrl
  13. 13.↵
    Banovich, N.E., Lan, X., McVicker, G., van de Geijn, B., Degner, J.F., Blischak, J.D., Roux, J., Pritchard, J.K., and Gilad, Y. (2014). Methylation QTLs are associated with coordinated changes in transcription factor binding, histone modifications, and gene expression levels. PLoS Genet 10, e1004663.
    OpenUrlCrossRefPubMed
  14. 14.
    Waszak, S.M., Delaneau, O., Gschwind, A.R., Kilpinen, H., Raghav, S.K., Witwicki, R.M., Orioli, A., Wiederkehr, M., Panousis, N.I., Yurovsky, A., et al. (2015). Population Variation and Genetic Control of Modular Chromatin Architecture in Humans. Cell 162, 1039–1050.
    OpenUrlCrossRefPubMed
  15. 15.↵
    Heyn, H. (2016). Quantitative Trait Loci Identify Functional Noncoding Variation in Cancer. PLoS Genet 12, e1005826.
    OpenUrl
  16. 16.↵
    Hormozdiari, F., Kostem, E., Kang, E.Y., Pasaniuc, B., and Eskin, E. (2014). Identifying causal variants at loci with multiple signals of association. Genetics 198, 497–508.
    OpenUrlAbstract/FREE Full Text
  17. 17.
    Le, K.T.T., Matzaraki, V., Netea, M.G., Wijmenga, C., Moser, J., and Kumar, V. (2019). Functional Annotation of Genetic Loci Associated With Sepsis Prioritizes Immune and Endothelial Cell Pathways. Front Immunol 10, 1949.
    OpenUrlCrossRef
  18. 18.
    Matzaraki, V., Gresnigt, M.S., Jaeger, M., Ricaño-Ponce, I., Johnson, M.D., Oosting, M., Franke, L., Withoff, S., Perfect, J.R., Joosten, L.A.B., et al. (2017). An integrative genomics approach identifies novel pathways that influence candidaemia susceptibility. PLoS One 12, e0180824.
    OpenUrl
  19. 19.
    Ruffieux, H., Fairfax, B.P., Nassiri, I., Vigorito, E., Wallace, C., Richardson, S., and Bottolo, L. (2021). EPISPOT: An epigenome-driven approach for detecting and interpreting hotspots in molecular QTL studies. Am J Hum Genet 108, 983–1000.
    OpenUrl
  20. 20.↵
    Nicolae, D.L., Gamazon, E., Zhang, W., Duan, S., Dolan, M.E., and Cox, N.J. (2010). Trait-associated SNPs are more likely to be eQTLs: annotation to enhance discovery from GWAS. PLoS genetics 6, e1000888–e1000888.
    OpenUrl
  21. 21.↵
    Ng, B., White, C.C., Klein, H.U., Sieberts, S.K., McCabe, C., Patrick, E., Xu, J., Yu, L., Gaiteri, C., Bennett, D.A., et al. (2017). An xQTL map integrates the genetic architecture of the human brain’s transcriptome and epigenome. Nat Neurosci 20, 1418–1426.
    OpenUrlCrossRefPubMed
  22. 22.↵
    Yang, J., Fritsche, L.G., Zhou, X., Abecasis, G., and International Age-Related Macular Degeneration Genomics, C. (2017). A Scalable Bayesian Method for Integrating Functional Information in Genome-wide Association Studies. American journal of human genetics 101, 404–416.
    OpenUrlCrossRef
  23. 23.
    Pickrell, J.K. (2014). Joint analysis of functional genomic data and genome-wide association studies of 18 human traits. American journal of human genetics 94, 559–573.
    OpenUrlCrossRefPubMed
  24. 24.
    Trynka, G., Westra, H.-J., Slowikowski, K., Hu, X., Xu, H., Stranger, B.E., Klein, R.J., Han, B., and Raychaudhuri, S. (2015). Disentangling the Effects of Colocalizing Genomic Annotations to Functionally Prioritize Non-coding Variants within Complex-Trait Loci. Am J Hum Genet 97, 139–152.
    OpenUrlCrossRefPubMed
  25. 25.
    Iversen, E.S., Lipton, G., Clyde, M.A., and Monteiro, A.N.A. (2014). Functional annotation signatures of disease susceptibility loci improve SNP association analysis. BMC Genomics 15, 398–398.
    OpenUrl
  26. 26.↵
    Chen, G.K., and Witte, J.S. (2007). Enriching the analysis of genomewide association studies with hierarchical modeling. American journal of human genetics 81, 397–404.
    OpenUrlCrossRefPubMedWeb of Science
  27. 27.↵
    Kichaev, G., Yang, W.-Y., Lindstrom, S., Hormozdiari, F., Eskin, E., Price, A.L., Kraft, P., and Pasaniuc, B. (2014). Integrating functional data to prioritize causal variants in statistical fine-mapping studies. PLoS genetics 10, e1004722–e1004722.
    OpenUrl
  28. 28.↵
    Kichaev, G., Roytman, M., Johnson, R., Eskin, E., Lindström, S., Kraft, P., and Pasaniuc, B. (2016). Improved methods for multi-trait fine mapping of pleiotropic risk loci. Bioinformatics 33, 248–255.
    OpenUrl
  29. 29.↵
    Ma, Y., and Wei, P. (2019). FunSPU: A versatile and adaptive multiple functional annotation-based association test of whole-genome sequencing data. PLoS genetics 15, e1008081.
    OpenUrl
  30. 30.↵
    Li, X., Li, Z., Zhou, H., Gaynor, S.M., Liu, Y., Chen, H., Sun, R., Dey, R., Arnett, D.K., Aslibekyan, S., et al. (2020). Dynamic incorporation of multiple in silico functional annotations empowers rare variant association analysis of large whole-genome sequencing studies at scale. Nature genetics 52, 969–983.
    OpenUrlCrossRefPubMed
  31. 31.↵
    Guan, Y.T., and Stephens, M. (2011). Bayesian Variable Selection Regression for Genome-Wide Association Studies and Other Large-Scale Problems. Annals of Applied Statistics 5, 1780–1815.
    OpenUrl
  32. 32.↵
    De Jager, P.L., Ma, Y., McCabe, C., Xu, J., Vardarajan, B.N., Felsky, D., Klein, H.U., White, C.C., Peters, M.A., Lodgson, B., et al. (2018). A multi-omic atlas of the human frontal cortex for aging and Alzheimer’s disease research. Sci Data 5, 180142.
    OpenUrl
  33. 33.↵
    Bennett, D.A., Buchman, A.S., Boyle, P.A., Barnes, L.L., Wilson, R.S., and Schneider, J.A. (2018). Religious Orders Study and Rush Memory and Aging Project. J Alzheimers Dis 64, S161–S189.
    OpenUrlPubMed
  34. 34.↵
    Hormozdiari, F., Gazal, S., van de Geijn, B., Finucane, H.K., Ju, C.J., Loh, P.R., Schoech, A., Reshef, Y., Liu, X., O’Connor, L., et al. (2018). Leveraging molecular quantitative trait loci to understand the genetic architecture of diseases and complex traits. Nature genetics 50, 1041–1047.
    OpenUrlCrossRefPubMed
  35. 35.↵
    Bryois, J., Calini, D., Macnair, W., Foo, L., Urich, E., Ortmann, W., Iglesias, V.A., Selvaraj, S., Nutma, E., Marzin, M., et al. (2021). Cell-type specific cis-eQTLs in eight brain cell-types identifies novel risk genes for human brain disorders. medRxiv, 2021.2010.2009.21264604.
  36. 36.↵
    Lopes, K.P., Snijders, G.J.L., Humphrey, J., Allan, A., Sneeboer, M.A.M., Navarro, E., Schilder, B.M., Vialle, R.A., Parks, M., Missall, R., et al. (2022). Genetic analysis of the human microglial transcriptome across brain regions, aging and disease pathologies. Nature genetics 54, 4–17.
    OpenUrl
  37. 37.↵
    Luningham, J.M., Chen, J., Tang, S., De Jager, P.L., Bennett, D.A., Buchman, A.S., and Yang, J. (2020). Bayesian Genome-wide TWAS Method to Leverage both cis- and trans-eQTL Information through Summary Statistics. American journal of human genetics 107, 714–726.
    OpenUrlCrossRef
  38. 38.↵
    Broyden, C.G. (1970). The Convergence of a Class of Double-rank Minimization Algorithms 1. General Considerations. IMA Journal of Applied Mathematics 6, 76–90.
    OpenUrlCrossRef
  39. 39.↵
    Lopes, K.d.P., Snijders, G.J.L., Humphrey, J., Allan, A., Sneeboer, M.A.M., Navarro, E., Schilder, B.M., Vialle, R.A., Parks, M., Missall, R., et al. (2022). Genetic analysis of the human microglial transcriptome across brain regions, aging and disease pathologies. Nature Genetics 54, 4–17.
    OpenUrl
  40. 40.↵
    Quinlan, A.R., and Hall, I.M. (2010). BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–842.
    OpenUrlCrossRefPubMedWeb of Science
  41. 41.↵
    Bennett, D.A., Schneider, J.A., Arvanitakis, Z., and Wilson, R.S. (2012). Overview and findings from the religious orders study. Curr Alzheimer Res 9, 628–645.
    OpenUrlCrossRefPubMedWeb of Science
  42. 42.↵
    Wang, M., Beckmann, N.D., Roussos, P., Wang, E., Zhou, X., Wang, Q., Ming, C., Neff, R., Ma, W., Fullard, J.F., et al. (2018). The Mount Sinai cohort of large-scale genomic, transcriptomic and proteomic data in Alzheimer’s disease. Sci Data 5, 180185.
    OpenUrl
  43. 43.↵
    Trevethan, R. (2017). Sensitivity, Specificity, and Predictive Values: Foundations, Pliabilities, and Pitfalls in Research and Practice. Front Public Health 5, 307.
    OpenUrl
  44. 44.↵
    Guan, Y., and Stephens, M. (2011). Bayesian variable selection regression for genome-wide association studies and other large-scale problems. Ann Appl Stat 5, 1780–1815.
    OpenUrlCrossRef
  45. 45.↵
    Bennett, D.A., Schneider, J.A., Buchman, A.S., Barnes, L.L., Boyle, P.A., and Wilson, R.S. (2012). Overview and findings from the rush Memory and Aging Project. Curr Alzheimer Res 9, 646–663.
    OpenUrlCrossRefPubMedWeb of Science
  46. 46.↵
    Bennett, D.A., Wilson, R.S., Boyle, P.A., Buchman, A.S., and Schneider, J.A. (2012). Relation of neuropathology to cognition in persons without cognitive impairment. Ann Neurol 72, 599–609.
    OpenUrlCrossRefPubMed
  47. 47.↵
    Lambert, J.-C., Ibrahim-Verbaas, C.A., Harold, D., Naj, A.C., Sims, R., Bellenguez, C., Jun, G., DeStefano, A.L., Bis, J.C., Beecham, G.W., et al. (2013). Meta-analysis of 74,046 individuals identifies 11 new susceptibility loci for Alzheimer’s disease. Nature Genetics 45, 1452–1458.
    OpenUrlCrossRefPubMed
  48. 48.↵
    Carrasquillo, M.M., Zou, F., Pankratz, V.S., Wilcox, S.L., Ma, L., Walker, L.P., Younkin, S.G., Younkin, C.S., Younkin, L.H., Bisceglio, G.D., et al. (2009). Genetic variation in PCDH11X is associated with susceptibility to late-onset Alzheimer’s disease. Nature genetics 41, 192–198.
    OpenUrlCrossRefPubMedWeb of Science
  49. 49.↵
    Allen, M., Carrasquillo, M.M., Funk, C., Heavner, B.D., Zou, F., Younkin, C.S., Burgess, J.D., Chai, H.S., Crook, J., Eddy, J.A., et al. (2016). Human whole genome genotype and transcriptome data for Alzheimer’s and other neurodegenerative diseases. Sci Data 3, 160089.
    OpenUrl
  50. 50.↵
    1000 Genomes Project Consortium. (2012). An integrated map of genetic variation from 1,092 human genomes. Nature 491, 56–65.
    OpenUrlCrossRefPubMedWeb of Science
  51. 51.↵
    Choi, S.W., Mak, T.S., and O’Reilly, P.F. (2020). Tutorial: a guide to performing polygenic risk score analyses. Nat Protoc 15, 2759–2772.
    OpenUrlPubMed
  52. 52.↵
    Euesden, J., Lewis, C.M., and O’Reilly, P.F. (2015). PRSice: Polygenic Risk Score software. Bioinformatics (Oxford, England) 31, 1466–1468.
    OpenUrlCrossRefPubMed
  53. 53.↵
    Prive, F., Arbel, J., and Vilhjalmsson, B.J. (2020). LDpred2: better, faster, stronger. Bioinformatics.
  54. 54.↵
    Ge, T., Chen, C.Y., Ni, Y., Feng, Y.A., and Smoller, J.W. (2019). Polygenic prediction via Bayesian regression and continuous shrinkage priors. Nat Commun 10, 1776.
    OpenUrl
  55. 55.↵
    Linden, A. (2006). Measuring diagnostic and predictive accuracy in disease management: an introduction to receiver operating characteristic (ROC) analysis. J Eval Clin Pract 12, 132–139.
    OpenUrlCrossRefPubMedWeb of Science
  56. 56.↵
    Hansen, D.V., Hanson, J.E., and Sheng, M. (2018). Microglia in Alzheimer’s disease. J Cell Biol 217, 459–472.
    OpenUrlAbstract/FREE Full Text
  57. 57.↵
    Mathys, H., Davila-Velderrain, J., Peng, Z., Gao, F., Mohammadi, S., Young, J.Z., Menon, M., He, L., Abdurrob, F., Jiang, X., et al. (2019). Single-cell transcriptomic analysis of Alzheimer’s disease. Nature 570, 332–337.
    OpenUrl
  58. 58.↵
    Kunkle, B.W., Grenier-Boley, B., Sims, R., Bis, J.C., Damotte, V., Naj, A.C., Boland, A., Vronskaya, M., van der Lee, S.J., Amlie-Wolf, A., et al. (2019). Genetic meta-analysis of diagnosed Alzheimer’s disease identifies new risk loci and implicates Aβ, tau, immunity and lipid processing. Nature genetics 51, 414–430.
    OpenUrlCrossRefPubMed
  59. 59.↵
    Takei, N., Miyashita, A., Tsukie, T., Arai, H., Asada, T., Imagawa, M., Shoji, M., Higuchi, S., Urakami, K., Kimura, H., et al. (2009). Genetic association study on in and around the APOE in late-onset Alzheimer disease in Japanese. Genomics 93, 441–448.
    OpenUrlCrossRefPubMedWeb of Science
  60. 60.↵
    Wojcik, G.L., Graff, M., Nishimura, K.K., Tao, R., Haessler, J., Gignoux, C.R., Highland, H.M., Patel, Y.M., Sorokin, E.P., Avery, C.L., et al. (2019). Genetic analyses of diverse populations improves discovery for complex traits. Nature 570, 514–518.
    OpenUrlCrossRefPubMed
  61. 61.↵
    Andaleon, A., Mogil, L.S., and Wheeler, H.E. (2018). Gene-based association study for lipid traits in diverse cohorts implicates BACE1 and SIDT2 regulation in triglyceride levels. PeerJ 6, e4314.
    OpenUrlCrossRef
  62. 62.↵
    Naj, A.C., Jun, G., Beecham, G.W., Wang, L.-S., Vardarajan, B.N., Buros, J., Gallins, P.J., Buxbaum, J.D., Jarvik, G.P., Crane, P.K., et al. (2011). Common variants at MS4A4/MS4A6E, CD2AP, CD33 and EPHA1 are associated with late-onset Alzheimer’s disease. Nature Genetics 43, 436–441.
    OpenUrlCrossRefPubMed
  63. 63.
    Karch, C.M., Jeng, A.T., Nowotny, P., Cady, J., Cruchaga, C., and Goate, A.M. (2012). Expression of Novel Alzheimer’s Disease Risk Genes in Control and Alzheimer’s Disease Brains. PLOS ONE 7, e50976.
    OpenUrlCrossRefPubMed
  64. 64.
    Lu, R.C., Yang, W., Tan, L., Sun, F.R., Tan, M.S., Zhang, W., Wang, H.F., and Tan, L. (2017). Association of HLA-DRB1 polymorphism with Alzheimer’s disease: a replication and meta-analysis. Oncotarget 8, 93219–93226.
    OpenUrlCrossRef
  65. 65.↵
    Novikova, G., Kapoor, M., Tcw, J., Abud, E.M., Efthymiou, A.G., Chen, S.X., Cheng, H., Fullard, J.F., Bendl, J., Liu, Y., et al. (2021). Integration of Alzheimer’s disease genetics and myeloid genomics identifies disease risk regulatory elements and genes. Nature Communications 12, 1610.
    OpenUrl
  66. 66.↵
    Nelson, R.M., Pettersson, M.E., and Carlborg, O. (2013). A century after Fisher: time for a new paradigm in quantitative genetics. Trends Genet 29, 669–676.
    OpenUrlCrossRefPubMed
  67. 67.↵
    Zhou, X., Carbonetto, P., and Stephens, M. (2013). Polygenic modeling with bayesian sparse linear mixed models. PLoS genetics 9, e1003264.
    OpenUrl
Back to top
PreviousNext
Posted August 17, 2022.
Download PDF

Supplementary Material

Data/Code
Email

Thank you for your interest in spreading the word about medRxiv.

NOTE: Your email address is requested solely to identify you as the sender of this article.

Enter multiple addresses on separate lines or separate them with commas.
Scalable Bayesian functional GWAS method accounting for multivariate quantitative functional annotations with applications to studying Alzheimer’s disease
(Your Name) has forwarded a page to you from medRxiv
(Your Name) thought you would like to see this page from the medRxiv website.
CAPTCHA
This question is for testing whether or not you are a human visitor and to prevent automated spam submissions.
Share
Scalable Bayesian functional GWAS method accounting for multivariate quantitative functional annotations with applications to studying Alzheimer’s disease
Junyu Chen, Lei Wang, Philip L. De Jager, David A. Bennett, Aron S. Buchman, Jingjing Yang
medRxiv 2022.08.12.22278704; doi: https://doi.org/10.1101/2022.08.12.22278704
Twitter logo Facebook logo LinkedIn logo Mendeley logo
Citation Tools
Scalable Bayesian functional GWAS method accounting for multivariate quantitative functional annotations with applications to studying Alzheimer’s disease
Junyu Chen, Lei Wang, Philip L. De Jager, David A. Bennett, Aron S. Buchman, Jingjing Yang
medRxiv 2022.08.12.22278704; doi: https://doi.org/10.1101/2022.08.12.22278704

Citation Manager Formats

  • BibTeX
  • Bookends
  • EasyBib
  • EndNote (tagged)
  • EndNote 8 (xml)
  • Medlars
  • Mendeley
  • Papers
  • RefWorks Tagged
  • Ref Manager
  • RIS
  • Zotero
  • Tweet Widget
  • Facebook Like
  • Google Plus One

Subject Area

  • Genetic and Genomic Medicine
Subject Areas
All Articles
  • Addiction Medicine (349)
  • Allergy and Immunology (668)
  • Allergy and Immunology (668)
  • Anesthesia (181)
  • Cardiovascular Medicine (2648)
  • Dentistry and Oral Medicine (316)
  • Dermatology (223)
  • Emergency Medicine (399)
  • Endocrinology (including Diabetes Mellitus and Metabolic Disease) (942)
  • Epidemiology (12228)
  • Forensic Medicine (10)
  • Gastroenterology (759)
  • Genetic and Genomic Medicine (4103)
  • Geriatric Medicine (387)
  • Health Economics (680)
  • Health Informatics (2657)
  • Health Policy (1005)
  • Health Systems and Quality Improvement (985)
  • Hematology (363)
  • HIV/AIDS (851)
  • Infectious Diseases (except HIV/AIDS) (13695)
  • Intensive Care and Critical Care Medicine (797)
  • Medical Education (399)
  • Medical Ethics (109)
  • Nephrology (436)
  • Neurology (3882)
  • Nursing (209)
  • Nutrition (577)
  • Obstetrics and Gynecology (739)
  • Occupational and Environmental Health (695)
  • Oncology (2030)
  • Ophthalmology (585)
  • Orthopedics (240)
  • Otolaryngology (306)
  • Pain Medicine (250)
  • Palliative Medicine (75)
  • Pathology (473)
  • Pediatrics (1115)
  • Pharmacology and Therapeutics (466)
  • Primary Care Research (452)
  • Psychiatry and Clinical Psychology (3432)
  • Public and Global Health (6527)
  • Radiology and Imaging (1403)
  • Rehabilitation Medicine and Physical Therapy (814)
  • Respiratory Medicine (871)
  • Rheumatology (409)
  • Sexual and Reproductive Health (410)
  • Sports Medicine (342)
  • Surgery (448)
  • Toxicology (53)
  • Transplantation (185)
  • Urology (165)