Inter- and intra-chromosomal modulators of the APOE ε2 and ε4 effects on the Alzheimer’s disease risk
=========================================================================================================

* Alireza Nazarian
* Ian Philipp
* Irina Culminskaya
* Liang He
* Alexander M. Kulminski

## Abstract

The mechanisms of incomplete penetrance of risk modifying impacts of apolipoprotein E (*APOE*) ε2 and ε4 alleles on Alzheimer’s disease (AD) have not been fully understood. We performed genome-wide analysis of differences in linkage disequilibrium (LD) patterns between 6136 AD-affected and 10555 AD-unaffected subjects from five independent studies to explore whether the association of the *APOE* ε2 allele (encoded by rs7412 polymorphism) and ε4 allele (encoded by rs429358 polymorphism) with AD was modulated by autosomal polymorphisms. The LD analysis identified 24 (mostly inter-chromosomal) and 57 (primarily intra-chromosomal) autosomal polymorphisms with significant differences in LD with either rs7412 or rs429358, respectively, between AD-affected and AD-unaffected subjects, indicating their potential modulatory roles. Our Cox regression analysis showed that minor alleles of four inter-chromosomal and ten intra-chromosomal polymorphisms exerted significant modulating effects on the ε2- and ε4-associated AD risks, respectively, and identified ε2-independent (rs2884183 polymorphism, 11q22.3) and ε4-independent (rs483082 polymorphism, 19q13.32) associations with AD. Our functional analysis highlighted ε2- and/or ε4-linked processes affecting the lipid and lipoprotein metabolism, and cell junction organization which may contribute to AD pathogenesis. These findings provide insights into the ε2- and ε4-associated mechanisms of AD pathogenesis, underlying their incomplete penetrance.
Keywords

* Dementia
* Aging
* LD
* Cox Regression
* Compound Genotype
* Genetic Heterogeneity

## Introduction

The apolipoprotein E (*APOE*) gene is the strongest Alzheimer’s disease (AD)-associated genetic factor [1–3], which can explain 13.4% of phenotypic variance and 25.2% of genetic variance of AD [4]. Minor alleles of the exonic single-nucleotide polymorphisms (SNPs) rs429358 and rs7412 in the *APOE* gene encode the ε4 and ε2 alleles, respectively. The ε2 allele is considered a protective factor against AD, whereas the ε4 allele is considered a major variant predisposing to AD [3,5].

The *APOE* gene encodes a lipoprotein mainly involved in lipid transfer and metabolism. Nevertheless, its functional impacts are not limited to lipid profile alterations and related vasculopathies [6]. The *APOE* involvement in AD pathogenesis has been widely studied, revealing various molecular and biological processes differentially impacted by different *APOE* alleles. For instance, the ε4 allele has been linked to increased production and decreased clearance of β-amyloid, stress-mediated increased tau hyperphosphorylation, accelerated cortical atrophy (e.g., in the medial temporal lobe), baseline neuronal hyperactivity (e.g., in the hippocampus), reduced cerebral glucose metabolism, damaged synaptic structure and function, increased cytoskeletal and mitochondrial dysfunction, and abnormal hippocampal neurogenesis [7]. Despite strong associations between *APOE* and AD, neither the ε2 nor the ε4 allele is considered a causal factor for AD development [5,8–10]. Elucidating the mechanisms of action of the ε2 and ε4 alleles is essential for understanding AD pathogenesis and for AD risk assessment.

The complex regional interactions and haplotype structures in the *APOE* locus (19q13.3) have been emphasized by a growing body of studies [11–19].
These studies indicate the potential roles of nearby polymorphisms in modulating the impacts of the *APOE* alleles on AD risks in the form of haplotypes and combinations of genotypes (called compound genotypes). The analyses of haplotypes leverage the idea that AD can be affected by haplotypes driven by genetic linkage between nearby SNPs [20]. Functional linkage, however, may drive compound genotypes consisting of not only local but also distant variants [21].

In this study, we used a comprehensive approach to examine intra- (cis-acting) and inter- (trans-acting) chromosomal modulators of the impacts of the *APOE* rs7412 or rs429358 SNPs on the AD risk in the ε4- or ε2-negative sample, respectively. We leveraged samples of the AD-affected (N=6136) and unaffected (N=10555) subjects from five studies: (i) to perform a comparative analysis of LD between rs7412 or rs429358 and other autosomal SNPs in the human genome in the AD-affected and unaffected subjects, (ii) to examine AD risks for carriers of compound genotypes comprised of rs7412 or rs429358 and the identified intra- and inter-chromosomal SNPs in LD with them, and (iii) to identify biological functions and diseases enriched by genes harboring these SNPs.

## Methods

### Study Participants

We used data on subjects of European ancestry from five studies (Table S1): three National Institute on Aging (NIA) Alzheimer’s Disease Centers (ADCs) datasets from the Alzheimer’s Disease Genetics Consortium (ADGC) initiative [22], whole-genome sequencing (WGS) data from the Alzheimer’s Disease Sequencing Project (ADSP-WGS) [23,24], Cardiovascular Health Study (CHS) [25], Framingham Heart Study (FHS) [26,27], and NIA Late-Onset Alzheimer’s Disease Family Based Study (LOAD FBS) [28]. ADSP-WGS subjects who were also present in other datasets were excluded to keep the datasets independent.
The *APOE* genotypes were either directly reported by the original studies (ADGC, ADSP-WGS, FHS) or were determined based on the rs429358 and rs7412 genotypes (CHS and LOAD FBS). The diagnoses of AD cases in the five analyzed datasets were mainly based on neurologic exams [29,30], and the AD status was reported either directly (ADGC, ADSP-WGS, FHS, LOAD FBS) or in the form of International Classification of Diseases, Ninth Revision (ICD-9) codes (CHS).

### Genotype Data and Quality Control (QC)

We used whole-genome sequencing (ADSP-WGS) and genome-wide data from different array-based platforms (ADGC, CHS, FHS, LOAD FBS). SNPs were first imputed to harmonize them across the analyzed datasets [31]. Low-quality data were excluded using *PLINK* [32] as follows: 1) SNPs and subjects with missing rates >5%, 2) SNPs with minor allele frequencies (MAF) <5%, 3) SNPs deviating from Hardy-Weinberg equilibrium at P<1E-06, and 4) SNPs, subjects, and/or families with Mendel error rates >2% (in ADSP-WGS, FHS, and LOAD FBS, which include families). In addition, imputed SNPs with $r^2<0.7$ were filtered out (ADGC, CHS, FHS, LOAD FBS). Selecting SNPs present in at least one study resulted in a set of 1,645,025 SNPs for the analysis.

### Two-stage LD Analysis

#### Design

Our analyses were performed separately in stratified samples obtained by dividing each dataset into four groups based on the *APOE* genotypes and AD status. First, we determined ε4-negative (ε2ε2, ε2ε3, and ε3ε3 genotypes) and ε2-negative (ε4ε4, ε3ε4, and ε3ε3 genotypes) subsamples. Then each subsample was divided into AD-affected and unaffected groups (herein referred to as AD and NAD groups, respectively). We evaluated LD between the *APOE* rs7412 or rs429358 SNP and each SNP in the genome in two stages.

#### Stage 1: LD Analysis in Individual and Pooled Datasets

We examined LD (i.e., *r* statistics) using the haplotype-based method [33–35] in each of the four selected subsamples, in each dataset individually and in all datasets combined.
The statistically significant LD estimates were determined using a conservative chi-square test $\chi^2 = r^2 n$ [35], where $n$ is the number of subjects rather than gametes to address the uncertainty in inferring haplotypes from unphased genetic data [16,18,36,37]. The variances of the *r* statistics were calculated using the asymptotic variance method detailed in [37]. The LD analysis was performed using the *haplo.stats* R package [38].

Stage 1 provided two sets of SNPs in LD with the *APOE* SNPs in each subsample. The first set was generated following the discovery-replication strategy (herein referred to as the replication set). In this case, SNPs were selected if their LD with the *APOE* SNP attained: 1) genome-wide (P < 5E-08) or suggestive-effect (5E-08 ≤ P < 5E-06) significance in any of the five datasets, which was considered as a discovery set, and 2) Bonferroni-adjusted P<0.0125 (=0.05/4, where 4 is the number of potential replication sets) in at least one of the other four datasets [31]. The second set included SNPs in significant LD with the *APOE* SNPs at genome-wide or suggestive significance in the pooled samples of all five datasets that were not in the replication set.

#### Stage 2: Group-Specific LD

We examined whether SNPs identified in stage 1 had group-specific LD by contrasting *r* between pooled AD and NAD groups, $\Delta r = r_{AD} - r_{NAD}$, using a permutation test [39,40]. Significant $\Delta r$ indicated SNPs in group-specific LD with rs7412 or rs429358. Bonferroni-adjusted thresholds, accounting for the number of tested SNPs, were used to identify significant findings.

### Analysis of the AD risk

For each group-specific SNP, survival-type analysis was performed to examine the impact of a compound genotype variable (CompG) on the AD risk. The CompG included four compound genotypes comprised of rs7412 or rs429358 genotypes and genotypes of a group-specific SNP (Table 1).
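The four-level CompG coding can be illustrated with a short Python sketch. This is not the authors' code: the function name `compound_genotype` and the 0/1/2 minor-allele count encoding are assumptions made for illustration. The level definitions follow those given in this paper, with CompG1 (ε3ε3 subjects who are major-allele homozygotes at the SNP) as the reference.

```python
def compound_genotype(apoe_minor_count: int, snp_minor_count: int) -> str:
    """Map minor-allele counts (0, 1, or 2) at an APOE SNP (rs7412 or
    rs429358) and at a group-specific SNP to a compound genotype level.

    Hypothetical helper; the input encoding is an assumption.
    """
    for count in (apoe_minor_count, snp_minor_count):
        if count not in (0, 1, 2):
            raise ValueError("minor-allele counts must be 0, 1, or 2")

    carries_apoe_minor = apoe_minor_count > 0  # ε2 (or ε4) carrier
    carries_snp_minor = snp_minor_count > 0    # at least one minor allele of the SNP

    if not carries_apoe_minor and not carries_snp_minor:
        return "CompG1"  # reference: ε3ε3, major-allele homozygote at the SNP
    if not carries_apoe_minor:
        return "CompG2"  # ε3ε3, SNP minor-allele carrier
    if not carries_snp_minor:
        return "CompG3"  # ε2/ε4 carrier, major-allele homozygote at the SNP
    return "CompG4"      # ε2/ε4 carrier and SNP minor-allele carrier
```

In the ε4-negative (or ε2-negative) subsample, a zero count at rs7412 (or rs429358) corresponds to the ε3ε3 genotype, which is why two inputs suffice to define all four levels.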
[Table 1.](http://medrxiv.org/content/early/2022/06/17/2022.06.16.22276523/T1) Compound genotype constructed based on the genotypes at rs7412 or rs429358 and the identified group-specific SNPs.

We fitted the Cox regression model (*coxme* and *survival* R packages [41,42]), considering the age at onset of AD as a time variable. We used sex, the top five principal components of genetic data, and ADC cohorts (in ADGC) as fixed-effects covariates, and family IDs (LOAD FBS, FHS, ADSP-WGS) as a random-effects covariate. The results from the five datasets were combined through inverse-variance meta-analysis using the *GWAMA* package [43]. The CompG1 compound genotype was the reference factor level. We used a chi-square test with one degree of freedom [44] to estimate the significance of the difference between the effect sizes for CompG3 and CompG4:

$$\chi^2 = \frac{(b_{CompG3} - b_{CompG4})^2}{se_{CompG3}^2 + se_{CompG4}^2}$$

Here, $b_{CompG3}$ ($se_{CompG3}$) and $b_{CompG4}$ ($se_{CompG4}$) are the beta coefficients (standard errors) corresponding to the CompG3 and CompG4 genotypes in the Cox model, respectively. Significant findings were identified at the Bonferroni-adjusted levels correcting for the numbers of ε2- and ε4-associated group-specific SNPs.

### Functional enrichment analysis

*The Database for Annotation, Visualization and Integrated Discovery (DAVID)* [45] and *Metascape* [46] web tools were used to identify gene-enriched *REACTOME* pathways [47] and *DisGeNET* diseases [48]. The analysis was performed separately for genes harboring SNPs in group-specific LD with rs7412 or rs429358. We used a false discovery rate (FDR) adjusted significance cutoff of $P_{FDR}<0.05$ [49] to identify terms significantly enriched by two or more genes.
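The FDR filtering step can be sketched as follows. The text states only that an FDR-adjusted cutoff was used, so the Benjamini-Hochberg procedure here is an assumption, as are the function names `benjamini_hochberg` and `enriched_terms` and the `(name, p_value, gene_count)` input layout. DAVID and Metascape report their own adjusted values, so this is an illustration of the logic rather than a reproduction of either tool.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that adjusted
    # values never increase as the raw p-values decrease.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def enriched_terms(terms, alpha=0.05, min_genes=2):
    """Keep terms with adjusted p < alpha that contain at least min_genes genes.

    `terms` is a list of (name, p_value, gene_count) tuples (hypothetical layout).
    """
    adjusted = benjamini_hochberg([p for _, p, _ in terms])
    return [
        name
        for (name, _, gene_count), q in zip(terms, adjusted)
        if q < alpha and gene_count >= min_genes
    ]
```

The `min_genes=2` default mirrors the requirement that a term be enriched by two or more genes.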
## Results

### SNPs in LD with rs7412 (*APOE* ε2 allele)

In stage 1, we found that 306 SNPs mapped to 27 loci were in LD with rs7412 at P<5E-06 in the AD group (21 SNPs in 9 loci, Table S2), the NAD group (198 SNPs in 20 loci, Table S3), and both AD and NAD groups (87 SNPs, all in the *APOE* locus, Table S4). Of them, we identified LD of rs7412 with 58 SNPs not in the *APOE* locus (or other loci on chromosome 19) in the AD (19 SNPs in 8 loci) or NAD (39 SNPs in 19 loci) groups. For most SNPs (219 of 306), the magnitudes of LD (i.e., |r|) were smaller in the pooled AD group than in the pooled NAD group (181 of 248 SNPs in the *APOE* locus and 38 of 58 inter-chromosomal SNPs). We also observed that the *r* signs were the same in these two groups for 272 of 306 SNPs.

In stage 2, we found 24 SNPs (Table S5) having group-specific LD with rs7412 at a Bonferroni-adjusted significance P<1.63E-04 (=0.05/306). Of them, 16 SNPs were mapped to 6 non-*APOE* loci. All of them were identified in the pooled sample of either the AD (14 SNPs) or NAD (2 SNPs) group. LD estimates for 14 of these 16 SNPs were characterized by opposite signs of *r* in these groups (Figure 1). Also, 15 of them had larger magnitudes of *r* in the AD group than in the NAD group. The remaining 8 SNPs were in the *APOE* locus, of which rs11669338 (*NECTIN2*) attained significance only in the NAD group, whereas all the others attained significance in both groups. All 8 SNPs had the same signs of *r* in the AD and NAD groups, with magnitudes that were smaller in the AD group than in the NAD group (Figure 1).

[Figure 1.](http://medrxiv.org/content/early/2022/06/17/2022.06.16.22276523/F1) Linkage disequilibrium *r* between the identified group-specific SNPs and rs7412 in the ε4-negative sample of all five datasets combined. The x-axis shows SNP identifiers, genes harboring these SNPs, and chromosomes.
Blue boxes: Alzheimer’s disease-affected group (AD). Red boxes: AD-unaffected group (NAD). The vertical lines show 95% confidence intervals.

### SNPs in LD with rs429358 (*APOE* ε4 allele)

In stage 1, we found that rs429358 was in LD with 801 SNPs (143 loci) at P<5E-06 in the AD group (301 SNPs in 73 loci, Table S6), the NAD group (351 SNPs in 81 loci, Table S7), and both AD and NAD groups (149 SNPs; all in the *APOE* locus, except 2 SNPs, Table S8). In the AD and NAD groups, we identified LD of rs429358 with 159 (72 loci) and 344 (80 loci) SNPs not in the *APOE* region, respectively, totaling 503 SNPs. Of all 505 SNPs (154 loci) not in the *APOE* locus in the AD, NAD, and AD&NAD groups, one locus harboring the *FXYD5* and *FAM187B* genes (11 SNPs, NAD group) was on chromosome 19, and the other 494 SNPs (153 loci) were not on chromosome 19. The LD magnitudes were smaller in the pooled AD group than in the pooled NAD group for 370 of 801 SNPs (270 of 296 SNPs in the *APOE* locus and 161 of 505 SNPs in the non-*APOE* loci). The *r* signs were the same in these two groups for 711 of 801 SNPs.

In stage 2, we identified 57 SNPs with group-specific LD at a Bonferroni-adjusted significance P<6.24E-05 (=0.05/801). As seen in Table S9, 17 of 57 SNPs were mapped to 11 non-*APOE* loci. All of them were identified in the pooled sample of either the AD (10 SNPs) or NAD (7 SNPs) group. The magnitudes of *r* were larger in the pooled AD sample than in the pooled NAD sample for SNPs whose significant LD was identified in the AD group, and vice versa. The *r* signs for 13 of these 17 SNPs were opposite in the AD and NAD samples. The other 40 SNPs were located in the *APOE* locus. Magnitudes of *r* for all of these SNPs, except rs769449 (*APOE*), were larger in the pooled AD sample than in the pooled NAD sample. For all of them, except rs11083767 (*EXOC3L2*), the *r* signs were the same in the AD and NAD samples (Figure 2).
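The stage 2 Bonferroni thresholds quoted above for rs7412 and rs429358 follow directly from dividing 0.05 by the number of stage 1 SNPs tested. A minimal Python check is shown below; the helper name `bonferroni_threshold` is hypothetical.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under a Bonferroni correction."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


# Numbers of stage 1 SNPs tested in stage 2, as reported in the text.
for label, n_snps in (("rs7412", 306), ("rs429358", 801)):
    print(f"{label}: P < {bonferroni_threshold(0.05, n_snps):.2E}")
# rs7412: P < 1.63E-04
# rs429358: P < 6.24E-05
```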
[Figure 2.](http://medrxiv.org/content/early/2022/06/17/2022.06.16.22276523/F2) Linkage disequilibrium (LD) *r* between the identified group-specific SNPs and rs429358 in the ε2-negative sample of all five datasets combined. (A) LD for inter-chromosomal SNPs, i.e., SNPs not on chromosome 19. (B) LD for intra-chromosomal SNPs. The x-axis shows SNP identifiers, genes harboring these SNPs, and chromosomes in panel (A). Blue boxes: Alzheimer’s disease-affected group (AD). Red boxes: AD-unaffected group (NAD). The vertical lines show 95% confidence intervals.

### AD risk for carriers of compound genotypes

We performed Cox regression analysis to examine the impact of compound genotypes comprised of a group-specific SNP and either rs7412 (Tables 2 and S10, Figure 3A) or rs429358 (Tables 2 and S11, Figure 3B) on the AD risk. An advantage of using compound genotypes is that we can explicitly examine the effect of a minor allele of a group-specific SNP independently of the effect of the ε2 or ε4 allele (CompG2), the impact of the ε2 or ε4 allele independently of the minor allele of that SNP (CompG3), and combined effects of these minor alleles (CompG4) in the same model with the same reference genotype (CompG1) (Table 1).

[Table 2.](http://medrxiv.org/content/early/2022/06/17/2022.06.16.22276523/T2) Bonferroni-adjusted significant results from the survival-type meta-analysis of compound genotype (CompG) associations with Alzheimer’s disease risk using SNPs in group-specific LD with rs7412 or rs429358.

[Figure 3.](http://medrxiv.org/content/early/2022/06/17/2022.06.16.22276523/F3)
The results of the meta-analysis of the associations of compound genotypes comprised of SNPs (shown on the *x*-axis) in group-specific linkage disequilibrium with (A) rs7412 in the ε4-negative sample or (B) rs429358 in the ε2-negative sample with the Alzheimer’s disease risk. CompG2 (green) indicates ε3ε3 subjects carrying at least one minor allele of the SNP; CompG3 (red) denotes (A) ε2 or (B) ε4 carriers who are major-allele homozygotes at the SNP; CompG4 (blue) indicates (A) ε2 or (B) ε4 carriers having at least one minor allele of the SNP. CompG1, indicating ε3ε3 subjects who are major-allele homozygotes at the SNP, was the reference. Black vertical lines show 95% confidence intervals (negative direction for rs769449 was truncated for better resolution). The x-axis shows SNP identifiers, genes harboring these SNPs, and chromosomes. One asterisk (*) indicates nominally significant differences in the effects between CompG3 and CompG4 at (A) 2.08E-03≤P<0.05 and (B) 8.77E-04≤P<0.05. Two asterisks (**) indicate Bonferroni-adjusted significance in those differences at (A) P<2.08E-03 and (B) P<8.77E-04. No asterisk indicates non-significant differences in panel (A). Panel (B) shows only 17 group-specific SNPs for which the differences in the effects between CompG3 and CompG4 attained P<0.05.

#### AD risk for carriers of 24 rs7412-bearing compound genotypes (Tables 2 and S10, Figure 3A)

Our analysis showed that none of the eight CompG2 genotypes bearing SNPs from the *APOE* locus attained Bonferroni-adjusted significance $P_{B\varepsilon 2}$=2.08E-03 (=0.05/24), although the rs405509 minor allele was beneficially associated with AD, independently of ε2, at nominal significance P=0.0238. In contrast, six of 16 CompG2 genotypes comprised of rs7412 and non-*APOE* locus SNPs were beneficially associated with AD at the nominal significance ($P_{B\varepsilon 2}$≤P<0.05). For one CompG2, we observed a beneficial association of the rs2884183 minor allele (11q22.3, *DDX10*) with AD at P