Structural Variants in Linkage Disequilibrium with GWAS-Significant SNPs
========================================================================

* Hao Liang
* Joni Sedillo
* Steven J. Schrodi
* Akihiro Ikeda

## Abstract

**Summary** With the recent expansion of structural variant identification in the human genome, understanding the role of these impactful variants in disease architecture is critically important. Currently, a large proportion of genome-wide-significant GWAS SNPs are functionally unresolved, raising the possibility that some of these SNPs are associated with disease through linkage disequilibrium with causal structural variants. Hence, understanding the linkage disequilibrium between newly discovered structural variants and statistically significant SNPs may provide a resource for further investigation into disease-associated regions in the genome. Here we present a resource cataloging structural variant-significant SNP pairs in high linkage disequilibrium.

**Availability** All data files, including those detailing SV and GWAS SNP associations and results of GWAS SNP-SV pairs, are available at [https://github.com/hliang-SchrodiLab/SV\_SNPs](https://github.com/hliang-SchrodiLab/SV_SNPs).

Large genomic imbalances can significantly disrupt important functional elements in the human genome, including chromatin structure, noncoding RNAs, protein-coding sequence, and gene regulation.<sup>1-3</sup> These changes can therefore drive substantial phenotypic effects, including effects on the risk of disease and on disease-related traits. Indeed, structural variants (SVs) have been associated with several clinical traits including schizophrenia<sup>4</sup>, cardiometabolic physiology<sup>5</sup>, amyotrophic lateral sclerosis<sup>6</sup>, low-density lipoprotein levels<sup>7</sup>, and neurodevelopmental disorders<sup>8</sup>.
However, short-read sequencing technology has limited ability to detect SVs.<sup>9</sup> To address this limitation, the Human Genome Structural Variation Consortium (HGSVC) recently studied 32 human genomes using a combination of long-read PacBio whole-genome sequencing and Strand-seq technologies, identifying 107,590 SVs across the genome.<sup>10</sup> The authors noted that 68% of these SVs were not discovered by short-read sequencing. In our study, we sought to determine which single nucleotide polymorphisms (SNPs) exceeding genome-wide significance levels in genome-wide association studies (GWAS) were in high linkage disequilibrium with newly identified SVs. These results generate specific hypotheses concerning SVs as causal variants that could drive correlated SNPs to exhibit strong disease association. Hence, this work can serve as a resource to investigate disease association at SVs that were not interrogated in GWAS but may be driving disease signals.

To construct the database of GWAS-significant SNPs (p < 5E-08) in high linkage disequilibrium with these new SVs, SNPs were obtained from the GWAS Catalog<sup>11</sup>, noting the position (assembly GRCh38/hg38) and alleles. SV locations and alleles were downloaded from the HGSVC data portal.<sup>12</sup> To reduce the computational effort, a restriction was placed on the physical distance between GWAS SNPs and SVs prior to calculating linkage disequilibrium: only SNPs within 100 kbp of the endpoints of an SV were considered. Using the unphased data, linkage disequilibrium was then calculated on the HGSVC samples using the approach described below. GWAS SNP-SV pairs that exceeded a squared correlation coefficient of 0.80 were included in the database. Perl (v5.32.1) code was used to preprocess original files, including file format conversion, GWAS association extraction and sample counting. Code written in R (v4.1.3) was used to calculate linkage disequilibrium and p-values.
Bedtools (v2.30.0) was also used to extract SNPs located within the regions of SVs and their upstream/downstream 100 kbp flanking regions.

To calculate the pairwise linkage disequilibrium between each SV and nearby GWAS-significant SNPs, an estimator of the squared correlation coefficient was used on unphased data. The SVs and SNPs studied are biallelic. Denote the pair of alleles segregating at an SV as *A*<sub>1</sub> and *A*<sub>2</sub>. Similarly, denote the pair of alleles segregating at a SNP as *B*<sub>1</sub> and *B*<sub>2</sub>. Further denote the counts of the nine double genotypes in a sample of individuals as:

![Formula][1]

Let ![Graphic][2]. Setting the numerical values for each individual carrying a specific genotype as

![Formula][3]

we then define the genotypic squared correlation coefficient (Pearson’s correlation coefficient squared) as

![Formula][4]

Notably, the value of *g*<sup>2</sup> is equivalent to the standard metric *r*<sup>2</sup> under Hardy-Weinberg equilibrium.

Our analysis produced 16,238 GWAS SNP-SV pairs in high linkage disequilibrium (*g*<sup>2</sup> > 0.80) across 2,355 traits from the GWAS Catalog within the physical distance window. These SNP-SV pairs comprised a total of 4,677 unique SVs and 7,831 unique GWAS-significant SNPs. The number of high linkage disequilibrium SNP-SV pairs on each chromosome is shown in **Supplemental Table 1**.

To exemplify the utility of this resource, we show the findings for nine GWAS SNPs within the ±100 kbp flanking region of SV10995 (insertion/deletion of 384 nucleotides) in the *PLEKHA1/ARMS2/HTRA1* region on chromosome 10q26 (**Figure 1**). Six of these SNPs, rs11200638, rs3793917, rs3750846, rs3750847, rs3750848, and rs10490924, were all previously found to be highly associated with age-related macular degeneration (AMD).<sup>13-15</sup> Four of these six SNPs were found to be in perfect linkage disequilibrium with SV10995 within the HGSVC sample set, while rs11200638 and rs3793917 exhibited *g*<sup>2</sup> = 0.94.
Outside of AMD, rs61871747 was suggestively associated with cognitive function (p = 9.13E-07), rs36212732 was found to significantly correlate with refractive error, and rs61871744 was significantly associated with cataract. SV10995 resides immediately downstream of *ARMS2* (11 bp from the gene body; 431 bp from the CDS). It is possible that SV10995 affects the expression of *ARMS2*, which may, in turn, modify the risk for AMD. Interestingly, a different SV<sup>16</sup> was found to both be associated with AMD and have substantial effects on *ARMS2* mRNA stability.<sup>17</sup>

[Supplemental Table 1.](http://medrxiv.org/content/early/2022/12/16/2022.12.14.22283482/T1) The number of genome-wide significant SNPs in high linkage disequilibrium (*g*<sup>2</sup> > 0.80) with a structural variant, listed by chromosome.

[Figure 1.](http://medrxiv.org/content/early/2022/12/16/2022.12.14.22283482/F1) *Example: SV10995 and Significant GWAS SNPs in High Linkage Disequilibrium.* Figure 1 shows the region chr10:122357364-122557747 (hg38 assembly), including SV10995 (insertion/deletion structural variant), the ten SNPs associated with disease traits from the GWAS Catalog, the squared correlation (*g*<sup>2</sup>) between each SNP and SV10995, and the p-value reported in the respective publication for each SNP. This information is displayed positionally in the context of *PLEKHA1*, the microRNA *MIR3941*, ENSG00000285955, *ARMS2*, and *HTRA1*.

By cataloging disease-associated SNPs that are highly correlated with newly discovered SVs, this archive can serve as an important resource for fine-mapping causal variants in medically important traits.
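
The screening procedure described in the methods (a ±100 kbp window around each SV, followed by a squared genotypic correlation computed on unphased genotypes and a 0.80 cutoff) can be illustrated with a minimal Python sketch. This is not the authors' R or Perl code. The function names `within_flank`, `genotypic_r2` and `is_high_ld`, the 0/1/2 genotype coding, and the toy genotypes are all assumptions made for illustration; because Pearson's correlation is unchanged by shifting or rescaling the codes, any equally spaced coding would give the same value.

```python
import math


def within_flank(snp_pos, sv_start, sv_end, flank=100_000):
    """True if the SNP lies inside the SV or within `flank` bp of either SV endpoint."""
    return sv_start - flank <= snp_pos <= sv_end + flank


def genotypic_r2(sv_codes, snp_codes):
    """Squared Pearson correlation between two unphased genotype vectors.

    Each vector holds one numeric code per individual (assumed here: 0, 1 or 2
    copies of a chosen allele). Returns NaN when either variant shows no
    variation, because the correlation is then undefined.
    """
    if len(sv_codes) != len(snp_codes):
        raise ValueError("genotype vectors must have the same length")
    n = len(sv_codes)
    if n == 0:
        return math.nan
    mean_x = sum(sv_codes) / n
    mean_y = sum(snp_codes) / n
    sxx = sum((x - mean_x) ** 2 for x in sv_codes)
    syy = sum((y - mean_y) ** 2 for y in snp_codes)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(sv_codes, snp_codes))
    if sxx == 0 or syy == 0:
        return math.nan
    return (sxy * sxy) / (sxx * syy)


def is_high_ld(sv_codes, snp_codes, threshold=0.80):
    """True if the pair exceeds the squared-correlation threshold (NaN never passes)."""
    return genotypic_r2(sv_codes, snp_codes) > threshold


# Hypothetical genotypes for eight individuals (not taken from the HGSVC data).
toy_sv = [0, 1, 2, 1, 0, 2, 1, 0]
toy_snp = [0, 1, 2, 1, 0, 2, 0, 0]

if within_flank(snp_pos=150_000, sv_start=200_000, sv_end=200_383):
    print(genotypic_r2(toy_sv, toy_snp), is_high_ld(toy_sv, toy_snp))
```

Returning NaN for monomorphic variants keeps such pairs out of the catalog automatically, since a NaN comparison against the threshold is false. With only 32 genomes in the HGSVC sample, a single discordant genotype can move the estimate noticeably, as the toy data show.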
## Data Availability

All data files, including those detailing SV and GWAS SNP associations and results of GWAS SNP-SV pairs, are available at [https://github.com/hliang-SchrodiLab/SV\_SNPs](https://github.com/hliang-SchrodiLab/SV_SNPs).

* Received December 14, 2022.
* Revision received December 14, 2022.
* Accepted December 16, 2022.
* © 2022, Posted by Cold Spring Harbor Laboratory

This pre-print is available under a Creative Commons License (Attribution-NoDerivs 4.0 International), CC BY-ND 4.0, as described at [http://creativecommons.org/licenses/by-nd/4.0/](http://creativecommons.org/licenses/by-nd/4.0/)

## References

1. Chiang C, Scott AJ, Davis JR, Tsang EK, et al. (2017) The impact of structural variation on human gene expression. Nat Genet 49:692–699. doi:10.1038/ng.3834. PMID: 28369037.
2. Scott AJ, Chiang C, Hall IM (2021) Structural variants are a major source of gene expression differences in humans and often affect multiple nearby genes. Genome Res 31:2249–2257.
3. Shanta O, Noor A, Human Genome Structural Variation Consortium (HGSVC), Sebat J (2020) The effects of common structural variants on 3D chromatin structure. BMC Genomics 21(1):95.
4. Halvorsen M, Huh R, Oskolkov N, Wen J, et al.
(2020) Increased burden of ultra-rare structural variants localizing to boundaries of topologically associated domains in schizophrenia. Nat Commun 11:1842.
5. Chen L, Abel HJ, Das I, Larson DE, et al. (2021) Association of structural variation with cardiometabolic traits in Finns. Am J Hum Genet 108(4):583–596. doi:10.1016/j.ajhg.2021.03.008.
6. Al Khleifat A, Iacoangeli A, van Vugt JJFA, Bowles H, et al. (2022) Structural variation analysis of 6,500 whole genome sequences in amyotrophic lateral sclerosis. NPJ Genom Med 7(1):8.
7. Beyter D, Ingimundardottir H, Oddsson A, Eggertsson HP, et al. (2021) Long-read sequencing of 3,622 Icelanders provides insight into the role of structural variants in human diseases and other traits. Nat Genet 53:779–786. doi:10.1038/s41588-021-00865-4.
8. Talkowski ME, Rosenfeld JA, Blumenthal I, Pillalamarri V, et al. (2012) Sequencing chromosomal abnormalities reveals neurodevelopmental loci that confer risk across diagnostic boundaries. Cell 149:525–537. doi:10.1016/j.cell.2012.03.028. PMID: 22521361.
9. De Coster W, Van Broeckhoven C (2019) Newest methods for detecting structural variations. Trends Biotechnol 37(9):973–982.
10. Ebert P, Audano PA, Zhu Q, Rodriguez-Martin B, et al. (2021) Haplotype-resolved diverse human genomes and integrated analysis of structural variation. Science 372(6537):eabf7117.
11. [https://www.ebi.ac.uk/gwas/home](https://www.ebi.ac.uk/gwas/home)
12. ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/HGSVC2; www.internationalgenome.org/data-portal/data-collection/hgsvc2
13. Fritsche LG, Chen W, Schu M, Yaspan BL, et al. (2013) Seven new loci associated with age-related macular degeneration. Nat Genet 45(4):433–439. doi:10.1038/ng.2578. PMID: 23455636.
14. Guindo-Martinez M, Amela R, Bonas-Guarch S, Puiggros M, et al. (2021) The impact of non-additive genetic associations on age-related complex diseases. Nat Commun 12(1):2436. doi:10.1038/s41467-021-21952-4. PMID: 33893285.
15. Chen W, Stambolian D, Edwards AO, Branham KE, et al. (2010) Genetic variants near TIMP3 and high-density lipoprotein-associated loci influence susceptibility to age-related macular degeneration. Proc Natl Acad Sci USA 107(16):7401–7406.
16. NG\_011725.1:g.7643\_8086delinsTTATTAATTAATTAACTAAAATTAAATTATTTAGTTAATTTAATTAACTAAACT
17. Fritsche LG, Leonhardt T, Jassen A, Fisher SA, et al. (2008) Age-related macular degeneration is associated with an unstable ARMS2 (LOC387715) mRNA. Nat Genet 40:892–896. doi:10.1038/ng.170. PMID: 18511946.
18. Dewan A, Liu M, Hartman S, Zhang SS, et al. (2006) HTRA1 promoter polymorphism in wet age-related macular degeneration. Science 314(5801):989–992.
19. Fritsche LG, Igl W, Bailey JN, Sengupta S, et al. (2016) A large genome-wide association study of age-related macular degeneration highlights contributions of rare and common variants. Nat Genet 48(2):134–143. doi:10.1038/ng.3448. PMID: 26691988.
20. Winkler TW, Grassmann F, Brandl C, Kiel C, et al. (2020) Genome-wide association meta-analysis for early age-related macular degeneration highlights novel loci and insights for advanced disease. BMC Med Genomics 13(1):120.
21. Hysi PG, Choquet H, Khawaja AP, Wojciechowski R, et al. (2020) Meta-analysis of 542,934 subjects of European ancestry identifies new genes and mechanisms predisposing to refractive error and myopia. Nat Genet 52(4):401–407. doi:10.1038/s41588-020-0599-0. PMID: 32231278.
22. Naj AC, Scott WK, Courtenay MD, Cade WH, et al. (2013) Genetic factors in nonsmokers with age-related macular degeneration revealed through genome-wide gene-environment interaction analysis. Ann Hum Genet 77(3):215–231. doi:10.1111/ahg.12011. PMID: 23577725.
23. Han X, Ong JS, An J, Craig JE, et al. (2020) Association of Myopia and Intraocular Pressure With Retinal Detachment in European Descent Participants of the UK Biobank Cohort: A Mendelian Randomization Study. JAMA Ophthalmol 138(6):671–678.
24. Sobrin L, Ripke S, Yu Y, Fagerness J, et al. (2012) Heritability and genome-wide association study to assess genetic differences between advanced age-related macular degeneration subtypes. Ophthalmology 119(9):1874–1885. doi:10.1016/j.ophtha.2012.03.014. PMID: 22705344.
25. Mohammadnejad A, Nygaard M, Li S, Zhang D, et al. (2020) Generalized correlation coefficient for genome-wide association analysis of cognitive ability in twins. Aging (Albany NY) 12(22):22457–22494.
26. Sakaue S, Kanai M, Tanigawa Y, Karjalainen J, et al. (2021) A cross-population atlas of genetic associations for 220 human phenotypes. Nat Genet 53(10):1415–1424. doi:10.1038/s41588-021-00931-x.

[1]: /embed/graphic-1.gif
[2]: /embed/inline-graphic-1.gif
[3]: /embed/graphic-2.gif
[4]: /embed/graphic-3.gif