medRxiv

Structural Variants in Linkage Disequilibrium with GWAS-Significant SNPs

Hao Liang, Joni Sedillo, Steven J. Schrodi, Akihiro Ikeda
doi: https://doi.org/10.1101/2022.12.14.22283482
Hao Liang
1 Department of Medical Genetics, University of Wisconsin-Madison, Madison, WI, USA
Joni Sedillo
1 Department of Medical Genetics, University of Wisconsin-Madison, Madison, WI, USA
2 Computation and Informatics in Biology and Medicine, University of Wisconsin-Madison, Madison, WI, USA
Steven J. Schrodi
1 Department of Medical Genetics, University of Wisconsin-Madison, Madison, WI, USA
2 Computation and Informatics in Biology and Medicine, University of Wisconsin-Madison, Madison, WI, USA
Akihiro Ikeda
1 Department of Medical Genetics, University of Wisconsin-Madison, Madison, WI, USA
3 McPherson Eye Research Institute, University of Wisconsin-Madison, Madison, WI, USA
For correspondence: schrodi{at}wisc.edu, aikeda{at}wisc.edu

Abstract

Summary With the recent expansion of structural variant identification in the human genome, understanding the role of these impactful variants in disease architecture is critically important. Currently, a large proportion of genome-wide-significant GWAS SNPs are functionally unresolved, raising the possibility that some of these SNPs are associated with disease through linkage disequilibrium with causal structural variants. Hence, understanding the linkage disequilibrium between newly discovered structural variants and statistically significant SNPs may provide a resource for further investigation into disease-associated regions in the genome. Here we present a resource cataloging structural variant-significant SNP pairs in high linkage disequilibrium.

Availability All data files including those detailing SV and GWAS SNP associations and results of GWAS-SNP-SV pairs are available at https://github.com/hliang-SchrodiLab/SV_SNPs.

Large genomic imbalances can significantly disrupt important functional elements in the human genome, including chromatin structure, noncoding RNAs, protein-coding sequence, and gene regulation [1-3]. These changes can substantially alter phenotypes and may drive a wide array of diseases and disease-related traits. Indeed, structural variants (SVs) have been associated with several clinical traits, including schizophrenia [4], cardiometabolic physiology [5], amyotrophic lateral sclerosis [6], low-density lipoprotein levels [7], and neurodevelopmental disorders [8]. However, short-read sequencing technology has limited ability to detect SVs [9]. To address this limitation, the Human Genome Structural Variation Consortium (HGSVC) recently studied 32 human genomes using a combination of long-read PacBio whole-genome sequencing and Strand-seq technologies, identifying 107,590 SVs across the genome [10]. The authors noted that 68% of these SVs were not discovered by short-read sequencing. In our study, we sought to determine which single nucleotide polymorphisms (SNPs) exceeding genome-wide significance levels in genome-wide association studies (GWAS) were in high linkage disequilibrium with newly identified SVs. These results generate specific hypotheses in which SVs are the causal variants and drive correlated SNPs to exhibit strong disease association. Hence, this work can serve as a resource for investigating disease association at SVs that were not interrogated in GWAS but may be driving disease signals.

To construct the database of GWAS-significant SNPs (p < 5E-08) in high linkage disequilibrium with these new SVs, SNPs were obtained from the GWAS Catalog [11], noting the position (assembly GRCh38/hg38) and alleles. SV locations and alleles were downloaded from the HGSVC data portal [12]. To reduce the computational effort, the physical distance between GWAS SNPs and SVs was restricted before linkage disequilibrium was calculated: only SNPs located within the SV or within 100 kbp flanking either endpoint of the SV were considered. Using the unphased data, linkage disequilibrium was then calculated on the HGSVC samples using the approach described below. GWAS SNP-SV pairs that exceeded a squared correlation coefficient of 0.80 were included in the database.

Perl (v5.32.1) code was used to preprocess original files, including file format conversion, GWAS association extraction and sample counting. Code written in R (v4.1.3) was used to calculate linkage disequilibrium and p-values. Bedtools (v2.30.0) was also used to extract SNPs located within the regions of SVs and their upstream/downstream 100kbp flanking regions.
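The distance restriction can be illustrated with a short sketch. The authors performed this step with Bedtools; the Python below is not their code and only shows the window logic. The record types, the function name `snps_near_sv`, the use of inclusive 1-based coordinates, and all example positions are assumptions made for illustration.

```python
from typing import List, NamedTuple

FLANK_BP = 100_000  # 100 kbp flank on each side of the SV, as stated in the text


class SV(NamedTuple):
    # Hypothetical record layout, not the HGSVC file format.
    sv_id: str
    chrom: str
    start: int  # assumed 1-based, inclusive
    end: int    # assumed 1-based, inclusive


class SNP(NamedTuple):
    # Hypothetical record layout, not the GWAS Catalog file format.
    snp_id: str
    chrom: str
    pos: int


def snps_near_sv(sv: SV, snps: List[SNP], flank: int = FLANK_BP) -> List[SNP]:
    """Return SNPs on the same chromosome that fall inside the SV
    or within `flank` bp of either SV endpoint (bounds inclusive)."""
    lo = max(1, sv.start - flank)
    hi = sv.end + flank
    return [s for s in snps if s.chrom == sv.chrom and lo <= s.pos <= hi]


if __name__ == "__main__":
    # Made-up coordinates, for illustration only.
    demo_sv = SV("sv_demo", "chr10", 1_000_000, 1_000_400)
    demo_snps = [
        SNP("snp_a", "chr10", 900_000),    # exactly on the lower bound
        SNP("snp_b", "chr10", 899_999),    # 1 bp outside the window
        SNP("snp_c", "chr11", 1_000_100),  # different chromosome
    ]
    for snp in snps_near_sv(demo_sv, demo_snps):
        print(snp.snp_id)
```

Restricting candidates this way means linkage disequilibrium only has to be computed for SNP-SV pairs that are physically close, rather than for every possible pair.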

To calculate the pairwise linkage disequilibrium between each SV and nearby GWAS-significant SNPs, an estimator of the squared correlation coefficient was used on unphased data. The SVs and SNPs studied are biallelic. Denote the pair of alleles segregating at an SV as A1 and A2. Similarly denote the pair of alleles segregating at a SNP as B1 and B2. Further denote the number of the nine double genotypes in a sample of individuals as: Embedded Image

Let Embedded Image.

Setting the numerical values for each individual carrying a specific genotype as Embedded Image we then define the genotypic squared correlation coefficient (Pearson’s correlation coefficient squared) as Embedded Image

Notably, the value of g² is equivalent to the standard metric r² under Hardy-Weinberg equilibrium.
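As a rough illustration of a genotypic squared correlation on unphased data, the sketch below computes the squared Pearson correlation between two per-individual genotype vectors. It is written in Python, not the R code used for the analysis, and it assumes each genotype is coded as the number of copies of one allele (0, 1, or 2); the numerical coding defined in the original equations is not reproduced here. The function name `genotypic_r2` and the handling of missing genotypes as `None` are likewise assumptions.

```python
import math
from typing import Optional, Sequence


def genotypic_r2(sv_geno: Sequence[Optional[int]],
                 snp_geno: Sequence[Optional[int]]) -> float:
    """Squared Pearson correlation between two genotype vectors.

    Assumption: genotypes are coded 0/1/2 (copies of one allele) and
    missing calls are None. Individuals missing either call are skipped.
    Returns NaN when the value is undefined (fewer than two individuals,
    or no variation at one of the two sites).
    """
    if len(sv_geno) != len(snp_geno):
        raise ValueError("genotype vectors must have the same length")

    pairs = [(x, y) for x, y in zip(sv_geno, snp_geno)
             if x is not None and y is not None]
    n = len(pairs)
    if n < 2:
        return math.nan

    mean_x = math.fsum(x for x, _ in pairs) / n
    mean_y = math.fsum(y for _, y in pairs) / n
    sxx = math.fsum((x - mean_x) ** 2 for x, _ in pairs)
    syy = math.fsum((y - mean_y) ** 2 for _, y in pairs)
    sxy = math.fsum((x - mean_x) * (y - mean_y) for x, y in pairs)

    if sxx == 0 or syy == 0:
        return math.nan  # one site is monomorphic in this sample
    return (sxy * sxy) / (sxx * syy)


if __name__ == "__main__":
    # Made-up genotypes for four individuals, for illustration only.
    print(genotypic_r2([0, 0, 1, 2], [0, 1, 1, 2]))
```

Because the Pearson correlation is unchanged by any linear recoding of either variable, counting copies of the other allele at either site gives the same squared value.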

Our analysis produced 16,238 GWAS-SNP-SV pairs in high linkage disequilibrium (g² > 0.80) across 2,355 traits from the GWAS Catalog within the physical distance window. These SNP-SV pairs comprised 4,677 unique SVs and 7,831 unique GWAS-significant SNPs. The distribution of the number of high linkage disequilibrium SNP-SV pairs on each chromosome is shown in Supplemental Table 1. To exemplify the utility of this resource, we show the findings for nine GWAS SNPs within the ±100 kbp flanking region of SV10995 (an insertion/deletion of 384 nucleotides) in the PLEKHA1/ARMS2/HTRA1 region on chromosome 10q26 (Figure 1). Six of these SNPs (rs11200638, rs3793917, rs3750846, rs3750847, rs3750848, and rs10490924) were previously found to be highly associated with age-related macular degeneration (AMD) [13-15]. Four of these six SNPs were in perfect linkage disequilibrium with SV10995 within the HGSVC sample set, and the remaining two, rs11200638 and rs3793917, each exhibited g² = 0.94. Outside of AMD, rs61871747 was suggestively associated with cognitive function (p = 9.13E-07), rs36212732 was significantly correlated with refractive error, and rs61871744 was significantly associated with cataract. SV10995 resides immediately downstream of ARMS2 (11 bp from the gene body and 431 bp from the coding sequence). It is possible that SV10995 affects the expression of ARMS2, which may, in turn, modify the risk for AMD. Interestingly, a different SV [16] was found both to be associated with AMD and to have substantial effects on ARMS2 mRNA stability [17].
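Summary counts of the kind reported above (number of high linkage disequilibrium pairs, unique SVs, unique SNPs, and pairs per chromosome) could be tallied from a table of SNP-SV pairs roughly as follows. This is a minimal Python sketch, not the authors' code; the dictionary keys (`snp_id`, `sv_id`, `chrom`, `g2`), the function name `summarize_pairs`, and the example records are hypothetical and do not reflect the schema of the files in the repository.

```python
from collections import Counter
from typing import Dict, Iterable

G2_THRESHOLD = 0.80  # inclusion threshold stated in the text (g2 > 0.80)


def summarize_pairs(pairs: Iterable[Dict], threshold: float = G2_THRESHOLD) -> Dict:
    """Keep pairs whose g2 strictly exceeds the threshold and count them.

    Each pair is assumed to be a dict with the hypothetical keys
    'snp_id', 'sv_id', 'chrom', and 'g2'.
    """
    kept = [p for p in pairs if p["g2"] > threshold]
    return {
        "n_pairs": len(kept),
        "n_unique_svs": len({p["sv_id"] for p in kept}),
        "n_unique_snps": len({p["snp_id"] for p in kept}),
        "pairs_per_chrom": Counter(p["chrom"] for p in kept),
    }


if __name__ == "__main__":
    # Made-up records, for illustration only.
    demo = [
        {"snp_id": "snp_a", "sv_id": "sv_1", "chrom": "chr10", "g2": 1.00},
        {"snp_id": "snp_b", "sv_id": "sv_1", "chrom": "chr10", "g2": 0.94},
        {"snp_id": "snp_c", "sv_id": "sv_2", "chrom": "chr2", "g2": 0.80},
    ]
    print(summarize_pairs(demo))
```

Note that the comparison is strict, so a pair with g² exactly equal to 0.80 is excluded, matching the "exceeded" wording used for the inclusion criterion.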

Supplemental Table 1.

The number of genome-wide significant SNPs in high linkage disequilibrium (g² > 0.80) with a structural variant, listed by chromosome.

Figure 1. Example: SV10995 and Significant GWAS SNPs in High Linkage Disequilibrium

Figure 1 shows the region chr10:122357364-122557747 (hg38 assembly), including SV10995 (an insertion/deletion structural variant), the ten SNPs associated with disease traits from the GWAS Catalog, the squared correlation (g²) between each SNP and SV10995, and the p-value reported in the respective publication for each SNP. This information is displayed positionally in the context of PLEKHA1, the microRNA MIR3941, ENSG00000285955, ARMS2, and HTRA1.

This repository of disease-associated SNPs that are highly correlated with newly discovered SVs can serve as an important resource for fine mapping causal variants in medically important traits.

Data Availability

All data files including those detailing SV and GWAS SNP associations and results of GWAS SNP-SV pairs are available at https://github.com/hliang-SchrodiLab/SV_SNPs


References

  1. Chiang C, Scott AJ, Davis JR, Tsang EK, et al. (2017) The impact of structural variation on human gene expression. Nat Genet 49:692–699.
  2. Scott AJ, Chiang C, Hall IM (2021) Structural variants are a major source of gene expression differences in humans and often affect multiple nearby genes. Genome Res 31:2249–2257.
  3. Shanta O, Noor A, Human Genome Structural Variation Consortium (HGSVC), Sebat J (2020) The effects of common structural variants on 3D chromatin structure. BMC Genomics 21(1):95.
  4. Halvorsen M, Huh R, Oskolkov N, Wen J, et al. (2020) Increased burden of ultra-rare structural variants localizing to boundaries of topologically associated domains in schizophrenia. Nat Commun 11:1842.
  5. Chen L, Abel HJ, Das I, Larson DE, et al. (2021) Association of structural variation with cardiometabolic traits in Finns. Am J Hum Genet 108(4):583–596.
  6. Al Khleifat A, Iacoangeli A, van Vugt JJFA, Bowles H, et al. (2022) Structural variation analysis of 6,500 whole genome sequences in amyotrophic lateral sclerosis. NPJ Genom Med 7(1):8.
  7. Beyter D, Ingimundardottir H, Oddsson A, Eggertsson HP, et al. (2021) Long-read sequencing of 3,622 Icelanders provides insight into the role of structural variants in human diseases and other traits. Nat Genet 53:779–786.
  8. Talkowski ME, Rosenfeld JA, Blumenthal I, Pillalamarri V, et al. (2012) Sequencing chromosomal abnormalities reveals neurodevelopmental loci that confer risk across diagnostic boundaries. Cell 149:525–537.
  9. De Coster W, Van Broeckhoven C (2019) Newest methods for detecting structural variations. Trends Biotechnol 37(9):973–982.
  10. Ebert P, Audano PA, Zhu Q, Rodriguez-Martin B, et al. (2021) Haplotype-resolved diverse human genomes and integrated analysis of structural variation. Science 372(6537):eabf7117.
  11. https://www.ebi.ac.uk/gwas/home
  12. ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/HGSVC2; www.internationalgenome.org/data-portal/data-collection/hgsvc2
  13. Fritsche LG, Chen W, Schu M, Yaspan BL, et al. (2013) Seven new loci associated with age-related macular degeneration. Nat Genet 45(4):433–439.
  14. Guindo-Martinez M, Amela R, Bonas-Guarch S, Puiggros M, et al. (2021) The impact of non-additive genetic associations on age-related complex diseases. Nat Commun 12(1):2436.
  15. Chen W, Stambolian D, Edwards AO, Branham KE, et al. (2010) Genetic variants near TIMP3 and high-density lipoprotein-associated loci influence susceptibility to age-related macular degeneration. Proc Natl Acad Sci USA 107(16):7401–7406.
  16. NG_011725.1:g.7643_8086delinsTTATTAATTAATTAACTAAAATTAAATTATTTAGTTAATTTAATTAACTAAACT
  17. Fritsche LG, Leonhardt T, Jassen A, Fisher SA, et al. (2008) Age-related macular degeneration is associated with an unstable ARMS2 (LOC387715) mRNA. Nat Genet 40:892–896.
  18. Dewan A, Liu M, Hartman S, Zhang SS, et al. (2006) HTRA1 promoter polymorphism in wet age-related macular degeneration. Science 314(5801):989–992.
  19. Fritsche LG, Igl W, Bailey JN, Sengupta S, et al. (2016) A large genome-wide association study of age-related macular degeneration highlights contributions of rare and common variants. Nat Genet 48(2):134–143.
  20. Winkler TW, Grassmann F, Brandl C, Kiel C, et al. (2020) Genome-wide association meta-analysis for early age-related macular degeneration highlights novel loci and insights for advanced disease. BMC Med Genomics 13(1):120.
  21. Hysi PG, Choquet H, Khawaja AP, Wojciechowski R, et al. (2020) Meta-analysis of 542,934 subjects of European ancestry identifies new genes and mechanisms predisposing to refractive error and myopia. Nat Genet 52(4):401–407.
  22. Naj AC, Scott WK, Courtenay MD, Cade WH, et al. (2013) Genetic factors in nonsmokers with age-related macular degeneration revealed through genome-wide gene-environment interaction analysis. Ann Hum Genet 77(3):215–231.
  23. Han X, Ong JS, An J, Craig JE, et al. (2020) Association of myopia and intraocular pressure with retinal detachment in European descent participants of the UK Biobank cohort: a Mendelian randomization study. JAMA Ophthalmol 138(6):671–678.
  24. Sobrin L, Ripke S, Yu Y, Fagerness J, et al. (2012) Heritability and genome-wide association study to assess genetic differences between advanced age-related macular degeneration subtypes. Ophthalmology 119(9):1874–1885.
  25. Mohammadnejad A, Nygaard M, Li S, Zhang D, et al. (2020) Generalized correlation coefficient for genome-wide association analysis of cognitive ability in twins. Aging (Albany NY) 12(22):22457–22494.
  26. Sakaue S, Kanai M, Tanigawa Y, Karjalainen J, et al. (2021) A cross-population atlas of genetic associations for 220 human phenotypes. Nat Genet 53(10):1415–1424.
Posted December 16, 2022.

Subject Area

  • Genetic and Genomic Medicine