Genome-wide contribution of common Short-Tandem Repeats to Parkinson’s Disease genetic risk
=============================================================================================

* Bernabe I. Bustos
* Kimberley Billingsley
* Cornelis Blauwendraat
* J. Raphael Gibbs
* Ziv Gan-Or
* Dimitri Krainc
* Andrew B. Singleton
* Steven J. Lubbe
* For the International Parkinson’s Disease Genomics Consortium (IPDGC)

## ABSTRACT

Parkinson’s disease (PD) is a complex neurodegenerative disorder with a strong genetic component, where most known disease-associated variants are single nucleotide polymorphisms (SNPs) and small insertions and deletions (Indels). DNA repetitive elements account for >50% of the human genome; however, little is known of their contribution to PD etiology. While select short tandem repeats (STRs) within candidate genes have been studied in PD, their genome-wide contribution remains unknown. Here we present the first genome-wide association study (GWAS) of STRs in PD. Through a meta-analysis of 16 imputed GWAS cohorts from the International Parkinson’s Disease Genomics Consortium (IPDGC), totalling 39,087 individuals (16,642 PD cases and 22,445 controls of European ancestry), we identified 34 genome-wide significant STR loci (p < 5.34×10⁻⁶), with the strongest signal located in *KANSL1* (chr17:44205351:[T]11, p=3×10⁻³⁹, OR=1.31 [CI 95%=1.26-1.36]). Conditional-joint analyses suggested that 4 significant STRs mapping near *NDUFAF2, TRIML2, MIR129-1* and *NCOR1* were independent from known PD risk SNPs. Including STRs in heritability estimates increased the variance explained beyond that of SNPs alone. Gene expression analysis of STRs (eSTR) in RNASeq data from 13 brain regions identified significant associations of STRs with the expression of multiple genes, including known PD genes.
Further functional annotation of candidate STRs revealed that significant eSTRs within *NDUFAF2* and *ZSWIM7* overlap with regulatory features and are associated with changes in the expression levels of nearby genes. Here we show that STRs at known and novel candidate PD loci contribute to PD risk, and have functional effects in disease-relevant tissues and pathways, supporting previously reported disease-associated genes and giving further evidence for their functional prioritization. These data represent a valuable resource for researchers currently dissecting PD risk loci.

## INTRODUCTION

Parkinson’s disease (PD) is a complex neurodegenerative disease with an established genetic component. Studies over the years have identified several rare variants that cause or significantly increase the risk of disease in carriers, and genome-wide association studies (GWAS) have recently uncovered 90 common variants that influence PD risk¹. It is estimated that common GWAS variants account for 16-36% of the overall genetic heritability of PD¹,², highlighting that a large proportion of the missing heritability remains to be identified. The vast majority of PD genetics studies have focused on the role of single nucleotide polymorphisms (SNPs), meaning that contributions of other genetic elements such as structural variants and repetitive elements have largely been ignored.

Repetitive elements represent more than 55% of the human genome³. Short tandem repeats (STRs) are small repetitive units ranging from 1 to 7 base pairs in length that vary among individuals, and account for ∼10% of all repetitive elements⁴. STR expansions are the cause of several neurological diseases, including Fragile X syndrome (*FMR-1*)⁵,⁶, Huntington’s disease (*HTT*)⁷, amyotrophic lateral sclerosis and frontotemporal dementia (*C9ORF72*)⁸,⁹, and spinocerebellar ataxia (*SCA1*)¹⁰, and have also been linked to numerous complex neurological and psychiatric traits¹¹.
A role for STRs as drivers of GWAS signals has been identified¹², where a risk SNP connected adjacent GGAA repeats by converting an interspaced GGAT motif into a GGAA motif, thereby increasing the number of consecutive GGAA motifs and modifying the activity of its sequence and functional impact. STRs have also been shown to significantly regulate gene expression and contribute to phenotypic plasticity¹³. STRs therefore represent a potential source of unexplored genetic variation that may account for some of the missing heritability of PD. In this regard, other repetitive elements, such as satellite repeats, have been shown to alter gene expression in blood of PD patients¹⁴. However, no genome-wide assessment of STRs in large population studies has yet been performed in this disease.

Due to their more complex and highly repetitive structure compared to SNVs, STRs have been difficult to assess. Despite the recent explosion of genetic data stemming from next generation sequencing, STRs are still difficult to genotype. Recent advances in PCR-free deep sequencing methods and STR genotyping tools now allow for the simultaneous assessment of STRs genome-wide¹⁵. Studies have shown high linkage disequilibrium (LD) between STRs and SNPs across the genome¹⁶. Exploiting this high LD, Saini *et al*. (2018)¹⁷ generated a phased SNP-STR haplotype panel based on the 1000 Genomes Project samples that allows for the accurate genome-wide imputation of common STRs into array-based genotype data. To assess the role of common STRs in PD risk, we imputed and interrogated STRs across 16 independent PD case-control cohorts, totaling 39,087 individuals, available through the International Parkinson’s Disease Genomics Consortium (IPDGC).

## RESULTS

### Meta-analysis of IPDGC GWAS cohorts imputed with an STR reference panel

The 16 GWAS cohorts used in this study had a combined sample size of 39,087 individuals, composed of 16,642 PD cases and 22,445 controls of self-reported European ancestry (**Supplementary Table 1**). After cohort-wise quality controls (see Methods), we performed genome-wide imputation using the 1000 Genomes STR-SNP reference panel¹⁷, and carried out case-control association analyses with PD status, followed by a fixed-effects meta-analysis across all cohorts. After removing variants with high heterogeneity across cohorts (I² >0.8), we obtained association p-values for 407,879 STRs, of which 214 variants surpassed the genome-wide significance threshold of 5.34×10⁻⁶ (**Figure 1, upper side**), which was estimated by permutation procedures for the STR reference panel, as described elsewhere¹⁸. The inflation factor lambda for the association was 1.18 and the rescaled lambda for 1000 cases and controls (λ₁₀₀₀) was 1.01.

To characterize and identify independently associated STRs, we first performed a conditional-joint analysis using GCTA-COJO¹⁹ and identified 34 STR variants mapping to 32 unique nearby genes, with the strongest signal located in *KANSL1* (chr17:44205351:[T]11, p=3×10⁻³⁹, OR=1.31 [CI 95%=1.26-1.36]), followed by *SNCA* (chr4:90662073:TATTT[GT]8AT[GT]7, p=3.36×10⁻²⁵, OR=1.36 [CI 95%=1.28-1.45]) (**Table 1**). Since STRs were imputed by leveraging LD information from SNPs, we carried out a secondary GCTA-COJO analysis including the meta-analysis results from imputed, filtered SNPs (**Figure 1, lower side**), obtaining a total of 8,179,378 SNPs and STRs in all cohorts.
We found eight loci with associations led by STRs (**Supplementary Table 2**). To refine these results, we further investigated their LD patterns with the 90 known PD risk variants from the 2019 PD GWAS meta-analysis¹, and found that four of the eight STRs had LD r² <0.5 with all of the known PD variants (**Supplementary Table 3**), indicating that these could be potential new PD risk signals:

* a tetranucleotide repeat within the 3rd intron of *NDUFAF2* (risk allele chr5:60437492:AA[TGAA]7, p=6.49×10⁻⁸, OR=1.30, CI 95%=1.18-1.43) (**Figure 2A**);
* a mononucleotide repeat downstream of *TRIML2* (risk allele chr4:189000404:TT[A]12, p=1.44×10⁻⁷, OR=1.31, CI 95%=1.19-1.44) (**Figure 2B**);
* a mononucleotide repeat downstream of *MIR129-1* (risk allele chr7:127793488:[T]15G, p=2.79×10⁻⁷, OR=1.16, CI 95%=1.09-1.23) (**Figure 2C**); and
* a mononucleotide repeat within the 44th intron of *NCOR1* (risk allele chr17:15941750:[T]11, p=3.77×10⁻⁶, OR=1.08, CI 95%=1.04-1.12) (**Figure 2D**).

[Table 1.](http://medrxiv.org/content/early/2021/07/05/2021.07.01.21259645/T1) **Genome-wide significant STR loci from meta-analysis of 16 case-control Parkinson’s Disease GWAS cohorts**.

[Figure 1.](http://medrxiv.org/content/early/2021/07/05/2021.07.01.21259645/F1) Genome-wide association results for imputed STRs and SNPs in 16 PD GWAS cohorts from the International Parkinson’s Disease Genomics Consortium. Hudson plot representing the association analysis results for STRs (upper) and SNPs (lower) across the human genome, showing the 34 genome-wide significant STR loci (p<5.34×10⁻⁶) after the conditional-joint analysis with GCTA. Genes in bold represent the loci influenced by STRs after including SNP associations.
Genes with a black dot at the top represent STR loci independent from the current 90 PD risk variants from the 2019 PD GWAS meta-analysis.

[Figure 2.](http://medrxiv.org/content/early/2021/07/05/2021.07.01.21259645/F2) Regional association plots for the 4 candidate independent STR loci. Locus zoom plots were generated for the 4 GCTA-nominated STR loci that were independent from SNPs and from the 90 PD risk loci: **(A)** chromosome 5 within *NDUFAF2*, **(B)** chromosome 4 near *TRIML2*, **(C)** chromosome 7 near *MIR129-1* and **(D)** chromosome 17 within *NCOR1*. The lead STR variant is depicted as a purple diamond and nearby variants (STRs and SNPs) as circles colored by their LD r² value to the lead STR variant. Gene annotations for each region are displayed in the bottom part of each panel, showing gene strand orientation with arrows.

It is important to note here that the independent STR signal at *NDUFAF2* (chr5:60437492:AA[TGAA]7) is within a known PD risk locus (mapping to *ELOVL7*), and was previously identified through Mendelian randomization to be significantly associated with risk of PD¹. Moreover, further LD analysis on this locus showed a high D’ statistic with the closest known PD risk SNP at that locus (D’=0.94 with rs1867598), indicating that, regardless of frequency disparities, the independence suggested by the GCTA-COJO analysis should be interpreted with caution.

### Quantifying the heritability of STRs in PD

The genetic heritability of PD was recently estimated to be 22%¹. Here, assuming a global disease prevalence of 0.2%², we leveraged the GCTA-LDMS method¹⁹ and estimated that common STRs (MAF >1%) account for 15.2% (SE=0.01) of the additive heritability of the disease on the liability scale.
Heritability for imputed SNPs in the same data accounted for 26.9% (SE=0.02), similar to the estimate obtained by Keller *et al*. (2012), also using GCTA. After including both common STRs and SNPs in the analysis, the heritability estimate increased to 28.8% (SE=0.02). This increase of 1.9 percentage points in the heritability estimate due to common STRs corresponds to a 7% relative increase over the SNP-based estimate.

### eSTR analysis

We functionally assessed the impact of the 34 significant STR associations through an expression quantitative trait loci analysis (eSTR). We investigated each locus by extracting the leading STR and other STRs in high LD (r² >0.5) within 1 Mb up- and downstream, obtaining 105 variants for further analysis (**Supplementary Table 4**). We used normalized gene expression data from frontal cortex from the North American Brain Expression Consortium (NABEC)²⁰, and 13 brain tissues from the Genotype-Tissue Expression Consortium (GTEx v.8)²¹, and identified 10,252 STR-gene associations across both datasets and all tissues (**Figure 3A**). Of these, 840 associations showed a false discovery rate (FDR) corrected p<0.05, corresponding to 234 unique eGenes (genes with at least one significant variant), which included 19 of the 78 loci identified in the 2019 PD GWAS meta-analysis (genes nominated from the 90 PD risk variants): *RIT2, TMEM163, MCCC1, LCORL, CTSB, SETD1A, CRHR1, GPNMB, BIN3, TMEM175, GAK, MAP4K4, SNCA, SPTSSB, WNT3, KPNA1, ITGA8, BST1* and *HIP1R* (**Supplementary Table 5**).

[Figure 3.](http://medrxiv.org/content/early/2021/07/05/2021.07.01.21259645/F3) eSTR analysis of top STR loci in gene expression data from brain tissues. **(A)** Quantile-quantile plot for eSTR analysis of all 34 top loci from meta-analysis and high LD STRs, totalling 105 variants, across brain tissues from the NABEC and GTEx datasets.
Colors for each dot and lines were added to enhance resolution. **(B)** Box plots showing gene expression changes associated with STR variant chr5:60408714:TC[T]14GTATC, across the tissues where the variant was analyzed. **(C)** Box plots showing gene expression changes associated with STR variant chr17:15902070:[T]16GA. Locus representation is shown at the top of each box plot, representing the STR location and the distance (in kilobases) and orientation of the target gene. At the bottom, allele dosages, tissue, sample size and eSTR FDR-adjusted p-value are shown, with p-values in red representing significant associations. NA=Nucleus accumbens (basal ganglia); CT=Caudate (basal ganglia); CB=Cerebellum; CTX=Cortex; FC (NABEC)=Frontal cortex; FC=Frontal cortex (BA9); HT=Hypothalamus; CBH=Cerebellar hemisphere; HC=Hippocampus; PM=Putamen (basal ganglia); ACC=Anterior cingulate cortex (BA24); SC=Spinal cord (cervical c-1); AM=Amygdala; SN=Substantia nigra.

To obtain functional and gene expression insights on the four candidate STR signals obtained from our STR meta-analysis, we similarly extracted the four variants and their surrounding high LD STRs, obtaining 13 unique variants. We functionally annotated them using regulatory features for gene expression from the Encyclopedia of DNA Elements (ENCODE)²², and found two STRs overlapping enhancers, transcription factor binding sites and histone marks for active transcription (*H3K4me3, H3K9Ac, H3K4me1, H3K27Ac, H3K36me3* and *H3K79me2*): one near *NDUFAF2* (chr5:60408714:TC[T]14GTATC) in high LD with the leading GWAS STR at that locus (chr5:60437492:AA[TGAA]7, r²=0.95); and one within *ZSWIM7* (chr17:15902070:[T]16GA), similarly in high LD with the leading GWAS STR for that locus (chr17:15941750:[T]11, r²=0.84) (**Supplementary Table 6**).
The PD risk allele for the eSTR in *NDUFAF2* (major allele with 13 T repeats) is significantly associated with higher expression levels of the gene *PART1* (∼624 kb upstream) in the frontal cortex (**Figure 3B**). Interestingly, the significant eSTR in *ZSWIM7* showed associations in more brain tissues, where the risk allele in the STR meta-analysis (minor allele with 15 T repeats) was correlated with lower expression levels of *TRPV2* in the hypothalamus, anterior cingulate cortex, nucleus accumbens and frontal cortex; higher expression levels of *NCOR1* in the hippocampus; higher expression levels of *ADORA2B* in the anterior cingulate cortex; and lower expression levels of a long non-coding RNA gene located near *NCOR1* (CTC-529I10.1 or lnc-NCOR1-1) in the spinal cord and substantia nigra (**Figure 3C**).

### Gene-wise, gene-set and pathway enrichment analysis of PD associated STRs

MAGMA²³ gene-wise enrichment analysis of the STR meta-analysis results yielded 47 genes surpassing genome-wide significance (Bonferroni p<2.99×10⁻⁶, α=0.05/16,696; **Supplementary Table 7**). Of the 47 genes, 12 overlapped with the STR meta-analysis results, 8 overlapped with the 78 PD loci nominated from the 90 PD risk variants (2019 PD GWAS meta-analysis), and 27 genes have not previously been identified as enriched genes. Gene-property analysis using gene expression data from GTEx v.8, as described in FUMA²⁴, showed significant enrichment of genes in the pituitary and brain tissues after FDR correction (FDR p<0.05) for the 30 GTEx general tissues (**Supplementary Figure 2A**), and in the cerebellum, cortex, pituitary, cerebellar hemisphere and frontal cortex for the 54 GTEx specific tissues (**Supplementary Figure 2B**).
We further investigated gene connectivity via protein-protein interactions using a list of 445 genes surpassing nominal gene-wise STR enrichment (MAGMA p<0.01) with WebGestalt²⁵, and leveraged the Network-Topology Analysis (NTA), finding 16 subnetworks (**Supplementary Figure 3**), which were significantly enriched in 27 gene ontology categories, such as synaptic vesicle cycle (GO:0099504), presynaptic endocytosis (GO:0140238) and autophagy (GO:0006914) (**Supplementary Table 8**).

## DISCUSSION

In the present study we performed a genome-wide meta-analysis of STRs in 16 cohorts from the IPDGC. We have shown that associated STR signals overlap with known PD risk loci, and have identified candidate novel signals, represented by STRs that are independent from the current 90 risk variants¹ and located near *TRIML2, NDUFAF2, MIR129-1* and *NCOR1* (on chromosomes 4, 5, 7 and 17 respectively). We also assessed the functional consequences of the STRs at a gene expression level in brain tissues, which further supports their candidacy for functional studies aimed at understanding the biological mechanisms behind their associations.

The fact that 88% (30/34) of the associated STRs overlap with the current list of PD GWAS risk variants is not surprising, as the STRs were imputed based on their existing LD with SNPs. Known PD loci with STR associations could potentially help to explain the currently unknown molecular mechanisms underlying those regions, such as in *MAPT* and *SNCA*, where evidence has shown that repetitive elements play a major role in gene expression regulation, splicing, and hence protein structure²⁶,²⁷. This overlap is further reflected in the heritability estimates we obtained, which indicated that the contribution of STRs to the genetic variance of PD is largely explained by their high LD with SNPs.
However, STRs have been shown to add to SNP-only heritability estimates, specifically for gene expression¹³,²⁸, where STRs explained between 10%–15% of the *cis*-heritability, thereby supporting our observation that STRs contribute to the heritability of PD.

The eSTR colocalization analyses, using available RNAseq datasets, where we analyzed the top 34 STR signals and their surrounding high LD STRs, showed different distributions of STR associations throughout the various brain regions. At the gene level, we observed significant associations in 19 known PD risk genes, suggesting that these STRs are likely to be functionally relevant in these loci. Further investigation of the four independent nominated STRs uncovered likely functional mechanisms underlying the STR associations near *NDUFAF2* and *NCOR1*, owing to the colocalization of the STRs with regulatory features (epigenetic marks) involved in active transcription. The eSTR near *NDUFAF2* was associated with significantly increased expression of *PART1* (**Figure 3B**). *PART1* is a long non-coding RNA that was found to be differentially expressed (downregulated) in a microarray-based analysis of 50 PD patients compared to 22 healthy controls²⁹. The *ZSWIM7* eSTR was associated with significant effects on the expression of several genes. *TRPV2* is a cation channel belonging to the transient receptor potential (TRP) family of proteins, which are activated by physical and chemical stimuli³⁰ and are known to be involved in the regulation of ionic homeostasis, which is disrupted in PD³¹. *ADORA2B* is an adenosine receptor that has been associated with neurodegenerative conditions such as Huntington’s disease³²; however, no link to PD has been established so far. *lnc-NCOR1-1* and *NCOR1* (Nuclear Receptor Corepressor 1) are located within the same chromosomal region (short arm of chromosome 17) and were also influenced by the eSTRs.
The former long non-coding gene has not been thoroughly characterized, therefore little is known about its function. The latter encodes a transcriptional inhibitor that has been found to regulate mitochondrial function³³. Moreover, gene expression analyses showed that *NCOR1* is significantly upregulated in the substantia nigra of PD patients³⁴. This evidence suggests that those genes associated with eSTRs in PD would be good candidates for follow-up analyses.

The gene-wise and pathway analyses showed that genes implicated by STRs are enriched in known PD-relevant pathways, such as synaptic vesicle trafficking³⁵ and autophagy³⁶, and in tissues such as the cortex, cerebellar hemisphere and frontal cortex. Also highlighted is the pituitary gland, which is known to express the dopaminergic receptors D2 and D4³⁷ and is part of the hypothalamic–pituitary–thyroid axis, where alterations in its balance have been shown to increase risk of PD³⁸.

This study marks the first (to our knowledge) PD STR GWAS to date and highlights the importance of incorporating other forms of genetic variation, such as STRs, into routine genetic analyses. Nevertheless, as with any study profiling repeat-based variants using short-read sequencing data, the analyses presented in this study have several limitations. First, focusing on the STR calls, STRs were imputed using a reference panel that was generated by the STR caller hipSTR using short-read whole-genome sequencing (WGS)¹⁷. There are two main drawbacks to this approach: (1) hipSTR cannot call STRs that are longer than the read length. Given that many of the known pathogenic STRs in neurological diseases are large repeat expansions, we currently lack the power to detect this important and potentially disease-associated class of tandem repeats; (2) as highlighted in the original study, imputation accuracy varies widely across STR loci, with highly variable multi-allelic STRs only achieving ∼70% concordance.
Hence, future studies that validate the PD associated STRs with methods such as long-read sequencing will be crucial to confirm these loci and will be key to resolving complex repeat-based PD associated haplotypes. Second, although the majority of the STRs tested were biallelic, multi-allelic variants were split into biallelic variants for the GWAS and downstream analyses. This approach enabled us to apply commonly used GWAS methods to the different analyses presented in the study, but set aside the consideration of variant length as the unit of analysis. This is an important aspect of repetitive elements that needs to be addressed in future developments with association tools that can incorporate multi-allelic variants, which will likely give valuable insight into the specific role of repeat copy number in risk of disease. Finally, it is important to highlight that, despite the fact that the STR panel used to impute our PD GWAS cohorts showed high levels of concordance (96.7%)¹⁷ with read-based callers such as hipSTR and TREDPARSE, the STRs reported in this study need further experimental validation, in order to rule out any potential artifacts that could exist in both cases and controls, and to confirm their association with PD.

Overall, we have performed the first STR GWAS meta-analysis in PD and reported that STRs contribute to its genetic risk. We have characterized another layer of genetic variation, helping us to gain statistical power to nominate novel candidate PD risk variants and genes, and to provide a more complete reference of the genetic variation that contributes to the disease. Hence, these data are a valuable resource for researchers currently dissecting the known PD risk loci. Moving forward, a large-scale GWAS which utilizes calls directly from WGS data and validates hits using long-read sequencing methodologies is essential for fully understanding the contribution of STRs to the genetics of PD.

## METHODS

A summary diagram for the methodological steps followed in the present study is shown in **Supplementary Figure 1**.

### Samples and quality control

All genotyping data were obtained from previously generated IPDGC datasets, consisting of 39,087 individuals (16,642 cases and 22,445 controls) of European ancestry¹. All individuals provided informed consent for participation in genetics studies, which was approved by the relevant local ethics committee for each of the datasets used. Detailed demographics, sample sizes and PD status are given in **Supplementary Table 1**. Further information along with detailed quality control (QC) methods have been previously published¹,³⁹. Briefly, for sample QC prior to imputation, individuals with low call rate, discordance between genetic and reported sex, heterozygosity outliers and ancestry outliers were removed. For genotype QC, variants with a missingness rate of >5%, minor allele frequency (MAF) <0.01, or deviations from Hardy–Weinberg Equilibrium (HWE p <1×10⁻⁵), as well as palindromic SNPs, were excluded.

### STR imputation and filtering

STR genotypes were imputed into the unimputed IPDGC SNP genotyping datasets using Beagle v.5.1⁴⁰ with the 1000 Genomes SNP-STR haplotype reference panel¹⁷. In brief, STR genotypes in the reference panel were imputed from STRs called by the catalog-based STR caller hipSTR¹⁵ and supplemented using a second STR caller, TREDPARSE⁴¹. STRs were phased with corresponding SNPs, creating a final panel in the 1000 Genomes Project data that contained 27,185,239 SNP and 445,725 STR markers. Once STRs were imputed into all IPDGC SNP genotype datasets, the STR calls were filtered to facilitate downstream association analysis. First, multi-allelic STRs were split into single biallelic variants using the vt variant tool⁴². Finally, SNPs and STRs with a dosage R-squared (DR2) <0.3 were removed to filter out low quality imputed variants.
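
The DR2 filtering step can be sketched in a few lines. This is a minimal illustration and not the authors' pipeline: it assumes the imputed VCF carries a `DR2` entry in the INFO column, that multi-allelic records have already been split so each record holds a single `DR2` value, and that records lacking a `DR2` entry should be dropped. The function names `parse_dr2` and `filter_by_dr2` are hypothetical.

```python
from typing import Iterable, Iterator, Optional

DR2_MIN = 0.3  # threshold stated in the text: variants with DR2 < 0.3 are removed


def parse_dr2(info: str) -> Optional[float]:
    """Return the DR2 value from a VCF INFO string, or None if it is absent."""
    for field in info.split(";"):
        key, _, value = field.partition("=")
        if key == "DR2" and value:
            return float(value)
    return None


def filter_by_dr2(lines: Iterable[str], dr2_min: float = DR2_MIN) -> Iterator[str]:
    """Yield VCF header lines unchanged and records with DR2 >= dr2_min.

    Assumption: a record without a DR2 entry is treated as failing the filter.
    """
    for line in lines:
        if line.startswith("#"):
            yield line
            continue
        columns = line.rstrip("\n").split("\t")
        dr2 = parse_dr2(columns[7])  # INFO is the 8th VCF column
        if dr2 is not None and dr2 >= dr2_min:
            yield line


if __name__ == "__main__":
    # Toy records for illustration only; these are not real study data.
    demo = [
        "##fileformat=VCFv4.2",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
        "1\t1000\tvarA\tA\tAT\t.\tPASS\tDR2=0.95;AF=0.20;IMP",
        "1\t2000\tvarB\tG\tGTT\t.\tPASS\tDR2=0.12;AF=0.03;IMP",
    ]
    for kept in filter_by_dr2(demo):
        print(kept)
```

Streaming line by line keeps memory use flat for genome-wide files; in practice an established VCF toolkit would usually do the same job more robustly.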

### Study-level STR analysis and meta-analysis

To estimate PD risk, imputed dosages (*i*.*e*. genotype probabilities for a variant to be A/A, A/B, or B/B from 0 to 2) were analyzed using a logistic regression model adjusted for sex, age at onset (AAO) for cases or age at examination for controls, and the first 10 principal components (PCs). Of note, AAO could not be included as a covariate for the Myers-Faroud⁴³ and Vance (dbGap phs000394) studies as no AAO information was available. Summary statistics were generated using the RVTESTS package⁴⁴ and filtered for a MAF >1%. Meta-analysis was conducted based on the fixed-effect model as implemented in METAL⁴⁵ by combining summary statistics across all 16 IPDGC datasets. All variants with a meta-analysis heterogeneity value of less than 80% (I² <0.80) were kept for further analysis.

### Conditional-joint and linkage disequilibrium analyses

To select candidate variants, we used the Genome-wide Complex Trait Analysis software (GCTA)¹⁹ to perform conditional and joint analysis (COJO) of STRs from the meta-analysis summary statistics. In order to differentiate associations between STRs and SNPs, we performed two COJO analyses: first with STRs only and second with STRs and SNPs together. As an LD reference for GCTA we used a sample subset of merged imputed genotypes (hard call threshold of 0.8) from the IPDGC GWAS cohorts⁴⁶, totaling 4,397 PD cases and 9,137 controls. Additionally, we performed LD calculations between the top STRs and the previously reported list of 90 PD variants¹ using PLINK v.1.9⁴⁷ to identify STRs highly linked to known PD risk variants. The Hudson plot showing the genome-wide association results for STRs and SNPs separately was generated with the *hudson* R package ([https://github.com/anastasia-lucas/hudson](https://github.com/anastasia-lucas/hudson)). Regional plots for the GCTA-nominated independent STRs were generated with the Locuszoom standalone version⁴⁸.
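
The fixed-effect meta-analysis and the I² heterogeneity filter described in the study-level section above can be illustrated for a single variant as follows. This is a hedged sketch, not the METAL implementation: it assumes inverse-variance weighting of per-cohort log odds ratios (the text does not state which weighting scheme was used), the usual definition I² = (Q − df)/Q based on Cochran's Q, and the conventional rescaling of the genomic inflation factor to 1000 cases and 1000 controls. The function names `fixed_effect_meta` and `lambda_1000` are hypothetical.

```python
import math
from typing import Sequence, Tuple


def fixed_effect_meta(
    betas: Sequence[float], ses: Sequence[float]
) -> Tuple[float, float, float, float]:
    """Inverse-variance fixed-effect meta-analysis of one variant.

    betas: per-cohort effect estimates (log odds ratios)
    ses:   per-cohort standard errors
    Returns (pooled beta, pooled SE, two-sided p-value, I2 as a fraction).
    """
    weights = [1.0 / se ** 2 for se in ses]
    w_sum = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / w_sum
    se = math.sqrt(1.0 / w_sum)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    # Cochran's Q and I2 = max(0, (Q - df) / Q)
    q = sum(w * (b - beta) ** 2 for w, b in zip(weights, betas))
    df = len(betas) - 1
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return beta, se, p, i2


def lambda_1000(lam: float, n_cases: int, n_controls: int) -> float:
    """Rescale a genomic inflation factor to 1000 cases and 1000 controls."""
    scale = (1.0 / n_cases + 1.0 / n_controls) / (1.0 / 1000 + 1.0 / 1000)
    return 1.0 + (lam - 1.0) * scale


if __name__ == "__main__":
    # Toy per-cohort estimates for illustration only; not study data.
    beta, se, p, i2 = fixed_effect_meta([0.25, 0.30, 0.22], [0.05, 0.08, 0.06])
    keep = i2 < 0.80  # heterogeneity filter from the text
    print(f"OR={math.exp(beta):.2f} p={p:.2e} I2={i2:.2f} keep={keep}")
    print(f"lambda_1000={lambda_1000(1.18, 16642, 22445):.2f}")
```

Under these assumptions, applying `lambda_1000` to the reported inflation factor of 1.18 with 16,642 cases and 22,445 controls gives approximately 1.01, in line with the value given in the Results.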

### eSTR analysis

Using sample level genotypes and gene expression data from the North American Brain Expression Consortium (NABEC)²⁰ and the Genotype-Tissue Expression project²¹, we carried out an eQTL analysis with imputed STRs (eSTR). The NABEC data was composed of 343 individuals with genotypes obtained from high-coverage Illumina WGS. Corresponding gene expression data was generated from frontal cortex tissue by RNASeq and normalized gene counts were used. The GTEx v.8 data (dbGaP: phs000424.v7.p2) comprises high-coverage (30X) Illumina WGS data from 838 unrelated samples. We downloaded the fully processed, filtered and normalized gene expression matrices (in BED format) for each of the 13 brain tissues: amygdala, anterior cingulate cortex (BA24), caudate (basal ganglia), cerebellar hemisphere, cerebellum, cortex, frontal cortex (BA9), hippocampus, hypothalamus, nucleus accumbens (basal ganglia), putamen (basal ganglia), spinal cord cervical (c-1) and substantia nigra ([https://gtexportal.org/home/datasets](https://gtexportal.org/home/datasets)). WGS genotypes from GTEx and gene start-end coordinates for expression data for GTEx and NABEC were converted from the hg38 reference to hg19 using the UCSC liftover tool⁴⁹. STRs were imputed as described above. eSTR analysis was performed using the FastQTL software⁵⁰, correcting for PCs 1-10, sample age, sex (if available) and factors from probabilistic estimation of expression residuals (PEER) generated using the PEER software⁵¹: 45 factors for NABEC and 15 for GTEx (as indicated in the GTEx documentation). The 34 top STRs from the meta-analysis along with variants with LD r² >0.5 (105 STRs total) were used to conduct the eSTR analysis. QQ-plots and box plots were generated using the ggplot2 R package⁵².

### Heritability estimation

We used the GCTA-LDMS method¹⁹,⁵³ to estimate the heritability of STRs only, both SNPs and STRs together and SNPs only.
The method corrects for LD bias in the estimated variant-based heritability from WGS or imputed data. Heritability estimates and their corresponding standard errors are shown on the liability scale.

### Gene-set, network and pathway enrichment analyses of significant STR loci

To functionally characterize the top associated STRs, we carried out loci connectivity analyses across gene-ontologies and gene-expression datasets using FUMA²⁴ and protein-protein interaction networks using WebGestalt²⁵. We ran MAGMA gene-wise analysis²³ using the meta-analysis summary statistics for all STRs, and used the 1000 Genomes SNP-STR dataset as our reference panel¹⁷. We selected 445 genes with a gene-wise p<0.01 for further analyses (**Supplementary Table 7**). Gene lists were analyzed for functional enrichments using (i) the FUMA gene2func tool, (ii) Biogrid PPI Network Topology-based Analysis (NTA) in WebGestalt and (iii) gene property analysis for tissue specificity, using 23,675 genes from Genotype-Tissue Expression (GTEx) RNASeq data²¹ across the 30 general and 54 specific tissues. Data preprocessing and gene expression normalization methods are presented in the FUMA tutorial section ([https://fuma.ctglab.nl/tutorial](https://fuma.ctglab.nl/tutorial)). Bonferroni and Benjamini-Hochberg FDR corrections for multiple testing were performed for MAGMA gene-wise results and functional enrichment analyses, respectively.

## Supporting information

Supplementary Table [[supplements/259645_file02.xlsx]](pending:yes)

Supplementary Figure [[supplements/259645_file03.pdf]](pending:yes)

## Data Availability

Full STR GWAS summary statistics for the 16 datasets meta-analysed are available at the following link.
[https://drive.google.com/file/d/1kD1i6tHdYC5w0xvxWLD4B-bSPqpnwzNV/view?usp=sharing](https://drive.google.com/file/d/1kD1i6tHdYC5w0xvxWLD4B-bSPqpnwzNV/view?usp=sharing)

## Code Availability

The STR imputation, study level GWAS and meta-analysis: [https://github.com/neurogenetics/PD\_STR\_imputation](https://github.com/neurogenetics/PD_STR_imputation). Downstream analyses: [https://github.com/bibb/STR\_GWAS\_downstream\_analysis](https://github.com/bibb/STR_GWAS_downstream_analysis)

## Conflicts of Interest

D.K. is the Founder and Scientific Advisory Board Chair of Lysosomal Therapeutics Inc. and Vanqua Bio. D.K. serves on the scientific advisory boards of The Silverstein Foundation, Intellia Therapeutics, AcureX and Prevail Therapeutics and is a Venture Partner at OrbiMed. Z.GO. has received consulting fees from Lysosomal Therapeutics Inc., Idorsia, Prevail Therapeutics, Denali, Ono Therapeutics, Neuron23, Handl Therapeutics, Bial Biotech Inc., Deerfield, Lighthouse and Inception Sciences (now Ventus). B.I.B., K.B., C.B., J.R.G., A.B.S. and S.J.L. declare that they have no competing interests.

## Acknowledgements

We would like to thank all of the subjects who donated their time and biological samples to be a part of this study. We also would like to thank all members of the International Parkinson’s Disease Genomics Consortium (IPDGC). For a complete overview of members, acknowledgements and funding, please see [http://pdgenetics.org/partners](http://pdgenetics.org/partners).
This work was supported in part by the Intramural Research Programs of the National Institute of Neurological Disorders and Stroke (NINDS), the National Institute on Aging (NIA), and the National Institute of Environmental Health Sciences, all part of the National Institutes of Health, Department of Health and Human Services; project numbers 1ZIA-NS003154, Z01-AG000949-02 and Z01-ES101986. In addition, this work was supported by the Department of Defense (award W81XWH-09-2-0128) and The Michael J Fox Foundation for Parkinson’s Research. This work utilized the computational resources of the NIH HPC Biowulf cluster ([http://hpc.nih.gov](http://hpc.nih.gov)). Access to some of the participants in this research was made possible by the Quebec Parkinson’s Network ([http://rpq-qpn.ca/en/](http://rpq-qpn.ca/en/)). This work was also supported by the Simpson Querrey Center for Neurogenetics (to D.K.).

* Received July 1, 2021.
* Revision received July 1, 2021.
* Accepted July 5, 2021.
* © 2021, Posted by Cold Spring Harbor Laboratory

This article is a US Government work. It is not subject to copyright under 17 USC 105 and is also made available for use under a CC0 license.

## REFERENCES

1. Nalls, M. A. et al. Identification of novel risk loci, causal insights, and heritable risk for Parkinson’s disease: a meta-analysis of genome-wide association studies. Lancet Neurol. 18, 1091–1102 (2019).
2. Keller, M. F. et al. Using genome-wide complex trait analysis to quantify ‘missing heritability’ in Parkinson’s disease. Hum. Mol. Genet. 21, 4996–5009 (2012).
3. International Human Genome Sequencing Consortium. Correction: Initial sequencing and analysis of the human genome. Nature 412, 565–566 (2001).
4. Criscione, S. W., Zhang, Y., Thompson, W., Sedivy, J. M. & Neretti, N. Transcriptional landscape of repetitive elements in normal and cancer human cells. BMC Genomics 15, 583 (2014).
5. Fu, Y. H. et al. Variation of the CGG repeat at the fragile X site results in genetic instability: resolution of the Sherman paradox. Cell 67, 1047–1058 (1991).
6. Verkerk, A. J. et al.
Identification of a gene (FMR-1) containing a CGG repeat coincident with a breakpoint cluster region exhibiting length variation in fragile X syndrome. Cell 65, 905–914 (1991).
7. MacDonald, M. E. et al. A novel gene containing a trinucleotide repeat that is expanded and unstable on Huntington’s disease chromosomes. Cell 72, 971–983 (1993).
8. Renton, A. E. et al. A hexanucleotide repeat expansion in C9ORF72 is the cause of chromosome 9p21-linked ALS-FTD. Neuron 72, 257–268 (2011).
9. DeJesus-Hernandez, M. et al. Expanded GGGGCC hexanucleotide repeat in noncoding region of C9ORF72 causes chromosome 9p-linked FTD and ALS. Neuron 72, 245–256 (2011).
10. Orr, H. T. et al. Expansion of an unstable trinucleotide CAG repeat in spinocerebellar ataxia type 1. Nat. Genet. 4, 221–226 (1993).
11. Hannan, A. J. Tandem repeats mediating genetic plasticity in health and disease. Nat. Rev. Genet. 19, 286–298 (2018).
12. Grünewald, T. G. P. et al. Chimeric EWSR1-FLI1 regulates the Ewing sarcoma susceptibility gene EGR2 via a GGAA microsatellite. Nat. Genet. 47, 1073–1078 (2015).
13. Gymrek, M. et al. Abundant contribution of short tandem repeats to gene expression variation in humans. Nat. Genet. 48, 22–29 (2016).
14. Billingsley, K. J. et al. Analysis of repetitive element expression in the blood and skin of patients with Parkinson’s disease identifies differential expression of satellite elements. Scientific Reports 9 (2019).
15. Willems, T. et al. Genome-wide profiling of heritable and de novo STR variations. Nat. Methods 14, 590–592 (2017).
16. Payseur, B. A., Place, M. & Weber, J. L. Linkage disequilibrium between STRPs and SNPs across the human genome. Am. J. Hum. Genet. 82, 1039–1050 (2008).
17. Saini, S., Mitra, I., Mousavi, N., Fotsing, S. F. & Gymrek, M. A reference haplotype panel for genome-wide imputation of short tandem repeats. Nat. Commun. 9, 4397 (2018).
18. Saini, S., Gymrek, M. & PGC Schizophrenia Working Group. Studying the role of short tandem repeat variants in schizophrenia risk. (2019).
19. Yang, J., Lee, S. H., Goddard, M. E. & Visscher, P. M. GCTA: a tool for genome-wide complex trait analysis. Am. J. Hum. Genet. 88, 76–82 (2011).
20. Gibbs, J. R. et al. Abundant quantitative trait loci exist for DNA methylation and gene expression in human brain. PLoS Genet. 6, e1000952 (2010).
21. GTEx Consortium.
The GTEx Consortium atlas of genetic regulatory effects across human tissues. Science 369, 1318–1330 (2020).
22. ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57–74 (2012).
23. de Leeuw, C. A., Mooij, J. M., Heskes, T. & Posthuma, D. MAGMA: generalized gene-set analysis of GWAS data. PLoS Comput. Biol. 11, e1004219 (2015).
24. Watanabe, K., Taskesen, E., van Bochoven, A. & Posthuma, D. Functional mapping and annotation of genetic associations with FUMA. Nat. Commun. 8, 1826 (2017).
25. Liao, Y., Wang, J., Jaehnig, E. J., Shi, Z. & Zhang, B. WebGestalt 2019: gene set analysis toolkit with revamped UIs and APIs. Nucleic Acids Res. 47, W199–W205 (2019).
26. Caillet-Boudin, M.-L., Buée, L., Sergeant, N. & Lefebvre, B. Regulation of human MAPT gene expression. Mol. Neurodegener. 10, 28 (2015).
27. Afek, A. et al. Toward deciphering the mechanistic role of variations in the Rep1 repeat site in the transcription regulation of SNCA gene. Neurogenetics 19, 135–144 (2018).
28. Quilez, J. et al. Polymorphic tandem repeats within gene promoters act as modifiers of gene expression and DNA methylation in humans. Nucleic Acids Research 44, 3750–3762 (2016).
29. Chi, L.-M., Wang, L.-P. & Jiao, D. Identification of Differentially Expressed Genes and Long Noncoding RNAs Associated with Parkinson’s Disease. Parkinson’s Disease 2019, 1–7 (2019).
30. Duitama, M. et al. TRP Channels Role in Pain Associated With Neurodegenerative Diseases. Front. Neurosci. 14, 782 (2020).
31. Vaidya, B. & Sharma, S. S. Transient Receptor Potential Channels as an Emerging Target for the Treatment of Parkinson’s Disease: An Insight Into Role of Pharmacological Interventions. Front. Cell Dev. Biol. 8, 584513 (2020).
32. Liu, J. et al. Genetics Modulate Gray Matter Variation Beyond Disease Burden in Prodromal Huntington’s Disease. Frontiers in Neurology 9 (2018).
33. Fan, W. & Evans, R.
PPARs and ERRs: molecular mediators of mitochondrial metabolism. Curr. Opin. Cell Biol. 33, 49–54 (2015).
34. Cowherd, M. & Lee, I. Transcriptional regulators are upregulated in the substantia nigra of Parkinson’s disease patients. J. Emerg. Invest. 1–7 (2015).
35. Esposito, G., Clara, F. A. & Verstreken, P. Synaptic vesicle trafficking and Parkinson’s disease. Developmental Neurobiology 72, 134–144 (2012).
36. Lim, G. G. Y. Role of Autophagy in Parkinson’s Disease. (IntechOpen, 2013).
37. Jaber, M., Robinson, S. W., Missale, C. & Caron, M. G. Dopamine receptors and brain function. Neuropharmacology 35, 1503–1519 (1996).
38. Mohammadi, S., Dolatshahi, M. & Rahmani, F. Shedding light on thyroid hormone disorders and Parkinson disease pathology: mechanisms and risk factors. J. Endocrinol. Invest. 44, 1–13 (2021).
39. Blauwendraat, C. et al. Parkinson disease age of onset GWAS: defining heritability, genetic loci and α-synuclein mechanisms. (2019).
40. Browning, B. L., Zhou, Y. & Browning, S. R. A One-Penny Imputed Genome from Next-Generation Reference Panels. Am. J. Hum. Genet. 103, 338–348 (2018).
41. Tang, H. et al.
Profiling of Short-Tandem-Repeat Disease Alleles in 12,632 Human Whole Genomes. Am. J. Hum. Genet. 101 (2017).
42. Tan, A., Abecasis, G. R. & Kang, H. M. Unified representation of genetic variants. Bioinformatics 31, 2202–2204 (2015).
43. Pankratz, N. et al. Meta-analysis of Parkinson’s disease: identification of a novel locus, RIT2. Ann. Neurol. 71, 370–384 (2012).
44. Zhan, X., Hu, Y., Li, B., Abecasis, G. R. & Liu, D. J. RVTESTS: an efficient and comprehensive tool for rare variant association analysis using sequence data. Bioinformatics 32, 1423–1426 (2016).
45. Willer, C. J., Li, Y. & Abecasis, G. R. METAL: fast and efficient meta-analysis of genomewide association scans. Bioinformatics 26, 2190–2191 (2010).
46. Simon-Sanchez, J. et al. Genome-wide association study reveals genetic risk underlying Parkinson’s disease. Nat. Genet. 41, 1308 (2009).
47. Chang, C. C. et al. Second-generation PLINK: rising to the challenge of larger and richer datasets. Gigascience 4, 7 (2015).
48. Pruim, R. J. et al. LocusZoom: regional visualization of genome-wide association scan results. Bioinformatics 26, 2336–2337 (2010).
49. Hinrichs, A. S. et al. The UCSC Genome Browser Database: update 2006. Nucleic Acids Res. 34, D590–8 (2006).
50. Ongen, H., Buil, A., Brown, A. A., Dermitzakis, E. T. & Delaneau, O. Fast and efficient QTL mapper for thousands of molecular phenotypes. Bioinformatics 32, 1479–1485 (2016).
51. Stegle, O., Parts, L., Piipari, M., Winn, J. & Durbin, R. Using probabilistic estimation of expression residuals (PEER) to obtain increased power and interpretability of gene expression analyses. Nat. Protoc. 7, 500–507 (2012).
52. Wickham, H. ggplot2: Elegant Graphics for Data Analysis. (Springer-Verlag, New York, 2016).
53. Yang, J. et al. Genetic variance estimation with imputed variants finds negligible missing heritability for human height and body mass index. Nat. Genet. 47, 1114–1120 (2015).