Complex trait associations in rare diseases and impacts on Mendelian variant interpretation

Craig Smail; Bing Ge; Marissa R. Keever-Keigher; Carl Schwendinger-Schreck; Warren Cheung; Jeffrey J. Johnston; Cassandra Barrett; Genomic Answers for Kids Consortium; Keith Feldman; Ana S.A. Cohen; Emily G. Farrow; Isabelle Thiffault; Elin Grundberg; Tomi Pastinen

doi:10.1101/2024.01.10.24301111

Abstract

Emerging evidence implicates common genetic variation – aggregated into polygenic scores (PGS) – impacting the onset and phenotypic presentation of rare diseases. In this study, we quantified individual polygenic liability for 1,151 previously published PGS in a cohort of 2,374 probands enrolled in the Genomic Answers for Kids (GA4K) rare disease study, revealing widespread associations between rare disease phenotypes and PGSs for common complex diseases and traits, blood protein levels, and brain and other organ morphological measurements. We observed increased polygenic burden in probands with variants of unknown significance (VUS) compared to unaffected carrier parents. We further observed an enrichment in overlap between diagnostic and candidate rare disease genes and large-effect PGS genes. Overall, our study supports and expands on previous findings of complex trait associations in rare disease phenotypes and provides a framework for identifying novel candidate rare disease genes and in understanding variable penetrance of candidate Mendelian disease variants.

Introduction

The Genomic Answers for Kids (GA4K) study at Children’s Mercy Research Institute is a large-scale, phenotypically diverse pediatric rare disease cohort comprising patient cases referred from 22 different hospital specialties with almost 14,000 individuals enrolled from 6,000 families currently^1,2. Comprehensive clinical genome assessment utilizing structured phenotypes and prioritized variants from whole-exome or whole-genome sequencing are further combined with additional omic approaches with the goal of improving understanding of the genetics of rare diseases.

One such strategy is the integration of polygenic scores (PGS) to assess the contribution of common genetic variants to rare disease phenotypes. PGS approaches have been previously applied to estimate disease risk attributed to inherited common variant polygenic background – such as in severe neurodevelopmental disorders³; and in integrating trait-matched PGS to understand differences in disease penetrance among carriers of monogenic risk variants^4–7. Further, differences in individual PGS liability can help resolve variable expressivity of complex, multi-phenotype rare disorders, such as risk for schizophrenia among carriers of the 22q11.2 deletion⁸.

PGS effect estimates have also highlighted a sharing of underlying causal genes in monogenic and matched common diseases and other correlated complex traits^9,10, providing opportunities for expanded discovery of large-effect variants underlying rare disease patient cases. Integrating common variant PGSs derived from population-scale resources available from PGS Catalog¹¹, here we systematically mapped the impacts of >1000 PGSs for common complex diseases, laboratory tests, organ morphological, and anthropometric traits in >500 groupings of probands defined by their rare disease phenotypes.

Results

To map associations between rare disease phenotypes and common complex diseases, traits, and measurements we first generated a filtered set of PGS obtained from PGS Catalog (Methods; N PGS = 1,151). We calculated individual scores for each PGS in a subset of 2,374 probands enrolled in GA4K filtered for European ancestry to reflect the demographic background of the majority of individuals currently comprising PGS training cohorts (Methods; Table 1; Figure S1). To calculate individual PGS scores in GA4K, genotyping array data was first combined with imputation (Methods). Correlation of PGS variants also available in the GA4K imputed genotype callset was high (r = 0.99; P < 1×10⁻¹⁶; Figure S2).

View this table:

Table 1. Summary of GA4K probands in study cohort

We quantified the contribution of each of the 1,151 PGS in 574 rare disease phenotype (HPO) case/controls cohorts (N pairwise comparisons = 660,674; HPO median case N = 10). For each of these cohorts, we constructed a logistic regression model comprising PGS, sex, and first five principal components of ancestry, and further compared observed results with null distributions from permutation testing (N permutations = 10,000), yielding an empirical P-value for each HPO-PGS pair (Methods). From this approach we identified 824 significant HPO-PGS pairs (FDR 20%) comprising 519 PGS and 138 HPO cohorts (Table S1).

Categorizing HPO and PGS into tissue- and/or physiology-specific measurements and disorders, we observed the greatest number of associations for nervous system disorder HPOs with brain volume PGSs (N = 62 HPO-PGS pairs) (for example, “severe global developmental delay” (HP:0011344) and “mean isotropic volume fraction in right superior corona radiata” PGS (PGS001506, P = 5×10⁻⁰⁵), as well as growth disorder HPOs associated with body measurement PGSs (N = 61 HPO-PGS pairs) (for example, “obesity” (HP:0001513) and “body mass index” PGS (PGS000027), P = 6×10⁻¹¹) (Figure 1). We further observed blood protein PGSs associated broadly with multiple HPO categories (35 unique blood protein PGS across 13 of the 20 unique HPO categories). Overall, we observed at least one significant (FDR 20%) PGS association for 1,972 of 2,374 probands in the study cohort (83%). Median regression model fit (Nagelkerke’s pseudo-R²) was 0.78% (range = 0.4 - 4.1%) (Figure S3). At more stringent FDR thresholds, we observed 331 (FDR 10%) and 182 (FDR 5%) significant HPO-PGS pairs. Summary statistics for all tested HPO-PGS pairs are provided in Table S2.

Figure 1.

Overview of significant associations across categories of HPOs and PGSs. HPOs (y-axis) were collapsed in to corresponding affected system/anatomy. PGS terms (x-axis) were categorized using ontology metadata available from PGS Catalog and further colored by measurement (red), disease (green), other trait (blue), biological process (cyan), or sex-specific PGS (orange). Color gradient indicates number of significant associations within each HPO/PGS category pair. Gray indicates no significant associations for indicated HPO-PGS pair.

We next assessed the impact of PGS on a proband’s clinical diagnostic status. We used a logistic regression model to quantify the contribution of an increasing burden of rare disease phenotypes (HPOs) linked to PGS to the likelihood of having a clinical diagnostic status including “diagnostic”, “VUS/GUS”, “negative” classification. We repeated this analysis for increasingly stringent proband PGS Z-score thresholds. For each additional rare disease phenotype (HPO) significantly associated with a PGS (FDR 20%), we observed a modest decrease in the likelihood of “diagnostic” status (odds ratio = 0.98 [CI 0.96-0.99], P = 0.03) and an increase in the likelihood of “VUS/GUS” status (odds ratio = 1.02 [CI 1.01 – 1.04], P = 0.03) (Figure 2). For the most stringent PGS Z-score threshold, this effect was stronger (“diagnostic”: odds ratio = 0.87 [CI 0.81-0.94], P = 2×10⁻⁰⁴; “VUS/GUS”: odds ratio = 1.11 [CI 1.03-1.18], P = 3×10⁻⁰³). No differences were observed for “negative” cases across any PGS Z-score threshold.

We next asked whether observed results could be explained by potential challenges with clinical diagnosis of more complex patient cases (that is, a relatively larger number of rare disease phenotypes irrespective of any PGS associations). Again using a logistic regression model, we quantified the likelihood of each clinical diagnostic status associated with a count of all HPOs per proband and observed no significant associations.

Figure 2.

Change in likelihood of indicated clinical diagnostic status as a function of PGS-associated HPO phenotype burden. Results are displayed across more stringent PGS Z-score thresholds (from bottom to top of each diagnostic category). P-values passing a threshold of P <= 0.05 are indicated in figure.

Given previous reports showing modification of monogenic disease penetrance and severity associated with trait-relevant polygenic liability^4,6, we hypothesized that significantly associated PGS in probands with a candidate – but presently non-diagnostic – inherited variant in a known rare disease gene (variant of unknown significance (VUS)) would exhibit increased polygenic liability compared with the PGS of their unaffected carrier parent (Figure 3A). In a subset of probands with an inherited, autosomal dominant (with partial penetrance) VUS and with significantly associated PGS available for the full trio (Methods), we calculated the difference in standard deviation (Z-score) in proband and parent PGS whereby values greater than zero indicates greater liability in probands, observing a median difference between probands and carrier parents of 0.52 SD (P = 3×10^-09, Wilcoxon test against PGS permuted background), which increased at more stringent HPO-PGS significance (FDR) thresholds (FDR 10%: 0.79 SD, P = 7×10^-05; FDR 5%: 0.93 SD, P = 4×10^-08) (Figure 3B). We repeated this analysis for unaffected carrier sibling(s) where available, observing similar increased PGS liability for probands (0.65 SD, P = 6×10^-04; Figure S4A). We observed smaller median differences in PGS when comparing probands with non-carrier parents (FDR 20%: 0.30 SD, P = 1×10^-06; FDR 10%: 0.02 SD, P = NS; FDR 5%: -0.07 SD, P = NS) (Figure 3B).

Figure 3.

A. Proposed model of variable rare disease penetrance for inherited candidate pathogenic variant (VUS) in proband and unaffected carrier parent as a function of polygenic liability for an associated complex trait PGS. B. Proband – parent PGS Z-score/standard deviation for significantly associated PGS in probands with a clinical variant of unknown significance (VUS) compared to unaffected carrier (Het) parent (orange) and non-carrier (Ref) parent (blue). Results are stratified across HPO-PGS significance (FDR) thresholds (from right to left: FDR 20%, 10%, 5%). Points above zero indicate PGS liability is greater in the proband, whereas points below zero indicate PGS liability is greater in the indicated Het or Ref parent. Pairwise differences for PGS with direction of effect < 0 (e.g. short stature HPO and Height PGS) were inverted to enable visualization in the same figure of PGS with direction of effect > 0. C. Summarized within-family comparison of PGS enrichment for VUS probands and parents. Pairwise comparisons are shown for proband (red) and Het parent (orange), and proband and Ref parent (blue). Results are stratified across HPO-PGS significance (FDR) thresholds (from right to left: FDR 20%, 10%, 5%).

We next calculated within-trio comparisons across all matched PGS to quantify the proportion of observations where the proband exhibits enriched PGS liability compared with their carrier and non-carrier parent (that is, PGS is greater/less than the indicated parent with respect to associated PGS direction of effect), repeated across HPO-PGS significance (FDR) thresholds. We observed a greater proportion of increased PGS liability for probands compared with their carrier parent (PGS enriched in proband: 69.3% (FDR 20%), 74.4% (FDR 10%), 100% (FDR 5%), but did not observe the same trend when comparing with noncarrier parent (PGS enriched in proband: 68.6% (FDR 20%), 53.8% (FDR 10%), 52.6% (FDR 5%)) (Figure 3C). We further observed a greater proportion of increased PGS for probands compared with unaffected carrier sibling(s) (Figure S4B).

Next, we obtained clinical observation data for height from the electronic medical record (EHR) at the GA4K study site to compare individual PGS with growth trajectories recorded at clinical visits. We observed moderate positive correlation comparing CDC age- and sex-adjusted height Z-scores with a significantly associated PGS for height (PGS ID = “PGS000998”; Methods) (r = 0.36, P < 1×10⁻¹⁶) (Figure 4). We then integrated rare disease phenotypes (HPO terms) to annotate “short stature” (HP:0004322), “tall stature” (HP:0000098), and control cohorts. For case probands, we observed a partial overlap in height growth chart values exceeding the CDC-recommended outlier threshold for short or tall stature (height Z-score <=2 or >=2, respectively). For the short stature cohort (N = 207), 95 (46%) passed the threshold, and for tall stature cohort (N = 32), 19 (60%) passed the corresponding threshold.

For probands with observed outlier height Z-scores we observed that their trait-relevant PGS liability tended to stratify into expected direction of effect. Specifically, for the short stature cohort with outlier height Z-score (N = 95), 74 (78%) had a PGS <=0 and 21 (22%) > 0. For tall stature (N = 19), 18 (95%) had a PGS >= 0 and 1 (5%) < 0. This finding highlights the potential for enhanced rare disease phenotyping through integrating EHR and PGS information.

Figure 4.

Integrating PGS and clinical growth chart data. Individual height observations indicate age and sex-adjusted Z-scores (CDC benchmarks) and are displayed on the y-axis. Individual scores for a significantly associated height PGS are displayed on the x-axis. Red horizontal line indicates short stature threshold (CDC). Blue horizontal line indicates tall stature threshold (CDC). Probands with a short stature HPO in GA4K are indicated by a red triangle if observed height value passes CDC threshold for short stature, or a red cross if not passing this threshold. Probands with tall stature HPO are indicated by a blue triangle if observed height value passes CDC threshold for tall stature, or a blue cross if not passing this threshold. Gray indicates control (no short or tall stature HPO terms present in GA4K). Pearson correlation coefficient and P-value is indicated in figure.

We next assessed the overlap of clinical diagnostic or candidate (VUS) rare disease variants in genes that are also present in significantly associated PGS. Across all PGS genes previously linked to the associated HPO rare disease phenotype in OMIM or Orphanet (HPO gene associations; see Data Availability), we observed a 6-fold increase in cases having a diagnostic or candidate rare disease variant in a PGS gene compared to controls (Fisher’s exact test, odds ratio = 5.99 [CI 5.67–6.33], P < 1×10^-16) (Figure 5A). We further implemented a method for ranking PGS genes to define a set of core/key genes where a rare variant has the potential to exert a relatively larger effect on disease risk, as postulated in the omnigenic model of complex traits¹². Previous studies have shown that GWAS effect sizes tend to be larger for genes overlapping known disease-matched monogenic disorder genes¹⁰; given this finding, we ranked genes in each PGS using summarized effect weights for variants in a PGS that are within or proximal (+/-10 Kb) to any protein coding gene (Methods). We observed for case probands an increasing enrichment in overlap in clinical diagnostic or candidate variants in genes with higher rank in the PGS. For example, for the top 0.01% effect rank of PGS genes we observed a 20-fold increase in clinical diagnostic or candidate variant overlap in cases compared to controls (Fisher’s exact test, odds ratio = 20.33 [CI 10.67–37.29], P = 2×10^-15). This effect was stronger when increasing the stringency of HPO-PGS significance threshold (FDR 10%: odds ratio = 45.59 [CI 19.56–109.49], P < 1×10^-16; FDR 5%: odds ratio = 49.85 [CI 19.78–136.06], P < 1×10^-16) (Figure S5).

Figure 5.

Quantifying overlap between rare disease variant annotations and putative core/key genes in associated PGS in cases and controls. A. Enrichment for HPO cases versus controls with a diagnostic or candidate (VUS) rare disease variant in an associated PGS gene with indicated effect rank percentile in PGS. B. Proportion of cases (obesity cohort) and controls with rare variants prioritized in to ACMG severity categories 2 or 3 in an obesity eQTS gene as a function of individual liability for a significantly associated polygenic score for body mass index (BMI).

Using an orthogonal metric to define putative trait-relevant core/key PGS genes, we focused on trait-matched expression quantitative trait scores (eQTS) genes (eQTLGen Consortium, see Data Availability). eQTS measures the correlation between the expression of a given gene and a polygenic score, aggregating regulatory effects on expression for both cis-and trans-acting variants¹³. The genes with the strongest correlation for a given trait are then defined as putative core/key genes. We assessed the likelihood of a proband harboring a potential large-effect rare variant (ACMG severity category 2 or 3; Methods) in an eQTS gene across increasingly more stringent PGS liability thresholds. Focusing on a subset of the HPO obesity case/control cohort with clinical sequencing data available (N = 1,168), we identified candidate rare variants in a set of 34 eQTS genes significantly associated with obesity or extreme BMI (Methods). Across the full cohort, 28 of 61 (46%) cases harbored a potential large-effect rare variant in a trait-relevant eQTS gene and 463 of 1,107 (42%) controls (P = 0.59).

When then considering individual polygenic liability for a significantly associated PGS trait (PGS = “body mass index”) in addition to rare variant burden, we observed an increasing enrichment in the proportion of cases with rare variants in trait-relevant eQTS genes as a function of trait-relevant polygenic liability, but a depletion in controls (logistic regression, interaction P = 5×10⁻⁰⁴). At an outlier PGS Z-score threshold of >=2, 10 of 14 (71%) cases were found to have a potentially impactful rare variant, and 6 of 24 (25%) controls (Figure 5B). Increasing the PGS Z-score outlier threshold further (PGS Z-score >= 2.5), we observed 4 of 5 (80%) cases with potentially impactful rare variants in trait-relevant eQTS genes and 0 of 6 controls. We repeated this analysis using the same patient cohort across six eQTS gene sets for non-matched diseases or traits and observed no significant differences between cases and controls (Figure S6). Of the 34 obesity-associated eQTS genes used in this analysis, only one has been previously linked to a rare Mendelian obesity disorder in OMIM or Orphanet (HPO gene associations; see Data Availability). These findings suggest a potential framework for the discovery of novel rare disease genes integrating rare disease-associated PGS liability, PGS gene effect estimates, and clinical rare variant annotations.

Discussion

Recent studies have shown that integrating PGS information can help in understanding the genetic basis of rare diseases^3,14–17. Here, we implemented a pipeline for the systematic assessment of PGS associations in a large-scale, phenotypically diverse pediatric rare disease cohort, mapping patient phenotypes to over 500 complex trait PGSs. We demonstrated how individual polygenic liability is enriched in proband carriers of inherited, clinically prioritized variants of unknown significance (VUS) compared to unaffected carrier parents. Furthermore, using two separate metrics to define putative core/key PGS genes, we identified strong overlap between clinical diagnostic or candidate rare disease genes and those with large estimated effects in significantly associated PGS. Finally, we proposed a framework for the prioritization of candidate variants in novel rare disease genes.

The accurate classification of VUS pathogenicity is an ongoing and growing problem in the field¹⁸. Our findings provide evidence that integrating information from proband and parent polygenic backgrounds for associated complex traits could be a useful addition to variant annotation workflows. We focused on a set of inherited variants with high suspicion of pathogenicity resulting from expert clinical geneticist review and suspected to cause disease through autosomal dominance with partial penetrance. An explanation for why some phenotypes comprising the nominated syndrome are not observed in unaffected carrier parents could be helpful in the further annotation of these candidate diagnostic variants. For the set of inherited VUS studied here, observed differences in polygenic background between probands and unaffected carrier parents might explain variable presentation of the candidate rare disorder. Additionally, with relevance to ongoing efforts to experimentally catalog the functional impact of VUS and other possible disease variants using multiplexed assays of variant effect (MAVEs) or similar platforms^19,20, our results suggest that the utility of these approaches might be maximized when using patient-specific cell lines to reflect potentially relevant polygenic background associated with rare disease presentation.

One key limitation of our approach is the focus on the subset of GA4K individuals with European ancestry. This constraint reflects bias toward this ancestry group in available GWAS and PGS training cohorts¹¹, as well as in the GA4K study cohort itself. Future updates to our work will benefit from ongoing efforts in multi-ancestry PGS development²¹ and targeted approaches for equitable outreach and enrollment in GA4K²². Furthermore, our approach associates single standardized phenotype terms (HPO) with PGS, whereas our results showing a mismatch in growth disorders HPOs and height growth chart Z-scores from EHR data suggest an opportunity for improved rare disease phenotyping through integrating multiple data modalities.

Combined, our study suggests that future advances in the diagnosis of rare diseases will be enabled by considering the full frequency spectrum of genetic variation present in patient cases.

Methods

GA4K genotype imputation

Genotyping of the full GA4K study cohort was performed using the Avera Global Screening Array (24v1-0_A1, stranded). Genotyping array variant calls in hg38 build for individuals with genotyping call rate >= 90% were then used as input for imputation using the TOPMed imputation server with the r2 (1.0.0) reference panel²³. From the full set of 548,291 measured genotypes, 533,837 (97.43%) overlapped with the reference set.

GA4K ancestry inference

Principal component analysis was performed on a subset of LD and MAF pruned variants from chromosome 1 using plink (version 2)²⁴ (“indep-pairwise” flag, values: 1000, 50, 0.05; “maf” flag, value: 0.01) and flashpcaR (version 2.0.1)²⁵. Using a subset of known ancestry labels, the full GA4K cohort was filtered to define a European (EUR) ancestry cohort composed of individuals with known EUR ancestry and those inferred from manual inspection of a principal components plot (PC1 > 0, PC2 < 0.005).

Filtering PGS and calculating individual scores

Individual-level PGSs for GA4K participants were calculated using PGS weights obtained from The Polygenic Score (PGS) Catalog¹¹ (see Data Availability). From a starting set of 3,334 PGS downloaded from PGS Catalog (February 2023), PGS measuring the same trait or disease were filtered by retaining the PGS with the greatest number of variants. Further, PGS measuring lifestyle phenotypes such as diet were removed. PGS containing duplicated variants (more than one weight listed for the same variant) were removed. PGS variants were converted to hg38 coordinates using dbSNP²⁶ (version 155) where necessary. 1,151 PGS remained after applying these filtering criteria. PGS were linked to category labels using the PGS ontology ID and trait category annotations available from PGS Catalog. PGS variants were restricted to autosomes only and variants mapping to the HLA region were removed. PGSs were calculated using plink (version 1.9)²⁴ (“sum” flag) on all variants available in the GA4K imputed genotype callset with R²>=0.8 (“exclude-if-info” flag). PGS scores were converted to Z-scores within each PGS in the full GA4K EUR ancestry cohort (proband and other family members, N = 7,436). Individuals with an extreme outlier PGS Z-score (abs(PGS Z-score)>=10) in one or more PGS were removed from further analysis. This cohort was then filtered for probands only (N = 2,374). The remaining samples were retained for analysis using mother, father, and/or sibling PGS data.

Identifying significant HPO-PGS associations

PGSs were associated with rare disease case sub-cohorts using HPO IDs for any ID with a minimum proband count 5 among the 2,374 proband study cohort (N = 574 sub-cohorts). For each HPO sub-cohort and PGS (N HPO = 574; N PRS = 1,151; N = 660,674 pairwise comparisons) a logistic regression model was used with HPO case/control status as the response variable and PGS Z-score, first five principal components of ancestry (see “GA4K inferred ancestry”), and sex as predictor variables. Model fit was assessed with Nagelkerke’s R squared using the NagelkerkeR2 function in the fmsb package (version 0.7.5) in R. An empirical P-value was calculated for each pairwise comparison using a simulated null distribution of logistic regression abs(Z-statistics) across 10,000 permutations of random case/control label reassignment (retaining relative HPO case/control counts) within each HPO-PGS pair (approximately 6.6B total model calculations). Predictor variables remained as described above. Empirical P-values were generated in R using the empPvals function available in the qvalue package (version 2.30.0)²⁷. The empirical P-value is defined here as the fraction of null logistic regression abs(Z-statistics) that match or exceed the observed abs(Z-statistic) (plus an integer constant) for a given HPO-PGS pair. In this way, the minimum possible empirical P-value for a given HPO-PGS pair in the present analysis is 1×10^-04 (i.e., the observed statistic is not matched or exceeded by any of the 10,000 null statistics from permutations). The expectation of this procedure is that the distribution of null empirical P-values follows a uniform distribution²⁸; however, when combining empirical P-values across all HPO-PGS pairs it was observed that this distribution was not uniform (Kolmogorov-Smirnov test) and that the estimate of null statistics was affected by the relative sample size of each HPO sub-cohort. Therefore, the false discovery rate (FDR) was calculated within each HPO case sub-cohort separately (N = 574 cohorts) to conform with the expected uniform distribution of empirical P-values.

Linking PGS variants to genes and effect size ranking

PGS variants within or proximal (+/-10 Kb) to any protein coding gene were identified using bedtools (version 2.29.1)²⁹ (“window” flag; value: 10,000). Gene coordinates and annotations were downloaded from GENCODE (file: gencode.v26.GRCh38.genes.gtf) (see Data Availability). For each PGS, genes were ranked by effect size by selecting the linked variant (i.e. within the gene body or +/-10 Kb) with the maximum absolute PGS effect weight.

GA4K clinical sequencing and phenotypic data

Clinical whole-exome (WES) or whole-genome (WGS) variant calls, standardized phenotype codes (Human Phenotype Ontology (HPO)³⁰), and diagnostic status were obtained from the GA4K study repository as previously described¹. HPO terms were summarized for visualization where indicated using ontology parent terms specified in HPO (see Data Availability), extracted using the get_ancestors function available in the ontologyIndex package (version 2.11)³¹ in R.

For the analysis of probands with clinical diagnostic status of variant of unknown significance (VUS), probands were selected if their PGS-associated HPO(s) matched any of the disease phenotypes for candidate syndromes returned by clinical annotation performed by clinical geneticists. This annotation pipeline prioritizes disease variants and summarizes matching phenotypes by aggregating across multiple knowledge bases such as OMIM, ACMG, and ClinVar. VUS cases were restricted to single inherited variants (i.e. VUS is present in one parent), autosomal only, annotated as autosomal dominant with partial penetrance, and where associated PGSs were available for the full trio. Compound heterozygous VUS events were removed. Parent VUS zygosity was obtained from clinical WES/WGS available in the GA4K study repository. Carrier parents were confirmed to be unaffected for the matching VUS phenotype(s) by clinical geneticist chart review.

Height and weight observations, including measure date/time and patient ancestry, were obtained from the electronic medical record at the GA4K study site at Children’s Mercy Hospital. Observations were filtered for “White” ancestry label. The R package cdcanthro (version 0.1.1) was used to calculate age and sex adjusted Z-scores for height and weight using CDC guidelines. A larger patient cohort (N = 3,073) was used in order to maximize the number of age-matched observations available and therefore the accuracy of growth Z-scores. Further, BMI Z-scores were calculated using the formula Weight/(Height/100)^2. The average (median) Z-score for each measurement was computed for individuals with more than one observation time point.

Assessing rare variant burden in large-effect PGS genes

To assess the landscape of rare variants in putative core/key PGS genes (eQTS genes) for obesity, rare variant impact annotations were obtained using Variant Effect Predictor (VEP) (version 98_38)³² and allele frequencies from gnomAD (version 3)³³. Rare variants were defined as those with gnomAD MAF < 1% (and including variants absent in gnomAD) and were further categorized using ACMG variant severity guidelines³⁴. Variants annotated in category 2 or 3 were retained. Rare variants in category 2 are defined as: nonsense; disruption of stop; loss of initiation; splice junction; donor/acceptor (AG/GT); frameshift; whole transcript deletion. Category 3 variants are defined as: missense; in-frame in/del including whole exon; intronic or synonymous variant possibly affecting splicing (in polypyrimidine tract, five_prime_exonic, five_prime_flank, three_prime_exonic, five_prime_intronic, three_prime_flank); any variant in a mitochondrial gene.

eQTS genes (FDR < 5%) were obtained from eQTLGen Consortium (see Data Availability), and subset using the trait column for the terms “obesity” or “extreme_bmi”. To create a set of non-relevant traits, the trait column was subset for the terms “asthma”, “celiac disease”, “juvenile idiopathic arthritis”, “primary biliary cirrhosis”, “educational attainment”, or “coronary artery disease”. In this analysis, the case cohort was defined as those with a HPO term for “obesity” (HP:0001513), “increased body weight” (HP:0004324), and/or “overweight” (HP:0025502), as well as having a BMI growth chart Z-score > 2. Controls were defined as any sample without any of these HPO terms, as well as having a BMI growth chart Z-score of < 1.

All statistical analyses were performed in R (version 4.2.1). Figures were generated using ggplot2 (version 3.4.0)³⁵.

Data Availability

GA4K data is available via dbGAP using accession number phs002206.v4.p1 and AnVIL at https://anvilproject.org/data/studies/phs002206. Polygenic scores are available from PGS Catalog at https://www.pgscatalog.org/. Gene annotations are available from GENCODE (version 26) at https://www.gencodegenes.org/human/release_26.html. eQTS genes are available from eQTLGen Consortium at https://www.eqtlgen.org/eqts.html. Human Phenotype Ontology (HPO) in OBO format is available at https://hpo.jax.org/app/data/ontology. HPO phenotype-to-gene annotations are available at https://hpo.jax.org/app/data/annotations.

Ethical declarations

The study was approved by the Children’s Mercy Institutional Review Board (IRB) (Study # 11120514). Informed written consent was obtained from all participants prior to study inclusion.

Conflicts of interest

None declared.

Acknowledgements

C.S. is supported by NIH grants R35GM146966 and R21HG012422. The work of GA4K was made possible in part by generous gifts to Children’s Mercy Research Institute at Children’s Mercy Kansas City.

References

1.↵
Cohen, A. S. A. et al. Genomic answers for children: Dynamic analyses of >1000 pediatric rare disease genomes. Genet. Med. 24, 1336–1348 (2022).
OpenUrl
2.↵
Cheung, W. A. et al. Direct haplotype-resolved 5-base HiFi sequencing for genome-wide profiling of hypermethylation outliers in a rare disease cohort. Nat. Commun. 14, 3090 (2023).
OpenUrl
3.↵
Niemi, M. E. K. et al. Common genetic variants contribute to risk of rare severe neurodevelopmental disorders. Nature 562, 268–271 (2018).
OpenUrl
4.↵
Fahed, A. C. et al. Polygenic background modifies penetrance of monogenic variants for tier 1 genomic conditions. Nat. Commun. 11, 3635 (2020).
OpenUrl PubMed
5.
Schroeder, P. et al. Rare variant association analysis in 51,256 type 2 diabetes cases and 370,487 controls informs the spectrum of pathogenicity of monogenic diabetes genes. medRxiv (2023) doi:10.1101/2023.09.28.23296244.
OpenUrl Abstract/FREE Full Text
6.↵
Goodrich, J. K. et al. Determinants of penetrance and variable expressivity in monogenic metabolic conditions across 77,184 exomes. Nat. Commun. 12, 3505 (2021).
OpenUrl
7.↵
Chami, N., Preuss, M., Walker, R. W., Moscati, A. & Loos, R. J. F. The role of polygenic susceptibility to obesity among carriers of pathogenic mutations in MC4R in the UK Biobank population. PLoS Med. 17, e1003196 (2020).
OpenUrl
8.↵
Davies, R. W. et al. Using common genetic variation to examine phenotypic expression and risk prediction in 22q11.2 deletion syndrome. Nat. Med. 26, 1912–1918 (2020).
OpenUrl
9.↵
Vuckovic, D. et al. The Polygenic and Monogenic Basis of Blood Traits and Diseases. Cell 182, 1214–1231.e11 (2020).
OpenUrl CrossRef PubMed
10.↵
Freund, M. K. et al. Phenotype-Specific Enrichment of Mendelian Disorder Genes near GWAS Regions across 62 Complex Traits. Am. J. Hum. Genet. 103, 535–552 (2018).
OpenUrl CrossRef PubMed
11.↵
Lambert, S. A. et al. The Polygenic Score Catalog as an open database for reproducibility and systematic evaluation. Nat. Genet. 53, 420–425 (2021).
OpenUrl
12.↵
Liu, X., Li, Y. I. & Pritchard, J. K. Trans Effects on Gene Expression Can Drive Omnigenic Inheritance. Cell 177, 1022–1034.e6 (2019).
OpenUrl CrossRef PubMed
13.↵
Võsa, U. et al. Large-scale cis- and trans-eQTL analyses identify thousands of genetic loci and polygenic scores that regulate blood gene expression. Nat. Genet. 53, 1300–1310 (2021).
OpenUrl CrossRef
14.↵
Wigdor, E. M. et al. The female protective effect against autism spectrum disorder. Cell Genom 2, 100134 (2022).
15.
Michaud, V. et al. The contribution of common regulatory and protein-coding TYR variants to the genetic architecture of albinism. Nat. Commun. 13, 3939 (2022).
OpenUrl CrossRef
16.
Alver, M. et al. Contribution of schizophrenia polygenic burden to longitudinal phenotypic variance in 22q11.2 deletion syndrome. Mol. Psychiatry 27, 4191–4200 (2022).
OpenUrl
17.↵
Škorić-Milosavljević, D. et al. Common Genetic Variants Contribute to Risk of Transposition of the Great Arteries. Circ. Res. 130, 166–180 (2022).
OpenUrl CrossRef
18.↵
Starita, L. M. et al. Variant Interpretation: Functional Assays to the Rescue. Am. J. Hum. Genet. 101, 315–325 (2017).
OpenUrl CrossRef PubMed
19.↵
Radford, E. J. et al. Saturation genome editing of DDX3X clarifies pathogenicity of germline and somatic variation. Nat. Commun. 14, 7702 (2023).
OpenUrl
20.↵
Findlay, G. M. et al. Accurate classification of BRCA1 variants with saturation genome editing. Nature 562, 217–222 (2018).
OpenUrl CrossRef PubMed
21.↵
Fatumo, S. et al. A roadmap to increase diversity in genomic studies. Nat. Med. 28, 243– 250 (2022).
OpenUrl
22.↵
Kane, N. J. et al. Committing to genomic answers for all kids: Evaluating inequity in genomic research enrollment. Genet. Med. 25, 100895 (2023).
23.↵
Das, S. et al. Next-generation genotype imputation service and methods. Nat. Genet. 48, 1284–1287 (2016).
OpenUrl CrossRef PubMed
24.↵
Purcell, S. et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am. J. Hum. Genet. 81, 559–575 (2007).
OpenUrl CrossRef PubMed
25.↵
Abraham, G., Qiu, Y. & Inouye, M. FlashPCA2: principal component analysis of Biobank-scale genotype datasets. Bioinformatics 33, 2776–2778 (2017).
OpenUrl CrossRef PubMed
26.↵
Sherry, S. T., Ward, M. & Sirotkin, K. dbSNP-database for single nucleotide polymorphisms and other classes of minor genetic variation. Genome Res. 9, 677–679 (1999).
OpenUrl FREE Full Text
27.↵
Storey, J. D., Xiao, W., Leek, J. T., Tompkins, R. G. & Davis, R. W. Significance analysis of time course microarray experiments. Proc. Natl. Acad. Sci. U. S. A. 102, 12837–12842 (2005).
OpenUrl Abstract/FREE Full Text
28.↵
Storey, J. D., Taylor, J. E. & Siegmund, D. Strong Control, Conservative Point Estimation and Simultaneous Conservative Consistency of False Discovery Rates: A Unified Approach. J. R. Stat. Soc. Series B Stat. Methodol. 66, 187–205 (2003).
OpenUrl
29.↵
Quinlan, A. R. & Hall, I. M. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–842 (2010).
OpenUrl CrossRef PubMed Web of Science
30.↵
Köhler, S. et al. The Human Phenotype Ontology in 2021. Nucleic Acids Res. 49, D1207– D1217 (2021).
OpenUrl CrossRef PubMed
31.↵
Greene, D., Richardson, S. & Turro, E. ontologyX: a suite of R packages for working with ontological data. Bioinformatics 33, 1104–1106 (2017).
OpenUrl
32.↵
McLaren, W. et al. The Ensembl Variant Effect Predictor. Genome Biol. 17, 122 (2016).
33.↵
Chen, S. et al. A genome-wide mutational constraint map quantified from variation in 76,156 human genomes. bioRxiv 2022.03.20.485034 (2022) doi:10.1101/2022.03.20.485034.
OpenUrl Abstract/FREE Full Text
34.↵
Richards, S. et al. Standards and guidelines for the interpretation of sequence variants: a joint consensus recommendation of the American College of Medical Genetics and Genomics and the Association for Molecular Pathology. Genet. Med. 17, 405–424 (2015).
OpenUrl CrossRef PubMed
35.↵
Wickham, H. ggplot2: Elegant Graphics for Data Analysis. (Springer International Publishing, 2016).

View the discussion thread.

Posted January 11, 2024.

Download PDF

Supplementary Material

Citation Tools

Subject Area

Genetic and Genomic Medicine

Subject Areas

All Articles

Addiction Medicine (349)
Allergy and Immunology (668)
Allergy and Immunology (668)
Anesthesia (181)
Cardiovascular Medicine (2648)
Dentistry and Oral Medicine (316)
Dermatology (223)
Emergency Medicine (399)
Endocrinology (including Diabetes Mellitus and Metabolic Disease) (942)
Epidemiology (12228)
Forensic Medicine (10)
Gastroenterology (759)
Genetic and Genomic Medicine (4103)
Geriatric Medicine (387)
Health Economics (680)
Health Informatics (2657)
Health Policy (1005)
Health Systems and Quality Improvement (985)
Hematology (363)
HIV/AIDS (851)
Infectious Diseases (except HIV/AIDS) (13695)
Intensive Care and Critical Care Medicine (797)
Medical Education (399)
Medical Ethics (109)
Nephrology (436)
Neurology (3882)
Nursing (209)
Nutrition (577)
Obstetrics and Gynecology (739)
Occupational and Environmental Health (695)
Oncology (2030)
Ophthalmology (585)
Orthopedics (240)
Otolaryngology (306)
Pain Medicine (250)
Palliative Medicine (75)
Pathology (473)
Pediatrics (1115)
Pharmacology and Therapeutics (466)
Primary Care Research (452)
Psychiatry and Clinical Psychology (3432)
Public and Global Health (6527)
Radiology and Imaging (1403)
Rehabilitation Medicine and Physical Therapy (814)
Respiratory Medicine (871)
Rheumatology (409)
Sexual and Reproductive Health (410)
Sports Medicine (342)
Surgery (448)
Toxicology (53)
Transplantation (185)
Urology (165)

[1] 1.↵
Cohen, A. S. A. et al. Genomic answers for children: Dynamic analyses of >1000 pediatric rare disease genomes. Genet. Med. 24, 1336–1348 (2022).
OpenUrl

[2] 2.↵
Cheung, W. A. et al. Direct haplotype-resolved 5-base HiFi sequencing for genome-wide profiling of hypermethylation outliers in a rare disease cohort. Nat. Commun. 14, 3090 (2023).
OpenUrl

[3] 3.↵
Niemi, M. E. K. et al. Common genetic variants contribute to risk of rare severe neurodevelopmental disorders. Nature 562, 268–271 (2018).
OpenUrl

[4] 4.↵
Fahed, A. C. et al. Polygenic background modifies penetrance of monogenic variants for tier 1 genomic conditions. Nat. Commun. 11, 3635 (2020).
OpenUrl PubMed

[5] 5.
Schroeder, P. et al. Rare variant association analysis in 51,256 type 2 diabetes cases and 370,487 controls informs the spectrum of pathogenicity of monogenic diabetes genes. medRxiv (2023) doi:10.1101/2023.09.28.23296244.
OpenUrl Abstract/FREE Full Text

[6] 6.↵
Goodrich, J. K. et al. Determinants of penetrance and variable expressivity in monogenic metabolic conditions across 77,184 exomes. Nat. Commun. 12, 3505 (2021).
OpenUrl

[7] 7.↵
Chami, N., Preuss, M., Walker, R. W., Moscati, A. & Loos, R. J. F. The role of polygenic susceptibility to obesity among carriers of pathogenic mutations in MC4R in the UK Biobank population. PLoS Med. 17, e1003196 (2020).
OpenUrl

[8] 8.↵
Davies, R. W. et al. Using common genetic variation to examine phenotypic expression and risk prediction in 22q11.2 deletion syndrome. Nat. Med. 26, 1912–1918 (2020).
OpenUrl

[9] 9.↵
Vuckovic, D. et al. The Polygenic and Monogenic Basis of Blood Traits and Diseases. Cell 182, 1214–1231.e11 (2020).
OpenUrl CrossRef PubMed

[10] 10.↵
Freund, M. K. et al. Phenotype-Specific Enrichment of Mendelian Disorder Genes near GWAS Regions across 62 Complex Traits. Am. J. Hum. Genet. 103, 535–552 (2018).
OpenUrl CrossRef PubMed

[11] 11.↵
Lambert, S. A. et al. The Polygenic Score Catalog as an open database for reproducibility and systematic evaluation. Nat. Genet. 53, 420–425 (2021).
OpenUrl

[12] 12.↵
Liu, X., Li, Y. I. & Pritchard, J. K. Trans Effects on Gene Expression Can Drive Omnigenic Inheritance. Cell 177, 1022–1034.e6 (2019).
OpenUrl CrossRef PubMed

[13] 13.↵
Võsa, U. et al. Large-scale cis- and trans-eQTL analyses identify thousands of genetic loci and polygenic scores that regulate blood gene expression. Nat. Genet. 53, 1300–1310 (2021).
OpenUrl CrossRef

[14] 14.↵
Wigdor, E. M. et al. The female protective effect against autism spectrum disorder. Cell Genom 2, 100134 (2022).

[15] 15.
Michaud, V. et al. The contribution of common regulatory and protein-coding TYR variants to the genetic architecture of albinism. Nat. Commun. 13, 3939 (2022).
OpenUrl CrossRef

[16] 16.
Alver, M. et al. Contribution of schizophrenia polygenic burden to longitudinal phenotypic variance in 22q11.2 deletion syndrome. Mol. Psychiatry 27, 4191–4200 (2022).
OpenUrl

[17] 17.↵
Škorić-Milosavljević, D. et al. Common Genetic Variants Contribute to Risk of Transposition of the Great Arteries. Circ. Res. 130, 166–180 (2022).
OpenUrl CrossRef

[18] 18.↵
Starita, L. M. et al. Variant Interpretation: Functional Assays to the Rescue. Am. J. Hum. Genet. 101, 315–325 (2017).
OpenUrl CrossRef PubMed

[19] 19.↵
Radford, E. J. et al. Saturation genome editing of DDX3X clarifies pathogenicity of germline and somatic variation. Nat. Commun. 14, 7702 (2023).
OpenUrl

[20] 20.↵
Findlay, G. M. et al. Accurate classification of BRCA1 variants with saturation genome editing. Nature 562, 217–222 (2018).
OpenUrl CrossRef PubMed

[21] 21.↵
Fatumo, S. et al. A roadmap to increase diversity in genomic studies. Nat. Med. 28, 243– 250 (2022).
OpenUrl

[22] 22.↵
Kane, N. J. et al. Committing to genomic answers for all kids: Evaluating inequity in genomic research enrollment. Genet. Med. 25, 100895 (2023).

[23] 23.↵
Das, S. et al. Next-generation genotype imputation service and methods. Nat. Genet. 48, 1284–1287 (2016).
OpenUrl CrossRef PubMed

[24] 24.↵
Purcell, S. et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am. J. Hum. Genet. 81, 559–575 (2007).
OpenUrl CrossRef PubMed

[25] 25.↵
Abraham, G., Qiu, Y. & Inouye, M. FlashPCA2: principal component analysis of Biobank-scale genotype datasets. Bioinformatics 33, 2776–2778 (2017).
OpenUrl CrossRef PubMed

[26] 26.↵
Sherry, S. T., Ward, M. & Sirotkin, K. dbSNP-database for single nucleotide polymorphisms and other classes of minor genetic variation. Genome Res. 9, 677–679 (1999).
OpenUrl FREE Full Text

[27] 27.↵
Storey, J. D., Xiao, W., Leek, J. T., Tompkins, R. G. & Davis, R. W. Significance analysis of time course microarray experiments. Proc. Natl. Acad. Sci. U. S. A. 102, 12837–12842 (2005).
OpenUrl Abstract/FREE Full Text

[28] 28.↵
Storey, J. D., Taylor, J. E. & Siegmund, D. Strong Control, Conservative Point Estimation and Simultaneous Conservative Consistency of False Discovery Rates: A Unified Approach. J. R. Stat. Soc. Series B Stat. Methodol. 66, 187–205 (2003).
OpenUrl

[29] 29.↵
Quinlan, A. R. & Hall, I. M. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–842 (2010).
OpenUrl CrossRef PubMed Web of Science

[30] 30.↵
Köhler, S. et al. The Human Phenotype Ontology in 2021. Nucleic Acids Res. 49, D1207– D1217 (2021).
OpenUrl CrossRef PubMed

[31] 31.↵
Greene, D., Richardson, S. & Turro, E. ontologyX: a suite of R packages for working with ontological data. Bioinformatics 33, 1104–1106 (2017).
OpenUrl

[32] 32.↵
McLaren, W. et al. The Ensembl Variant Effect Predictor. Genome Biol. 17, 122 (2016).

[33] 33.↵
Chen, S. et al. A genome-wide mutational constraint map quantified from variation in 76,156 human genomes. bioRxiv 2022.03.20.485034 (2022) doi:10.1101/2022.03.20.485034.
OpenUrl Abstract/FREE Full Text

[34] 34.↵
Richards, S. et al. Standards and guidelines for the interpretation of sequence variants: a joint consensus recommendation of the American College of Medical Genetics and Genomics and the Association for Molecular Pathology. Genet. Med. 17, 405–424 (2015).
OpenUrl CrossRef PubMed

[35] 35.↵
Wickham, H. ggplot2: Elegant Graphics for Data Analysis. (Springer International Publishing, 2016).