Abstract
While genome sequencing has transformed medicine by elucidating the genetic underpinnings of both rare and common complex disorders, its utility to predict clinical outcomes remains understudied. Here, we used artificial intelligence (AI) technologies to explore the predictive value of genome sequencing in forecasting clinical outcomes following surgery for congenital heart defects (CHD). We report results for a cohort of 2,253 CHD patients from the Pediatric Cardiac Genomics Consortium with a broad range of complex heart defects, pre- and post-operative clinical variables and exome sequencing. Damaging genotypes in chromatin-modifying and cilia-related genes were associated with an elevated risk of adverse post-operative outcomes, including mortality, cardiac arrest and prolonged mechanical ventilation. The impact of damaging genotypes was further amplified in the context of specific CHD phenotypes, surgical complexity and extra-cardiac anomalies. The absence of a damaging genotype in chromatin-modifying and cilia-related genes was also informative, reducing the risk for adverse postoperative outcomes. Thus, genome sequencing enriches the ability to forecast outcomes following congenital cardiac surgery.
Introduction
Congenital heart defects (CHD) represent a complex class of often life-threatening disorders that affect more than 40,000 newborns in the U.S. annually. The prevalence of CHD is approximately 1 per 100 live births, with an incidence that varies according to the specific CHD lesion1–3. The genetic architecture of CHD has been the focus of several large-scale sequencing efforts4–9, demonstrating that the genetic landscape of syndromic and sporadic CHD differ, with sporadic forms characterized by considerable locus and allelic heterogeneity7. More recently, work by the National Heart, Lung and Blood Institute (NHLBI)-funded Pediatric Cardiac Genomics Consortium (PCGC) has shown that dominantly and recessively inherited forms of CHD have distinct genetic and phenotypic landscapes, whereby dominant forms of CHD are significantly enriched for damaging variants in chromatin-modifying genes, while recessive forms are enriched for damaging variants in cilia-related biallelic genotypes and heterotaxy phenotypes4,5,8,9.
Recent work has also demonstrated the value of genetic testing for outcomes prediction for specific types of CHD and within specific clinical contexts10–14. Broader investigations, however, have faced difficulties in assaying genetic contributions across multiple CHD phenotypes and clinical contexts, in part due to the widely varying severity of CHD lesions and the complex medical and surgical interventions necessary for survival. Here, we demonstrate that condensing heterogenous CHD phenotypes into five major clinically relevant phenotypic categories using anatomic descriptors15 renders these data amenable for outcomes analyses. We also show that the high allelic and locus heterogeneity characteristic of CHD can be overcome using an artificial intelligence (AI) genome interpretation tool16, followed by categorization of damaging genotypes into molecular pathways or gene categories. This two-pronged approach of phenotypic and genotypic classification, when combined with probabilistic graphical models, enables clinically relevant and highly personalized risk estimates in patients undergoing congenital cardiac surgery.
Methods
Human subjects
All patients were diagnosed, phenotyped, and recruited by PCGC centers and participating regional hospitals into the PCGC Congenital Heart Disease Network Study (CHD GENES: ClinicalTrials.gov identifier NCT01196182; [https://clinicaltrials.gov/]). Informed consent was obtained from all participants or the participants’ guardians. Approval for human subjects research was obtained by the institutional review boards of participating centers, including Boston’s Children’s Hospital, Brigham and Women’s Hospital, Great Ormond Street Hospital, Children’s Hospital of Los Angeles, Children’s Hospital of Philadelphia, Columbia University Medical Center, Icahn School of Medicine at Mount Sinai, Rochester School of Medicine and Dentistry, Steven and Alexandra Cohen Children’s Medical Center of New York, Lucile Packard Children’s Hospital Stanford, University of California-San Francisco, University of Utah, and Yale School of Medicine. Automated CHD phenotype classification was performed on 14,765 PCGC participants. A subset of these participants who had both exome sequencing and perioperative data (2,253) was used for network analyses.
Clinical phenotypes
Cardiac diagnoses were obtained from review of echocardiogram, cardiac MRI, catheterization, and operative reports at the time of enrollment into the PCGC4,5. Detailed cardiac diagnoses for each patient were coded using the Fyler system15. Extra-cardiac anomalies (ECAs) were identified at the time of PCGC enrollment4,5 (Supplemental Table 1). Any structural anomaly that was not acquired was classified as an extra cardiac anomaly (ECA).
Post-operative variables
For patients undergoing open heart surgery, surgical and hospitalization data were obtained from participating centers using the local data collected for submission to the Society of Thoracic Surgeons Congenital Heart Surgery Database (STS-CHSD)17. A total of 59 surgical complication variables were extracted for analysis. The size of the final data set was constrained to 2,253 patients, such that all patients had WES and surgical variables had no more than 10% missing data. Most patients had multiple cardiac surgeries. A patient was scored as having an adverse event or surgical complication (e.g., prolonged mechanical ventilation) if that event occurred for any surgery at any age (Supplemental Table 2).
Surgical complexity
Surgical complexity is a well-known driver of mortality and morbidity. In response, the STS-European Association for Cardio-Thoracic Surgery (STAT) has created risk assessment categories in which procedures are grouped based on similar mortality rates18. STAT categories range from 1 to 5, with STAT1 representing the procedures with the lowest mortality rates and STAT5 representing the procedures with the highest mortality rates.
CHD classification
The PCGC has classified cardiac diagnoses for over 14,000 CHD probands using the Fyler coding system, which describes the congenitally malformed heart using a vocabulary of more than 3,000 possible phenotypic descriptors15. While this system allows for highly granular descriptions of heart defects, we hypothesized that condensing these terms into a few clinically relevant phenotypic categories might render them more tractable for outcomes analyses. Thus, we sought to automate cardiac phenotype classification across the entire PCGC cohort, assigning each patient to a single category. To do so, we used five major cardiac categories derived from a previous PCGC study5: left ventricular outflow tract obstructions (LVO), laterality and heterotaxy defects (HTX), atrioventricular canal defects (AVC), conotruncal defects (CTD), and other defects (OTH), which includes simple atrial septal defects and more complex heart defects not assigned to the other four categories5,15. Each participant was assigned uniquely to one of the five phenotypic categories.
A gradient-boosted decision tree model was built to automatically classify PCGC probands (14,765) into one of these five CHD categories (see Supplemental Methods, Supplemental Figures 1, 2). Model learning and classification was performed using an ensemble-based method using the XGBoost library (v1.5)19. We created an analysis toolkit to streamline XGBoost training, grid-based parameter optimization, and performance evaluation. The truth set for training the classifier included 3,000 CHD patients, 2,752 PCGC patients previously assigned into the five CHD categories5 and 248 randomly selected PCGC patients that were manually reviewed and assigned to a CHD phenotype category. A gradient-boosted probabilistic patient classifier was built with XGBoost19 using the 3,000 patients and 698 Fyler features (Supplemental Table 3). Model training was performed with five-fold cross-validation and replacement subsampling (Supplemental Table 4). Model accuracy was assessed by comparing the patient’s known phenotype category label derived from the literature5 to its predicted phenotype category label (Supplemental Figure 3, Supplemental Table 5). Single-class prediction accuracy for the training data was higher for HTX (98.9%), AVC (97.8%), and LVO (97.7%) than for CTD (95.3%) and OTH (91.9%), where a low level of ambiguity occurred. Overall classification accuracy was 97.7% with a specificity of 99.3%, and sensitivity of 97.7% (Supplemental Tables 6). We then applied the trained classifier to 14,765 PCGC CHD patients with Fyler descriptors (Supplemental Tables 7, 8). Final classification of the 3,000 training patients was identical to the original training predictions. The five phenotype categories remained generally proportional between the training data and the full data set, with a maximum observed difference of 4.2% for CTD patients.
AI-based scoring of predicted damaging genetic variants
AI-based identification of candidate disease-causing genotypes was performed using Fabric GEM16 (Fabric Genomics, Oakland, CA). GEM incorporates Human Phenotype Ontology (HPO) terms, sex, genotype frequency (gnomAD), evolutionary conservation, Online Mendelian Inheritance in Man (OMIM), GnomAD, and ClinVar information in a probabilistic AI framework to identify the most likely genetic variant or genotype that explains the patient’s disease phenotype. Because WES are difficult substrates for CNV calling, we restricted our analyses to SNVs and short indels. HPO terms utilized in the GEM analyses were based on each patient’s Fyler phenotypes, which were mapped to HPO terms using the Clinithink software package (Clinithink, London).
GEM’s gene scores are log10 transformed Bayes factors20 that summarize the relative support for the hypothesis that the prioritized genotype damages the gene in which it resides and explains the patient’s phenotype versus the hypothesis that the variant neither damages the gene nor explains the patient’s phenotype. We used a stringent GEM score of ≥ 1.0 to represent a likely pathogenic genotype. A recent genomic analysis of critically ill newborns16 showed that a GEM score of ≥ 1.0 identified 90% of all true positive damaging variants, with a median of two candidate variants per patient16. Gene penetrance for GEM calculations was set to 0.95 to enforce strict consideration of known dominant and recessive disorders. For downstream analyses, damaging genetic variants were classified as de novo, dominant, or recessive/biallelic variants based on their inheritance pattern in trios. Dominant and de novo damaging variants were required to have a frequency of < 1/10000 in gnomAD databases (v2.1, v3.1) and most variants were not observed. Overall, we identified damaging de novo or recessive genotypes in 10.56% of the study cohort (Supplemental Tables 9, 10), in line with previous studies that utilized different methods of defining pathogenicity4,5,8,9. Damaging genetic variants were assigned to several functional gene pathways. Gene lists for gene pathways were obtained using the reactome pathway browser. Gene lists are shown in Supplemental Table 11 and have been previously described5,8,9 There is overlap between gene lists, with some genes represented in more than one gene pathway/category (Supplemental Figure 4).
Probabilistic graphical models
Probabilistic graphical models (PGMs) provide a robust explainable AI methodology capable of discovering and quantifying additive and synergistic effects amongst broad classes of variables. For the work presented here, we used a form of PGMs known as a Bayesian networks21. Bayesian networks are fully transparent, and their graphical representation offers an intuitive and visual mechanism for understanding the relationships between variables and the impacts of multiple variables on outcomes of interest22–29. Moreover, Bayesian networks offer practical advantages over regression approaches, by capturing the entire joint probability distribution of the data, encompassing all interrelationships among the variables incorporated in the non-linear model. Thus, a single network can be used to explore any combination of variables as a target outcome in one query and then as a risk factor for a different target outcome, all within the same model. For more on these points see 21,30,31.
Feature selection
Single variables, such as damaging genetic variants in chromatin-modifying genes, were tested for conditional dependency with phenotype variables using exact Bayesian Networks. Each conditional probability [e.g., the probability of LVO given a damaging de novo chromatin variant: P (LVO | chromatin dGV) was estimated as the median conditional probability from 1000 independent networks. Conditional probability estimates were divided by the baseline probability for each respective phenotype to obtain absolute risk ratios. Associations with absolute risk ratios ≥1.0 were selected for further analyses. Surgery-related variables associated with gene categories and CHD phenotypes were identified in a similar way. Each surgical feature was tested individually as a conditional variable with genetic variables and each of the five specific CHD phenotypes. All conditional variables, individually or as composite variables (e.g., mortality and ECA), were required to have at least six events.
Network construction
Bayesian networks were created for each CHD phenotype category. Each network included genetic and surgical variables identified in the feature selection stage. All-cause mortality was also included in each network. All input conditional variables were coded as presence/absence. A small number of missing surgical values (< 10% for any single variable) was imputed using a K-nearest neighbors approach (k = 10). The structure of each network was learned with the Silander-Myllymaki exact algorithm with Bayesian information criterion (BIC) scoring32 or by a greedy hill-climbing method for the large networks with more than 15 nodes. Posterior probabilities were network propagated using exact inference. Network structure learning and belief propagation were performed with the bnstruct and gRain R packages33,34. To improve convenience and functionality, we created the BayesNetExplorer.jl package which implements network structure learning and belief propagation methods and provides tools for feature selection, risk estimation, graphics, and other network analysis tasks.
Risk Calculations
We used two risk ratios (RR) to summarize our risk estimates, absolute and relative RR. For example, the absolute RR of a LVO phenotype given a damaging genotype in a chromatin-modifying gene is as follows: . The relative RR estimates the relative change in mortality risk for LVO patients with damaging mutations in chromatin-modifying genes, compared to similar patients without a damaging chromatin genotype:
. Final risk ratios and their confidence intervals are reported as the median and 95% confidence interval from an empirical distribution of risk ratio estimates. The empirical distributions are created by randomly resampling the data set with replacement and recreating 1000 independent networks and risk estimates. A Laplace correction (k = 1) or network smoothing value was used to prevent zero-state probability estimates during bootstrapping.
Results
Refining the genetic architecture of CHD
For genetic and outcomes analysis, the study population consisted of 2,253 PCGC probands (1992 trios, 12 duos, 245 singletons) with both exome sequencing and surgical outcomes data, classified into five CHD phenotype categories. The AI-based genome analysis tool GEM16 identified predicted damaging de novo genotypes in 238 participants (10.6% of the cohort). A total of 131 damaging de novo / dominant and 198 damaging recessive/biallelic genetic variants were discovered (Supplemental Tables 9, 10). There were 17 genes with damaging de novo variants in two or more patients. The most commonly recurrent de novo variants were in known CHD-related genes such as KMT2D (11), CHD7 (6), RAF1 (3), JAG1 (3), and TAB2 (3). Biallelic damaging genotypes were observed in multiple patients for several genes including DYNC2H1 (3), DNAH5 (3), LAMA2 (3), GDF1 (2), and IFT140 (2).
We discovered that CHD phenotype categories were enriched for damaging genetic variants in specific gene pathways/categories (Table 1). For example, the LVO class was enriched 1.61-fold (CI 1.41-1.81) for damaging de novo genotypes in chromatin-modifying genes, with this signal driven primarily by patients with hypoplastic left heart syndrome (HLHS), a subset of LVO where these genotypes were enriched 1.92-fold (CI 1.32-2.52). While previous studies implicated damaging chromatin-modifying gene variants in CHD cohorts at large4,5,8,9, our analyses here help to define the specific CHD subtypes most influenced by damaging variants in chromatin-modification genes. The LVO phenotype class was also enriched for de novo genotypes in WNT genes (2.13-fold, CI 1.86-2.40), signal transduction genes (1.51-fold, CI 1.01-2.01), and a curated list of genes known to cause CHD (1.22-fold, CI 1.11-1.33; see Supplemental Tables 9 and 10). Notably, damaging genotypes in these pathways were not enriched in HLHS patients, further underscoring the complex genetic landscape underlying CHD.
The HTX phenotype class was enriched for damaging recessive/biallelic variants in cilia-related genes (2.63-fold, CI 2.06-3.20) and showed proportionally higher enrichment in the subset of motile cilia genes modulated by FOXJ1 (6.89-fold, 3.30-10.36), findings consistent with previous reports5,8,35. The OTHER phenotype class was enriched for damaging de novo variants in chromatin-modifying genes (1.85-fold, CI 1.44-2.26) and the curated CHD genes (1.59-fold, CI 1.38-1.80) lists. We were underpowered to detect enrichment in damaging genotypes in the AVC and CTD phenotype classes.
Damaging genotypes impact surgical outcomes
To further explore the relationships between genetic and clinical variables, we utilized Bayesian networks, a powerful statistical framework that can model complex dependencies, including non-linear relationships and indirect associations, in a probabilistic manner. In Bayesian networks, variables are depicted as nodes in a graph and conditional dependencies between variables are represented by the edges connecting those nodes. Once a network is constructed, the impact of any combination variables on any selected outcome can be quantified, while controlling for the effects of other variables incorporated into the network. The networks describing the conditional dependencies between damaging genotypes, CHD phenotypes and post-operative variables are shown in Figure 1a, b.
a, An exact Bayesian network depicting the relationship among damaging de novo genetic variants in chromatin-modifying genes (green), phenotypes: LVO, HLHS, and ECAs (blue), surgical STAT4 or STAT5 category (red), and surgical outcomes (orange). b, An exact Bayesian network depicting the relationship among damaging recessive genetic variants in cilia genes (green), phenotypes: laterality defects (HTX) and extra cardiac anomalies (ECAs) (blue), surgical STAT4 category (yellow), and surgical outcomes (orange). Directed acyclic graphs were moralized and displayed as non-directional networks. c, Relative risk ratios for adverse post-operative outcomes and CHD phenotypes or surgical complexity, comparing probands with and without damaging genotypes. Empirical ninety-five percent confidence intervals (CI 5, 95) are based on 1000 resampled network-based probabilities (see Methods). Abbreviations: Chromatin dGV: de novo damaging genotypes in chromatin-modifying genes, LVO: left ventricular outflow tract obstruction, HLHS: hypoplastic left heart syndrome Cilia dGV: biallelic damaging genotypes in cilia-related genes, ECA: extra cardiac anomaly, HTX: heterotaxy/laterality defects, MORT: mortality, STAT4: surgical STAT4 category, STAT4-5: surgical STAT 4 or STAT5 category, VENT: post-operative ventilation time >7 days.
Damaging genotypes in chromatin-modifying and cilia-related genes (defined by a GEM16 score > 1.0) increased the probability of severe adverse clinical outcomes following congenital cardiac surgery, including mortality, cardiac arrest, and prolonged mechanical ventilation (> 7 days post-surgery). For example, damaging de novo chromatin genetic variants increased the probability (relative risk) of mortality 1.81-fold (CI 1.50-3.21), cardiac arrest 1.74-fold (CI 1.40-2.94) and prolonged ventilation 1.65-fold (CI 1.41-2.27). Likewise, damaging recessive/biallelic cilia genotypes increased the probability of mortality 1.40-fold (CI 1.09-2.10), cardiac arrest 1.50-fold (CI 1.13-2.34) and prolonged ventilation 1.43-fold (CI 1.09-1.96). Reciprocally, the absence of a damaging genotype was protective for these adverse post-operative outcomes. Thus, for a proband without a damaging de novo chromatin genotype, the relative risk ratio for mortality was 0.55 (CI 0.31-0.69), 0.58 (CI 0.34-0.72) for cardiac arrest and 0.61 (CI 0.44-0.72) for prolonged ventilation. For a proband without at damaging recessive/biallelic cilia genotype, the relative risk ratio for mortality was 0.72 (CI 0.48-0.91), 0.63 (CI 0.43-0.88) for cardiac arrest and 0.70 (CI 0.51-0.92) for prolonged ventilation.
Damaging genotypes impact surgical outcomes in the context of surgical mortality risk category
We discovered that damaging chromatin and cilia genotypes were associated with an increased risk of mortality for probands undergoing the highest risk surgical procedures (Figure 1c). Thus, probands who died after a STAT4 or STAT5 surgical procedure were 1.80-fold (CI 1.47-4.13) more likely to harbor a damaging chromatin variant. Similarly, those who died after a STAT4 surgery were 1.73-fold (CI 1.24-2.47) more likely to harbor a damaging recessive/biallelic cilia genotype (Figure 1). Damaging chromatin and cilia genotypes were overrepresented in probands experiencing cardiac arrest or prolonged mechanical ventilation following the most complex surgical procedures (Figure 1c).
Damaging genotypes impact surgical outcomes in the context of CHD phenotypes
More broadly, considering mortality in the context of CHD phenotypes, LVO patients who died were 2.26-fold (CI 1.22-2.94) more likely to harbor a damaging de novo chromatin genotype, while HTX patients who died were 2.80-fold (CI 2.45-3.15) more likely to harbor a damaging recessive/biallelic cilia genotype (Figure 1c). Similarly, damaging chromatin or cilia genotypes were overrepresented in probands with LVO, HLHS and HTX who experienced cardiac arrest or prolonged post-operative ventilation (Figure 1c). Specifically, HTX patients who arrested post-operatively were 3.34-fold (CI 1.44-5.49) more likely to harbor a damaging recessive/biallelic cilia genotype, compared to similar patients without a damaging cilia genotype. Collectively, these findings demonstrate that genome sequencing data are critical for predicting severe post-operative events in the context of specific CHD phenotypes and the highest risk congenital heart surgeries.
Damaging genotypes impact surgical outcomes in the context of extracardiac phenotypes
Given the recognized impact of ECAs on outcomes following congenital cardiac surgery10,14,36, we also explored the relationship between ECAs and adverse post-operative outcomes in the context of genotypes and CHD phenotypes. ECAs increased the probability (relative risk) of mortality 2.85-fold (CI 1.47-2.86) and prolonged ventilation 1.72-fold (CI 1.63-1.72) following congenital cardiac surgery. Consistent with previous findings10, damaging de novo genetic variants, including de novo variants in chromatin-modifying genes, were enriched in probands with ECAs 1.47-fold (CI 1.44-1.48) and 2.09-fold (CI 2.08-2.11), respectively. By contrast, damaging recessive/biallelic cilia genotypes were not enriched in probands with ECAs (0.84-fold; CI 0.85-1.13).
We also examined reciprocal effects, i.e., impact of predicted damaging genotypes on ECAs and adverse outcomes. For example, probands with damaging de novo chromatin genotypes identified by GEM were 2.49-fold (CI 2.17-4.99) more likely to have an ECA and die, compared to probands without a damaging chromatin genotype, and 2.44-fold (CI 1.97-3.41) more likely to have an ECA and prolonged ventilation (Figure 2). Moreover, a damaging recessive/biallelic cilia genotype identified by GEM increased the probability of mortality in probands with an ECA 1.48-fold (CI 1.02-2.85), compared to similar probands without a damaging cilia genotype, and increased the probability of prolonged ventilation in the presence of an ECA 1.52-fold (CI 1.06-2.51). Additionally, a damaging cilia genotype increased the probability of prolonged ventilation in HTX patients with an ECA 4.01-fold (CI 1.67-10.6), compared to similar patients without a damaging cilia genotype (Figure 2). Taken together, these findings demonstrate that damaging genotypes in chromatin and cilia genes predict severe post-operative events in the setting of ECAs.
Relative risk ratios for adverse post-operative outcomes and extracardiac anomalies (ECAs), comparing probands with and without damaging genotypes in chromatin-modifying or cilia-related genes. The \\ symbol represents CI95 that exceeds the x-axis range.
The number of probands experiencing adverse outcomes and harboring damaging gene pathway variants in the AVC, CTD, and OTHER categories was too low to warrant generation of Bayesian networks for outcomes prediction in these CHD phenotypes. Consequently, larger cohorts are necessary to adequately predict the impact of genetics on outcomes for these CHD phenotypes. However, damaging genotypes in a number of gene pathways/categories, such as FOXJ1-controlled genes, high murine heart expression genes (HHE), WNT signaling genes, NOTCH signaling genes, and genes in a curated CHD gene list were predictive of mortality for the most complex surgical categories (Figure 3). Damaging genotypes in signal transduction and TGF-β pathways were not predictive of mortality. Taken together, these findings highlight the value of genomic data for predicting adverse outcomes following congenital cardiac surgery, especially in the context of CHD phenotypes, ECAs and surgical complexity.
Relative risk ratios for adverse post-operative outcomes and surgical complexity, comparing probands with and without damaging genotypes in various gene pathways or categories. Gene lists are described in Supplemental Table 11 and have been previously published.5,8,9 There is overlap between gene lists, with some genes represented in more than one gene pathway/category (Supplemental Figure 4).
Discussion
Assessing the impact of genetics on patient outcomes in CHD is complicated by the intrinsic severity of the cardiac lesion, the complex medical and surgical interventions necessary for survival, and the high degree of phenotypic, locus and allelic heterogeneity. The NHBLI-funded PCGC is one of the world’s largest collections of genetic, phenotypic, and clinical variables for CHD and thus provides an excellent resource for exploring the utility of genomics data for outcomes prediction. In this study, we implemented an explainable AI-based analysis framework to automatically classify CHD patients into phenotype categories and identify damaging genetic variants and genotypes. This approach allowed us to explore how damaging genotypes impact outcomes following congenital cardiac surgery, in the context of specific CHD phenotypes, ECAs, and surgical complexity, providing precise risk estimates for specific clinical contexts.
De novo variants associated with CHD have been shown to be enriched in genes related to chromatin regulation4,5,8,9. Our results identify LVO lesions and confirm HLHS as a principal driver of the chromatin signal in this cohort. HLHS is one of the most severe forms of CHD and associated with substantial morbidity and mortality. Our results show that the subset of HLHS patients with damaging genetic variants in chromatin genes has even greater risk (up to 2.57-fold) for severe post-operative outcomes in the context of the most complex surgical procedures.
Our findings also reinforce previous studies showing that damaging recessive/biallelic genotypes in cilia-related genes are overrepresented in the heterotaxy/laterality phenotype category5,8. Our results here demonstrate the additional utility of genetic findings for outcomes predictions. Damaging recessive/biallelic cilia genotypes increase the risk of severe adverse post-operative outcomes in the context of surgical complexity, HTX phenotype and the presence of an ECA. For example, damaging recessive/biallelic cilia genotype substantially increase (4.01-fold) the risk of prolonged ventilation for HTX patients with an ECA. These findings are consistent with an emerging body of literature implicating cilia dysfunction, HTX, and respiratory complications following congenital cardiac surgery37,38.
Established and emerging literature has highlighted the impact of genetics on mortality and other adverse outcomes following congenital cardiac surgery, mostly focusing on the impact of copy number variants.10–14 Damaging de novo genic variants were associated with worse transplant-free survival and longer times to final extubation in a previously reported subset of the PCGC cohort (n = 1268)10. Here, we expand upon these findings in the largest study to date relating genotypes to CHD surgical outcomes. Our analyses reveal that damaging genotypes in specific gene pathways/categories impact post-operative outcomes across CHD phenotypic categories in specific and quantifiable ways.
Our AI approach allowed us to unravel the conditional dependencies among diverse clinical and genetic variables and to discover their impacts, either in isolation or in combination, on post-operative outcomes. These findings define a critical role for genome sequencing in outcomes prediction for congenital cardiac surgeries, especially in the context of higher risk surgical procedures, specific CHD phenotypes and ECAs. Importantly, the absence of damaging genotypes was protective for adverse outcomes following congenital cardiac surgery. Thus, genomic information is informative whether or not a proband has an identified damaging genotype.
Nevertheless, there are limitations inherent to this study. For example, the PCGC population is not an inception cohort and thus is likely depleted for genetic lesions that predispose to early death, meaning our morbidity estimates are likely lower bounds. Although the PCGC cohort reflects a broad spectrum of CHD, recruitment of severe CHD forms was favored, leaving us under-powered to investigate the impact of genome sequencing for less severe CHD phenotypes. Additionally, while large clinical registries, such as the STS database, are invaluable resources for outcomes research, these databases, despite the inclusion of auditing features, may suffer from data quality issues, variability in the abstraction of data, batch effects, and missing data39–41 that might impact the interpretation of the results presented here. Finally, we do not yet have access to an independent validation cohort with genomic data and similar clinical variables.
Looking to the future, a more complete description of the genetic and outcomes landscape of CHD could be enabled through clinical genome sequencing of CHD patients at even greater scales, together with initiatives by major consortia to collect and distribute genomic and clinical data more broadly. Given the rapid decline in costs, the increasing availability and quick turn-a-round time, genome sequencing is now poised to become the standard of care for all critically ill newborns.42,43 Our findings make it clear that genome sequencing of all newborns with complex CHD will empower personalized risk-stratification for outcomes following congenital cardiac surgery.
Data availability
Genetic and phenotypic data used in the paper are provided in the supplementary tables. Exome sequencing data have been deposited in the database of Genotypes and Phenotypes (dbGaP) under accession numbers phs000571.v1.p1, phs000571.v2.p1 and phs000571.v3.p2.
Online resources
https://heartsmart.pcgcid.org/
https://github.com/dmlc/XGBoost.jl
Author contributions
W.S.W., E.J.H., M.Y., and M.T.F. conceived and planned the project; M.T.F. and W.S.W. analyzed the data and created the manuscript; E.J.H., R.Z., advised the network analysis and presentation; T.A.M. provided clinical interpretation and analysis support; E.F. contributed to the GEM implementation; M.T.F. and M.Y. oversaw all aspects of the network and statistical analyses; M.W. provided network access to data; W.S.W., E.J.H., T.A.M., R.Z., D.B., M.B., W.K.C., J.W.G., B.D.G., E.G., P.G., J.W.N., S.U.M., A.E.R., C.E.S., J.G.S.,Y.S., H.J.Y., N.R.B., E.R.G., M.T.B, J.E.M., M.Y., and M.T.F. edited the final manuscript.
Competing interests
The authors declare the following competing interests: M.Y.--GEM commercialization through Fabric Genomics, Inc; E.F. is an employee of Fabric Genomics.
Acknowledgments
We thank the patients and their families for participating in the Pediatric Cardiac Genomics Consortium and Bench to Bassinet research programs. This research would not be possible without the clinical professionals involved in patient recruitment, and we thank those at the following institutions including the Columbia Medical School: D. Awad, C. Breton, K. Celia, C. Duarte, D. Etwaru, N. Fishman, E. Griffin, M. Kaspakoval, J. Kline, R. Korsin, A. Lanz, E. Marquez, D. Queen, A. Rodriguez, J. Rose, J.K. Sond, D. Warburton, A. Wilpers, and R. Yee; 2) the Children’s Hospital of Los Angeles: J. Ellashek and N. Tran; 3) the Children’s Hospital of Philadelphia: S. Edman, J. Garbarini, J. Tusi, and S. Woyciechowski; 4) the Harvard Medical School: J. Geva and M. Borensztein; 5) the Icahn School of Medicine at Mount Sinai: A. Julian, M. Mac Neal, Y. Mendez, T. Mendiz-Ramdeen, and C. Mintz; 6) the University College London: B. McDonough, K. Flack, L. Panesar, and N. Taylor; 7) the University of Rochester School of Medicine and Dentistry: E. Taillie; and 8) the Yale School of Medicine: N. Cross. We thank Nick Felicelli, Prakash Velayutham, and the staff at the Cincinnati Children’s Hospital Medical Center for computational support and for providing access to the HeartsMart database. We thank Carson Holt, Shawn Rynearson, Barry Moore at the Utah Center for Genomic Discovery and the staff at the Utah Center for High Performance Computing for high-throughput processing of patient sequence data.
The Pediatric Cardiac Genomics Consortium (PCGC) program is funded by the National Heart, Lung, and Blood Institute, National Institutes of Health, U.S. Department of Health and Human Services through grants UM1HL128711, UM1HL098162, UM1HL098147, UM1HL098123, UM1HL128761, U01-HL098153, U01-HL098163, and U01HL131003. This manuscript was prepared in collaboration with investigators of the PCGC and has been reviewed and/or approved by the PCGC. PCGC investigators are listed at [https://benchtobassinet.com/?page_id=119]