ABSTRACT
The electrocardiogram (ECG) is an inexpensive and widely available diagnostic tool, and therefore has great potential to facilitate disease detection in large-scale populations. Both cardiac and noncardiac diseases may alter the appearance of the ECG, though the extent to which diseases across the human phenotypic landscape can be detected on the ECG remains unclear. We developed a deep learning variational autoencoder model that encodes ECG waveform data into a multidimensional latent space and reconstructs the waveforms from it. We then systematically evaluated whether associations between ECG encodings and a broad range of disease phenotypes could be detected with the latent space model, by deriving disease vectors and projecting individual ECG encodings onto those vectors. We developed models for both 12-lead and single-lead ECGs, the latter akin to those used in wearable ECG technology. We used phecodes to derive disease labels from International Classification of Diseases (ICD) codes for about 1,600 phenotypes in three datasets linked to electronic health record data. We tested associations between ECG encodings and disease phenotypes using a phenome-wide association study approach in each dataset, and meta-analyzed the results. The latent space ECG model identified associations with 645 (40%) of the diseases tested in the 12-lead model. Associations were enriched for diseases of the circulatory (n=140, 82% of category-specific diseases), respiratory (n=53, 62%), and endocrine/metabolic (n=73, 45%) systems, with additional associations evident across the human phenome; results were similar for the single-lead models. The top ECG latent space association was with hypertension in the 12-lead ECG model and with cardiomyopathy in the single-lead ECG model (p<2.2×10⁻³⁰⁸ for each). 
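The abstract describes the disease-vector step only briefly. The sketch below shows one plausible reading of it, under assumptions the text does not confirm:

- Each ECG has already been encoded to a latent vector.
- A disease vector is the unit-length difference between the mean case encoding and the mean control encoding.
- An individual's score is the scalar projection of their encoding onto that vector.
- Per-dataset association estimates are pooled by inverse-variance fixed-effect meta-analysis.

The helper names (`disease_vector`, `project`, `fixed_effect_meta`) are hypothetical. The per-dataset regression of each phecode on the projection score, with covariates, is omitted.

```python
import math
from typing import Sequence


def mean_vector(rows: Sequence[Sequence[float]]) -> list[float]:
    """Element-wise mean of a non-empty list of equal-length latent encodings."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def disease_vector(encodings: Sequence[Sequence[float]],
                   is_case: Sequence[bool]) -> list[float]:
    """Unit-length direction from the control centroid to the case centroid.

    Assumption (hypothetical definition): mean(case encodings) minus
    mean(control encodings), normalized; the study's exact definition may differ.
    """
    cases = [z for z, c in zip(encodings, is_case) if c]
    controls = [z for z, c in zip(encodings, is_case) if not c]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    diff = [a - b for a, b in zip(mean_vector(cases), mean_vector(controls))]
    norm = math.sqrt(sum(d * d for d in diff))
    if norm == 0:
        raise ValueError("case and control centroids coincide")
    return [d / norm for d in diff]


def project(encoding: Sequence[float], direction: Sequence[float]) -> float:
    """Scalar projection of one ECG encoding onto a unit-length disease vector."""
    return sum(z * v for z, v in zip(encoding, direction))


def fixed_effect_meta(betas: Sequence[float],
                      ses: Sequence[float]) -> tuple[float, float]:
    """Inverse-variance weighted fixed-effect pooling of per-dataset estimates.

    Returns (pooled estimate, pooled standard error). Assumed method only.
    """
    weights = [1.0 / (se * se) for se in ses]
    total = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, betas)) / total
    return pooled, math.sqrt(1.0 / total)


# Toy 2-D latent space in which cases are shifted along the first dimension.
encodings = [[1.0, 0.0], [3.0, 0.0], [-1.0, 0.0], [-3.0, 0.0]]
is_case = [True, True, False, False]
v = disease_vector(encodings, is_case)          # → [1.0, 0.0]
scores = [project(z, v) for z in encodings]     # → [1.0, 3.0, -1.0, -3.0]
pooled, pooled_se = fixed_effect_meta([1.0, 2.0], [1.0, 1.0])  # → 1.5, ~0.7071
```

Normalizing the disease vector puts projection scores on the same scale as distances in the latent space. Scores for different diseases are therefore comparable in units, though not necessarily in clinical meaning.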
The ECG latent space model identified more associations than ECG models using standard ECG intervals alone, and generally improved disease discrimination compared with models comprising only age, sex, and race. We further demonstrate how a latent space model can be used to generate disease-specific ECG waveforms and to facilitate disease profiling for individual patients.
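The final sentence of the abstract can be illustrated with two small latent-space operations. This is a hedged sketch, not the authors' implementation:

- Decoding encodings shifted along a disease vector yields a series of disease-specific waveforms.
- Ranking one patient's projections onto several disease vectors yields a profile.

`toy_decoder` is a hypothetical linear stand-in for a trained VAE decoder. The disease names and vectors are toy values, and the helper names are illustrative only.

```python
from typing import Callable, Sequence

# A decoder maps a latent code to a waveform (here, a short list of samples).
Decoder = Callable[[Sequence[float]], list[float]]


def shift_along(encoding: Sequence[float], direction: Sequence[float],
                alpha: float) -> list[float]:
    """Move an encoding alpha units along a unit-length disease vector."""
    return [z + alpha * d for z, d in zip(encoding, direction)]


def traverse(decoder: Decoder, encoding: Sequence[float],
             direction: Sequence[float],
             alphas: Sequence[float]) -> list[list[float]]:
    """Decode a series of encodings shifted progressively along the disease vector."""
    return [decoder(shift_along(encoding, direction, a)) for a in alphas]


def disease_profile(encoding: Sequence[float],
                    disease_vectors: dict[str, Sequence[float]]) -> list[tuple[str, float]]:
    """Rank diseases by the projection of one patient's encoding onto each vector."""
    scores = {name: sum(z * d for z, d in zip(encoding, vec))
              for name, vec in disease_vectors.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


def toy_decoder(z: Sequence[float]) -> list[float]:
    """Hypothetical stand-in: a fixed linear map from a 2-D code to 3 samples."""
    basis = [[1.0, 0.5, 0.0], [0.0, 0.5, 1.0]]
    return [sum(zi * row[j] for zi, row in zip(z, basis)) for j in range(3)]


waveforms = traverse(toy_decoder, [0.0, 0.0], [1.0, 0.0], [0.0, 1.0, 2.0])
# → [[0.0, 0.0, 0.0], [1.0, 0.5, 0.0], [2.0, 1.0, 0.0]]
profile = disease_profile([2.0, 1.0], {"disease_A": [1.0, 0.0],
                                       "disease_B": [0.0, 1.0]})
# → [("disease_A", 2.0), ("disease_B", 1.0)]
```

Because the toy decoder is linear, the generated waveform changes in proportion to the shift. A trained neural decoder would not, in general, behave linearly.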
Competing Interest Statement
Dr. Lubitz has received sponsored research support from Bristol Myers Squibb, Pfizer, Boehringer Ingelheim, Fitbit, Medtronic, Premier, and IBM, and has consulted for Bristol Myers Squibb, Pfizer, Blackstone Life Sciences, and Invitae. Dr. Anderson receives sponsored research support from Bayer AG and Massachusetts General Hospital and has consulted for ApoPharma. Dr. Weng receives sponsored research support from IBM to the Broad Institute. Dr. Ellinor has received sponsored research support from Bayer AG and IBM Health, and he has consulted for Bayer AG, Novartis and MyoKardia. Dr. Batra, Dr. Reeder and Dr. Friedman have received sponsored research support from Bayer AG and IBM Health.
Funding Statement
Dr. Lubitz is a full-time employee of Novartis Institutes for Biomedical Research as of July 18, 2022. Dr. Lubitz previously received support from National Institutes of Health (NIH) grants R01HL139731 and R01HL157635, and American Heart Association (AHA) grant 18SFRN34250007. Dr. Anderson is supported by NIH grants R01NS103924 and U01NS069763 and AHA grants 18SFRN34250007 and 21SFRN812095. Dr. Weng is supported by NIH grant 1R01HL139731. Dr. Choi is supported by the National Heart, Lung, and Blood Institute (NHLBI) BioData Catalyst Fellows program. Dr. Ellinor is supported by the NIH (1R01HL092577, K24HL105780), the AHA (18SFRN34110082), and MAESTRIA (965286). Dr. Lau is supported by the AHA (853922).
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
Use of Mass General Brigham (MGB) and UK Biobank (application 7089) data was approved by the MGB Institutional Review Board.
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable.
Yes
Data Availability
The Mass General Brigham source data are not publicly available because they are electronic health records. Making the data publicly available without additional consent or ethical approval could compromise privacy. Source data from the UK Biobank are available to qualified investigators via application at https://www.ukbiobank.ac.uk.