medRxiv

Improving Genetic Association Studies with a Novel Methodology that Unveils the Hidden Complexity of All-Cause Heart Failure

John T. Gregg, Blanca E. Himes, Folkert W. Asselbergs, Jason H. Moore
doi: https://doi.org/10.1101/2023.08.02.23293567
John T. Gregg
1Department of Biostatistics Epidemiology and Informatics, University of Pennsylvania, Philadelphia, PA, USA
Blanca E. Himes
1Department of Biostatistics Epidemiology and Informatics, University of Pennsylvania, Philadelphia, PA, USA
Folkert W. Asselbergs
2Institute of Cardiovascular Science, University College London, London, England
Jason H. Moore
3Department of Computational Biomedicine, Cedars-Sinai Medical Center, Los Angeles, CA, USA
  • For correspondence: Jason.Moore{at}csmc.edu

Abstract

Motivation: Genome-Wide Association Studies (GWAS) commonly assume phenotypic and genetic homogeneity that is not present in complex conditions. We designed Transformative Regression Analysis of Combined Effects (TRACE), a GWAS methodology that better accounts for clinical phenotype heterogeneity and identifies gene-by-environment (GxE) interactions. Using UK Biobank (UKB) data, we demonstrated that TRACE increased the variance explained in All-Cause Heart Failure (AHF) by discovering novel single nucleotide polymorphism (SNP) and SNP-by-environment (i.e., GxE) interaction associations. First, we transformed 312 AHF-related ICD-10 codes (including AHF) into continuous low-dimensional features (i.e., latent phenotypes) for a more nuanced disease representation. Then, we ran a standard GWAS on our latent phenotypes to discover main effects and identified GxE interactions with target encoding. Genes near associated SNPs subsequently underwent enrichment analysis to explore potential functional mechanisms underlying the associations. Finally, latent phenotypes were regressed against their SNP hits, and the estimated latent phenotype values were used to measure the amount of AHF variance explained.
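To make the target-encoding and variance-explained steps more concrete, the following is a minimal sketch and not the TRACE implementation. It assumes the simplest form of target encoding, in which each genotype-by-environment cell is replaced with the mean outcome observed in that cell; the abstract does not state which variant TRACE uses. The function names (`target_encode`, `r_squared`) and all data values are hypothetical and invented for illustration.

```python
from collections import defaultdict
from statistics import fmean


def target_encode(categories, targets):
    """Replace each category with the mean target value observed for it."""
    grouped = defaultdict(list)
    for category, target in zip(categories, targets):
        grouped[category].append(target)
    means = {category: fmean(values) for category, values in grouped.items()}
    return [means[category] for category in categories]


def r_squared(observed, predicted):
    """Proportion of variance in `observed` explained by `predicted`."""
    mean_obs = fmean(observed)
    ss_total = sum((y - mean_obs) ** 2 for y in observed)
    ss_resid = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    return 1.0 - ss_resid / ss_total


# Toy data (made up): SNP genotype coded 0/1/2, smoking status coded 0/1,
# and a continuous latent phenotype value per participant.
genotype = [0, 0, 1, 1, 2, 2, 0, 1]
smoker = [0, 1, 0, 1, 0, 1, 0, 1]
latent = [0.1, 0.2, 0.1, 0.9, 0.2, 1.6, 0.0, 1.1]

# Main effect only: encode by genotype.
main_encoded = target_encode(genotype, latent)
r2_main = r_squared(latent, main_encoded)

# GxE: encode by the joint (genotype, smoking) cell.
gxe_encoded = target_encode(list(zip(genotype, smoker)), latent)
r2_interaction = r_squared(latent, gxe_encoded)

print(f"R^2, genotype only:      {r2_main:.3f}")
print(f"R^2, genotype x smoking: {r2_interaction:.3f}")
```

In this toy data the joint encoding explains more variance than genotype alone because the genotype effect appears mainly in smokers. Note that encoding and scoring on the same participants, as done here for brevity, overstates the variance explained; in practice the encoding is usually fitted out-of-fold.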

Results: Our method identified over 100 main GWAS effects that were consistent with prior studies and hundreds of novel gene-by-smoking interactions, which collectively accounted for approximately 10% of AHF variance. This represents an improvement over traditional GWAS, whose results account for a negligible proportion of AHF variance. Enrichment analyses suggested that hundreds of miRNAs mediated the SNP effect on various AHF-related biological pathways. The TRACE framework can be applied to decode the genetics of other complex diseases.

Availability: All code is available at https://github.com/EpistasisLab/latent_phenotype_project

Competing Interest Statement

The authors have declared no competing interest.

Funding Statement

This work was funded by NIH grants LM010098, AG066833, and 5T32HG000046.

Author Declarations

I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.

Yes

The details of the IRB/oversight body that provided approval or exemption for the research described are given below:

The North West Multi-centre Research Ethics Committee (MREC) of the UK Biobank gave ethical approval for this work. All UK Biobank participants consented to their data's use for research purposes, and our study was conducted in compliance with the UK Biobank's guidelines. While our study did not require additional ethics approval, we ensured that all practices followed the ethical guidelines provided by the UK Biobank. All data used in our study are anonymized, and no individual-level data are reported in a way that could lead to the identification of participants. Therefore, specific consent to publish from individual participants is not applicable. Please refer to the UK Biobank's website for further details.

I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.

Yes

I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).

Yes

I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.

Yes

Data availability

The input data are available for other researchers via the UKB’s controlled access scheme [69]. The procedure to apply for access [70] requires registering with the UK Biobank and compiling an application form detailing:

  • A summary of the planned research

  • The UK Biobank data fields required for the project

  • A description of derivatives (data, variables) generated by the project

In addition, several publicly available bioinformatics tools with associated databases were used in this study:

  • Genevestigator: We used this to check whether genes containing SNP hits show differential expression (DE) in response to cigarette smoke. It can be accessed at https://genevestigator.com/.

  • LDTrait tool: We used this to find established GWAS SNP hits in LD with those of our study. It can be accessed at https://ldlink.nih.gov/?tab=ldtrait.

  • DisGeNET: This public platform’s gda-scores were used to quantify the evidence that genes containing some of our SNP hits are related to cardiovascular disease. It is available at https://www.disgenet.org/search.

  • FUMA: We used this to select all genes within 300 kb of our SNP hits. It is available at https://fuma.ctglab.nl/.

  • MSigDB: We used this for our enrichment analyses. It can be accessed at https://www.gsea-msigdb.org/gsea/msigdb/index.jsp

  • miEAA: We used this for our miRNA enrichment analysis. It is available at https://ccb-compute2.cs.uni-saarland.de/mieaa.

Access to these online resources is publicly available, but specific usage may require user registration. Please refer to each resource’s respective website for details on access, data use policies, and terms of service.
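As an illustration of the 300 kb gene-selection step listed above, the sketch below shows one plausible reading of "within 300 kb": a gene is selected if the distance from the SNP to the nearest gene boundary is at most 300,000 base pairs on the same chromosome. This is not FUMA's API and may not match FUMA's exact mapping rules; the function name `genes_near_snps`, the SNP identifiers, gene names, and coordinates are all hypothetical.

```python
WINDOW_BP = 300_000  # 300 kb, the window size stated in the text


def genes_near_snps(snps, genes, window=WINDOW_BP):
    """Map each SNP to the genes whose boundaries lie within `window` bp.

    snps:  list of (snp_id, chromosome, position)
    genes: list of (gene_name, chromosome, start, end)
    """
    nearby = {}
    for snp_id, snp_chrom, pos in snps:
        selected = []
        for name, gene_chrom, start, end in genes:
            if gene_chrom != snp_chrom:
                continue
            # Distance is zero when the SNP falls inside the gene body.
            if pos < start:
                distance = start - pos
            elif pos > end:
                distance = pos - end
            else:
                distance = 0
            if distance <= window:
                selected.append(name)
        nearby[snp_id] = selected
    return nearby


# Made-up coordinates for illustration only.
snps = [
    ("rs_example_1", "1", 1_000_000),
    ("rs_example_2", "2", 5_000_000),
]
genes = [
    ("GENE_A", "1", 1_200_000, 1_250_000),  # 200 kb from rs_example_1
    ("GENE_B", "1", 1_400_000, 1_450_000),  # 400 kb away, outside window
    ("GENE_C", "2", 4_900_000, 5_100_000),  # contains rs_example_2
    ("GENE_D", "3", 1_000_000, 1_010_000),  # different chromosome
]

nearby = genes_near_snps(snps, genes)
for snp_id, names in nearby.items():
    print(snp_id, names)
```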

  • List of abbreviations

    AHF: All-Cause Heart Failure
    AHD: atherosclerotic heart disease
    CHD: coronary heart disease
    CI: Confidence Interval
    DE: Differential Expression
    GBC: Gradient Boosting Classification
    GO: Gene Ontology
    GWAS: Genome-Wide Association Studies
    GxE: gene-by-environment
    ICC: Intraclass Correlation Coefficient
    KB: kilobases
    LD: Linkage Disequilibrium
    LR: Logistic Regression
    MICE: Multivariate Imputation by Chained Equations
    PCA: Principal Component Analysis
    SNP: single nucleotide polymorphism
    TRACE: Transformative Regression Analysis of Combined Effects
    UKB: United Kingdom Biobank
  • Copyright 
    The copyright holder for this preprint is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-ND 4.0 International license.
    Posted August 04, 2023.
    Subject Area

    • Genetic and Genomic Medicine