medRxiv

An Explainable Host Genetic Severity Predictor Model for COVID-19 Patients

Anthony Onoja, Francesco Raimondi, Mirco Nanni
doi: https://doi.org/10.1101/2023.03.06.23286869
Anthony Onoja
1Bioinformatics Lab, Faculty of Sciences, Scuola Normale Superiore di-Pisa, Italy
  • For correspondence: a.onoja{at}surrey.ac.uk
Francesco Raimondi
2Bioinformatics Lab, Faculty of Sciences, Scuola Normale Superiore di-Pisa, Italy
Mirco Nanni
3KDD Laboratory, ISTI-National Research Council of Italy

Abstract

Understanding COVID-19 severity and why it differs significantly among patients remains a concern for the scientific community. The major contribution of this study is a voting ensemble host genetic severity predictor (HGSP) model, which we developed by combining several state-of-the-art machine learning algorithms (decision tree-based models: Random Forest and XGBoost classifiers). These models were trained on a genetic Whole Exome Sequencing (WES) dataset and clinical covariates (age and gender), using a 5-fold stratified cross-validation strategy that randomly splits the dataset to mitigate model instability. We validated the HGSP model on the 18 features (i.e., 16 candidate genetic variants and 2 covariates) identified in a prior study. We provided post-hoc model explanations through ExplainerDashboard, an open-source Python library, allowing deeper insight into the prediction results. We applied the Enrichr and OpenTarget genetics interactive bioinformatic tools to associate the genetic variants with plausible biological insights and domain interpretations such as pathways, ontologies, and diseases/drugs. Through unsupervised clustering of the SHAP feature importance values, we visualized the complex genetic mechanisms. Our findings show that while age and gender mainly influence COVID-19 severity, a specific group of patients experiences severity due to complex genetic interactions.
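To make the two mechanics named in the abstract concrete, the following is a minimal, standard-library-only sketch of (a) a stratified 5-fold split, which keeps the severe/non-severe ratio roughly equal across folds, and (b) combining two classifiers' outputs by voting. It is an illustration under stated assumptions, not the authors' implementation: the helper names `stratified_folds` and `soft_vote`, the toy labels, the example probabilities, and the choice of soft (probability-averaging) voting are all hypothetical, since the abstract does not say whether hard or soft voting was used.

```python
import random
from collections import defaultdict


def stratified_folds(labels, n_splits=5, seed=0):
    """Assign sample indices to n_splits folds, preserving class ratios.

    Hypothetical helper for illustration; not taken from the paper's code.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(n_splits)]
    for indices in by_class.values():
        rng.shuffle(indices)  # random split within each class
        # Deal indices round-robin so every fold gets a near-equal
        # share of this class.
        for position, idx in enumerate(indices):
            folds[position % n_splits].append(idx)
    return folds


def soft_vote(probabilities):
    """Average the positive-class probabilities of several models.

    Soft voting is an assumption here; the source only says "voting ensemble".
    """
    return sum(probabilities) / len(probabilities)


# Toy labels (illustrative only): 1 = severe, 0 = non-severe.
labels = [1] * 20 + [0] * 30
folds = stratified_folds(labels)
for k, test_idx in enumerate(folds):
    n_severe = sum(labels[i] for i in test_idx)
    print(f"fold {k}: {len(test_idx)} samples, {n_severe} severe")

# Made-up probabilities standing in for two tree-based classifiers.
ensemble_p = soft_vote([0.70, 0.60])
print("severe" if ensemble_p >= 0.5 else "non-severe")
```

In practice, scikit-learn's `StratifiedKFold` and `VotingClassifier` provide these operations directly; whether the authors used them is not stated on this page.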

Competing Interest Statement

The authors have declared no competing interest.

Clinical Protocols

https://github.com/raimondilab/COVID-19-severity-host-genetic-predictor-model-explanation

Funding Statement

This work was supported by Intesa San Paolo through the 2020 charity fund dedicated to project N. B/2020/0119, "Identificazione delle basi genetiche determinanti la variabilita clinica della risposta a COVID-19 nella popolazione italiana", and by the EU project H2020-SC1-FA-DTS-2018-2020, entitled "International consortium for integrative genomics prediction" (INTERVENE), Grant Agreement No. 101016775.

Author Declarations

I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.

Yes

The details of the IRB/oversight body that provided approval or exemption for the research described are given below:

The dataset used for this study was part of the GEN-COVID Multicenter Study, https://sites.google.com/dbm.unisi.it/gen-COVID, an Italian multicenter study that aimed to identify the host genetic bases of COVID-19. Specimens were provided by the COVID-19 Biobank of Siena, which is part of the Genetic Biobank of Siena, a member of BBMRI-IT, of the Telethon Network of Genetic Biobanks (project no. GTB18001), of EuroBioBank, and of RD-Connect. Further information on the cleansed dataset and code is available on our GitHub group page at: https://github.com/raimondilab/COVID-19-severity-host-genetic-predictor-model-explanation

I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.

Yes

I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).

Yes

I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.

Yes

Footnotes

  • Some additional clarifications were made in the abstract, and the figure numbering was updated (7 instead of 1 in the last section).

Data Availability

The cleansed dataset and code are available on our GitHub group page at: https://github.com/raimondilab/COVID-19-severity-host-genetic-predictor-model-explanation

Copyright 
The copyright holder for this preprint is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC 4.0 International license.
Posted March 09, 2023.
medRxiv 2023.03.06.23286869; doi: https://doi.org/10.1101/2023.03.06.23286869

Subject Area

  • Health Informatics