medRxiv

Mutation Pathogenicity Prediction by a Biology Based Explainable AI Multi-Modal Algorithm

Raizy Kellerman, Omri Nayshool, Ortal Barel, Sharon Paz, Ninette Amariglio, Eyal Klang, Gideon Rechavi
doi: https://doi.org/10.1101/2024.06.05.24308476
Raizy Kellerman
1Sheba Cancer Research Center, Wohl Institute of Translational Medicine, Sheba Medical Center, Tel Hashomer, Israel
Omri Nayshool
1Sheba Cancer Research Center, Wohl Institute of Translational Medicine, Sheba Medical Center, Tel Hashomer, Israel
Ortal Barel
1Sheba Cancer Research Center, Wohl Institute of Translational Medicine, Sheba Medical Center, Tel Hashomer, Israel
Sharon Paz
1Sheba Cancer Research Center, Wohl Institute of Translational Medicine, Sheba Medical Center, Tel Hashomer, Israel
Ninette Amariglio
1Sheba Cancer Research Center, Wohl Institute of Translational Medicine, Sheba Medical Center, Tel Hashomer, Israel
Eyal Klang
2Division of Data-Driven and Digital Medicine, Department of Medicine, Icahn School of Medicine at Mount Sinai, New York, NY 10029, United States
Gideon Rechavi
1Sheba Cancer Research Center, Wohl Institute of Translational Medicine, Sheba Medical Center, Tel Hashomer, Israel
  • For correspondence: gidi.rechavi{at}sheba.health.gov.il

Abstract

Most known pathogenic mutations occur in protein-coding regions of DNA and change the way proteins are made. Deciphering protein structure therefore provides great insight into the molecular mechanisms underlying biological function in human disease. While there have recently been major advances in artificial intelligence-based prediction of protein structure, determination of the biological and clinical relevance of specific mutations is not yet up to clinical standards. This challenge is of utmost medical importance when decisions as critical as suggesting termination of pregnancy or recommending cancer-directed rational drugs depend on accurately predicting the effect of a specific mutation. Currently available tools aim to characterize the effect of a mutation on protein function according to biochemical criteria, independent of the biological context. A specific change in protein structure can result in either loss of function (LOF) or gain of function (GOF), and the directionality of the effect needs to be taken into account when interpreting the biological outcome of the mutation. Here we describe Triple-modalities Variant Interpretation and Analysis (TriVIAI), a tool that incorporates three complementary modalities for improved prediction of missense mutation pathogenicity: a protein language model (pLM), a graph neural network (GNN), and a tabular model incorporating physical properties derived from the protein structure. The TriVIAI ensemble’s predictions compare favorably with existing tools across various metrics, achieving an AUC-ROC of 0.887, a precision-recall curve (PRC) score of 0.68, and a Brier score of 0.16. The TriVIAI ensemble also has two major advantages over other available tools. The first is the incorporation of biological insights that make it possible to differentiate between GOF mutations, which tend to cluster in specific hotspots and affect structure in a specific functional way, and LOF mutations, which are usually dispersed and can cripple the protein in a variety of ways. Importantly, the advantage over other available tools is more noticeable for GOF mutations, as their effect on protein structure is less disruptive and can be misinterpreted by current variant prioritization strategies. The second significant advantage of TriVIAI is the explainability of the ensemble, in contrast to other available AI-based pathogenicity prediction algorithms, which are a black box for their users. This explainability is of major importance considering the clinical responsibility of the medical decision-makers who use AI-based pathogenicity predictors.
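
For readers less familiar with the three metrics quoted in the abstract, the following is a minimal sketch of how AUC-ROC, a precision-recall score and a Brier score can be computed for an ensemble of three per-variant probability outputs. It is illustrative only: the abstract does not state how the pLM, GNN and tabular outputs are combined, so the simple mean used here is an assumption, the precision-recall score is computed as average precision (one common summary of the PRC, which may differ from the authors' choice), and all function names and toy numbers are hypothetical rather than taken from the paper.

```python
from typing import List, Sequence


def ensemble_mean(*model_probs: Sequence[float]) -> List[float]:
    """Average per-variant probabilities across models.

    ASSUMPTION: a plain mean; the source does not specify the combination rule.
    """
    return [sum(p) / len(p) for p in zip(*model_probs)]


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and 0/1 label."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def auc_roc(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Fraction of (pathogenic, benign) pairs ranked correctly; ties count half."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def average_precision(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean of the precision values at the rank of each pathogenic variant.

    Tied scores are ranked in input order, which is a simplification.
    """
    ranked = sorted(zip(y_prob, y_true), key=lambda t: -t[0])
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / sum(y_true)


# Toy, made-up data: 1 = pathogenic, 0 = benign.
labels = [1, 0, 1, 0, 0, 1]
plm_probs = [0.9, 0.2, 0.6, 0.4, 0.1, 0.3]  # hypothetical pLM outputs
gnn_probs = [0.8, 0.3, 0.7, 0.5, 0.2, 0.4]  # hypothetical GNN outputs
tab_probs = [0.7, 0.1, 0.5, 0.6, 0.3, 0.2]  # hypothetical tabular-model outputs

ensemble_probs = ensemble_mean(plm_probs, gnn_probs, tab_probs)
print("AUC-ROC:", round(auc_roc(labels, ensemble_probs), 3))
print("Average precision:", round(average_precision(labels, ensemble_probs), 3))
print("Brier score:", round(brier_score(labels, ensemble_probs), 3))
```

Higher AUC-ROC and precision-recall scores indicate better ranking of pathogenic over benign variants, whereas a lower Brier score indicates better-calibrated probabilities, which is why the three figures are reported together.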

Competing Interest Statement

The authors have declared no competing interest.

Funding Statement

This work was supported by the infrastructure grant of the Israel Innovation Authority, Israel Ministry of Health and the National Headquarter "Digital Israel". The authors thank the Kahn Family Foundation for the continuous support of their research. G.R. is supported by the Flight Attendant Medical Research Institute (FAMRI) and by a grant from the Varda and Boaz Dotan Research Center in Hemato-Oncology, Tel Aviv University. G.R. holds the Djerassi Chair in Oncology at Tel Aviv University.

Author Declarations

I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.

Yes

The details of the IRB/oversight body that provided approval or exemption for the research described are given below:

The study used (or will use) ONLY openly available human data that were originally located at HGMD, ClinVar and COSMIC

I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.

Yes

I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).

Yes

I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.

Yes

Data Availability

Public datasets including ClinVar, COSMIC, and dbNSFP4.4a, as well as the AlphaFold Protein Structure Database, are freely accessible for download and use. The test datasets were sourced from Zhang et al. (ref. 38) and are available in Supplementary Table 2. The LOF and GOF annotations, taken from the GOF/LOF database (ref. 48), are also freely available. HGMD Pro 2022.2 is available only under a paid license.

Copyright 
The copyright holder for this preprint is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. All rights reserved. No reuse allowed without permission.
Posted June 05, 2024.
Subject Area

  • Genetic and Genomic Medicine