medRxiv

A variant prioritization tool leveraging multiple instance learning for rare Mendelian disease genomic testing

Ho Heon Kim, Ju Yeop Baek, Heonjong Han, Won Chan Jeong, Dong-Wook Kim, Kisang Kwon, Yongjun Song, Hane Lee, Go Hun Seo, Jungsul Lee, Kyoungyeul Lee
doi: https://doi.org/10.1101/2024.04.18.24305632
All authors: Research and Development Center, 3billion, Inc., 416 Teheran-ro, 06193 Seoul, Republic of Korea
For correspondence (Kyoungyeul Lee): ney8909{at}gmail.com

Abstract

Background Genomic testing, such as exome sequencing and genome sequencing, is widely used to diagnose rare Mendelian disorders. Because these tests identify a large number of variants, interpreting the final variant list and identifying the disease-causing variant can be labor-intensive and time-consuming, even after likely benign variants have been filtered out. The burden grows when other variant types, such as structural variants, must be considered together with small variants. One way to accelerate interpretation is to prioritize all variants accurately so that the most likely diagnostic variant(s) are clearly distinguished from the rest.

Methods To comprehensively predict genomic test results, we developed a deep learning-based variant prioritization system that leverages multiple instance learning and takes multiple variant types as input. We additionally adopted learning to rank (LTR) for optimal prioritization. We retrospectively developed and validated the model with 5-fold cross-validation in 23,115 patients with suspected rare diseases who underwent whole exome sequencing. Furthermore, we conducted an ablation test to confirm the effectiveness of LTR and the importance of permutational features for model interpretation. We also compared the prioritization performance with that of publicly available variant prioritization tools.
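As a rough illustration of how multiple instance learning and a ranking objective can fit together, the sketch below treats each patient as a "bag" of variant feature vectors, scores every variant, pools the scores into one patient-level prediction, and computes a pairwise ranking loss. It is not the authors' architecture: the linear scorer, max pooling, the RankNet-style pairwise logistic loss, and all feature values and names are assumptions chosen only to keep the example small.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def score_instances(bag, weights, bias):
    """Return one logit per variant (instance) in a patient's bag.

    A linear scorer stands in for the deep network (assumption).
    """
    return [sum(w * f for w, f in zip(weights, feats)) + bias for feats in bag]


def bag_prediction(logits):
    """Patient-level probability via max pooling over instance logits (assumption)."""
    return sigmoid(max(logits))


def rank_variants(logits):
    """Variant indices ordered from most to least likely diagnostic."""
    return sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)


def pairwise_ranking_loss(logits, causal_idx):
    """Pairwise logistic loss: penalize non-causal variants scored near or above the causal one."""
    others = [s for i, s in enumerate(logits) if i != causal_idx]
    if not others:
        return 0.0
    s_pos = logits[causal_idx]
    return sum(math.log1p(math.exp(-(s_pos - s))) for s in others) / len(others)


# Hypothetical bag: three variants, two made-up features each.
bag = [[0.1, 0.0], [2.0, 1.5], [0.3, 0.2]]
logits = score_instances(bag, weights=[1.0, 1.0], bias=-1.0)

print(rank_variants(logits))             # variant order, best first
print(bag_prediction(logits))            # patient-level probability
print(pairwise_ranking_loss(logits, 1))  # small when the causal variant ranks first
```

In this toy setup the ranking loss is low when the labeled causal variant already has the highest score and grows when other variants outrank it, which is the intuition behind adding a ranking objective on top of the bag-level prediction.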

Results The model showed an average AUROC of 0.92 for the genomic test results. It achieved a hit rate at 5 of 96.8% when prioritizing single nucleotide variants (SNVs)/small insertions and deletions (INDELs) and copy number variants (CNVs) together, and of 95.0% when prioritizing CNVs alone. In SNV/INDEL-only prioritization, our model outperformed publicly available variant prioritization tools. In addition, the ablation test showed that the model using LTR significantly outperformed the baseline model without LTR in variant prioritization (p=0.007).
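For readers unfamiliar with the metric, a hit rate at k is commonly computed as the fraction of patients whose diagnostic variant appears among the top k ranked variants. The sketch below follows that common definition; the exact definition used in the study may differ, and the function name and variant identifiers are hypothetical.

```python
def hit_rate_at_k(ranked_per_patient, causal_per_patient, k=5):
    """Fraction of patients with at least one causal variant in their top-k ranking.

    ranked_per_patient: list of variant-ID lists, best first, one list per patient.
    causal_per_patient: list of sets of causal variant IDs, aligned with the rankings.
    """
    if not ranked_per_patient:
        raise ValueError("at least one patient is required")
    hits = 0
    for ranked, causal in zip(ranked_per_patient, causal_per_patient):
        # A "hit" means any causal variant is among the first k ranked variants.
        if any(variant in causal for variant in ranked[:k]):
            hits += 1
    return hits / len(ranked_per_patient)


# Hypothetical rankings for three patients (IDs are placeholders, not real variants).
rankings = [
    ["v1", "v2", "v3", "v4", "v5", "v6"],
    ["v7", "v8", "v9", "v10", "v11", "v12"],
    ["v13", "v14", "v15", "v16", "v17", "v18"],
]
causal = [{"v3"}, {"v12"}, {"v13"}]

print(hit_rate_at_k(rankings, causal, k=5))
```

Here the second patient's causal variant sits at rank 6, so it counts as a miss at k=5 but would count as a hit at k=6.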

Conclusion A deep learning model leveraging multiple instance learning accurately predicted the genetic testing conclusion while prioritizing multiple types of variants. This model is expected to accelerate variant interpretation by helping identify disease-causing variants more quickly in rare genetic diseases.

Competing Interest Statement

The authors have declared no competing interest.

Funding Statement

This work was supported by the Institute of Information & communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (2022-0-00333, Multi-faceted analysis of pediatric rare disease AI integrated SW solution development).

Author Declarations

I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.

Yes

The details of the IRB/oversight body that provided approval or exemption for the research described are given below:

The study was approved by the institutional review board at the Korea National Institute for Bioethics Policy (P01-202308-02-001). Informed consent was deemed unnecessary because the study involved only anonymized, de-identified retrospective data.

I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.

Yes

I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).

Yes

I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.

Yes

Data Availability

The model and experimental data are available at https://github.com/4pygmalion/ASC3.


Copyright 
The copyright holder for this preprint is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.
Posted April 19, 2024.
Subject Area

  • Genetic and Genomic Medicine