medRxiv

CNVscore calculates pathogenicity scores for copy number variants together with uncertainty estimates accounting for learning biases in reference Mendelian disorder datasets

Francisco Requena, David Salgado, Valérie Malan, Damien Sanlaville, Frédéric Bilan, Christophe Béroud, Antonio Rausell
doi: https://doi.org/10.1101/2022.06.23.22276396
Francisco Requena
1Université Paris Cité, INSERM UMR1163, Imagine Institute, Clinical Bioinformatics Laboratory, Paris, F-75006, France
David Salgado
2INSERM, Marseille Medical Genetics, Aix Marseille University, Marseille, F-13385, France
3CNRS, Institut Français de Bioinformatique, IFB-core, UMS 3601, Evry, F-91057, France
Valérie Malan
4AP-HP, Necker Hospital for Sick Children, Fédération de Génétique et Médecine Génomique, Service de Médecine Génomique des Maladies Rares, Paris, F-75015, France
5Université Paris Cité, INSERM UMR1163, Imagine Institute, Developmental Brain Disorders Laboratory, Paris, F-75015, France
Damien Sanlaville
6CHU de Lyon HCL-GH Est, Service de génétique, Bron, F-69677, France
7Université Lyon 1, CNRS, INSERM, Physiopathologie et Génétique du Neurone et du Muscle, UMR5261, U1315, Institut NeuroMyoGène, Lyon, F-69008, France
Frédéric Bilan
8Service de Génétique, CHU de Poitiers, Poitiers, F-86000, France
9Laboratoire de Neurosciences Expérimentales et Cliniques INSERM U1084, Poitiers, F-86073, France
Christophe Béroud
2INSERM, Marseille Medical Genetics, Aix Marseille University, Marseille, F-13385, France
10APHM, Hôpital d’Enfants de la Timone, Département de Génétique Médicale, Marseille, F-13385, France
Antonio Rausell
1Université Paris Cité, INSERM UMR1163, Imagine Institute, Clinical Bioinformatics Laboratory, Paris, F-75006, France
4AP-HP, Necker Hospital for Sick Children, Fédération de Génétique et Médecine Génomique, Service de Médecine Génomique des Maladies Rares, Paris, F-75015, France
  • For correspondence: antonio.rausell{at}institutimagine.org

Abstract

Copy number variants (CNVs) are a major cause of rare pediatric diseases with a broad spectrum of phenotypes. Genetic diagnosis based on comparative genomic hybridization tests typically identifies ∼8-10% of patients as having CNVs of unknown significance, revealing the current limits of clinical interpretation. The adoption of whole-genome sequencing (WGS) as a first-line genetic test has significantly increased the load of CNVs identified in single genomes. Alongside short- and long-read sequencing technologies, a number of pathogenicity scores have been developed for filtering and prioritizing large sets of candidate CNVs in clinical settings. However, current approaches are often based, either explicitly or implicitly, on clinically annotated reference sets, which are likely to bias their predictions. In this study we developed CNVscore, a supervised-learning approach combining tree ensembles and a Bayesian classifier trained on pathogenic and non-pathogenic CNVs from reference databases. Unlike previous approaches, CNVscore couples pathogenicity estimates with uncertainty scores, making it possible to evaluate the suitability of a model for the query CNVs. Comprehensive comparative benchmark tests across independent sets and against alternative methods showed that CNVscore effectively distinguishes between pathogenic and benign CNVs. We also found that CNVs associated with CNVscores of low uncertainty were predicted with significantly higher accuracy than those of high uncertainty. However, the performance of current scoring approaches, including CNVscore, was compromised on CNV sets enriched in highly uncertain variants and presenting unconventional features, such as functionally relevant non-coding elements or the presence of disease genes irrelevant for the clinical phenotypes investigated. 
Finally, we used the CNVscore framework to guide the selection of a CNV scoring model for the French National Database of Constitutional CNVs (BANCCO), which includes clinical diagnosis annotations. The CNVscore framework provides an objective strategy for leveraging the uncertainty of bioinformatic predictions to enhance the assessment of CNV pathogenicity in rare-disease cohorts. CNVscore is available as open-source software from https://github.com/RausellLab/CNVscore and is integrated into the CNVxplorer web server at http://cnvxplorer.com.
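
The idea of coupling a pathogenicity estimate with an uncertainty score, and then comparing accuracy between low- and high-uncertainty CNVs, can be illustrated with a minimal Python sketch. This is not the CNVscore implementation: the per-model probabilities are made-up numbers standing in for the outputs of an ensemble, uncertainty is approximated here as the standard deviation across ensemble members, and the function names and the 0.15 threshold are hypothetical choices made for illustration only.

```python
from statistics import mean, pstdev

def score_with_uncertainty(member_probs):
    """Return (score, uncertainty) for one CNV.

    Assumption: the score is the mean of per-model pathogenicity
    probabilities and the uncertainty is their population standard
    deviation. The actual CNVscore model may define both differently.
    """
    return mean(member_probs), pstdev(member_probs)

def accuracy_by_uncertainty(cnvs, threshold=0.15):
    """Split CNVs into low/high uncertainty bins and report accuracy per bin.

    `cnvs` is a list of (member_probs, is_pathogenic) pairs.
    `threshold` is a hypothetical cut-off, not a value from the paper.
    """
    bins = {"low": [], "high": []}
    for member_probs, is_pathogenic in cnvs:
        score, uncertainty = score_with_uncertainty(member_probs)
        predicted_pathogenic = score >= 0.5
        key = "low" if uncertainty < threshold else "high"
        bins[key].append(predicted_pathogenic == is_pathogenic)
    # Accuracy is None for an empty bin.
    return {k: (sum(v) / len(v) if v else None) for k, v in bins.items()}

# Toy data (invented): three CNVs on which the ensemble agrees,
# two on which it disagrees strongly.
toy_cnvs = [
    ([0.92, 0.95, 0.90], True),
    ([0.05, 0.08, 0.10], False),
    ([0.85, 0.88, 0.91], True),
    ([0.20, 0.90, 0.60], False),
    ([0.10, 0.70, 0.30], False),
]

result = accuracy_by_uncertainty(toy_cnvs)
print(result)  # {'low': 1.0, 'high': 0.5}
```

In this toy example the low-uncertainty bin is classified perfectly while the high-uncertainty bin is not, which mirrors the kind of stratified comparison the abstract describes without reproducing its data or results.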

Competing Interest Statement

The authors have declared no competing interest.

Funding Statement

The Laboratory of Clinical Bioinformatics of the Imagine Institute, headed by A.R., was partly supported by the French National Research Agency (ANR) "Investissements d'Avenir" Program [ANR-10-IAHU-01, ANR-17-RHUS-0002 - CIL-LICO project]; the MSD Avenir fund (Devo-Decode project); Aviesan - ITMO Génétique-Génomique-Bioinformatique [ResDiCard: Resolving diagnostic deadlock in Cardiomyopathies project, AAP 2020: Maladies Rares - Résoudre les impasses diagnostiques]; and by Christian Dior Couture, Dior. F.R. is supported by a PhD fellowship from the Fondation Bettencourt-Schueller.

Author Declarations

I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.

Yes

The details of the IRB/oversight body that provided approval or exemption for the research described are given below:

The study used ONLY available human data that were originally located at:
  • DECIPHER: https://www.deciphergenomics.org/
  • ClinVar: https://www.ncbi.nlm.nih.gov/clinvar/
  • gnomAD: https://gnomad.broadinstitute.org/
  • DGV: http://dgv.tcag.ca/
  • IGSR: https://www.internationalgenome.org/data-portal/data-collection/structural-variation
  • dbVar: https://www.ncbi.nlm.nih.gov/dbvar
  • Beyter et al., 2021: https://github.com/DecodeGenetics/LRS_SV_sets
  • BANCCO: http://bancco.fr

The BANCCO database requires registration for access. It has received appropriate approval through the French National Committee for Informatics and Liberty (CNIL): CNIL authorization #2071658.

I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.

Yes

I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).

Yes

I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable.

Yes

Data Availability

All data produced in the present work are contained in the manuscript

https://github.com/RausellLab/CNVscore

Copyright 
The copyright holder for this preprint is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. All rights reserved. No reuse allowed without permission.
Posted June 27, 2022.

Subject Area

  • Genetic and Genomic Medicine