Abstract
Deep learning models for variant pathogenicity prediction can recapitulate expert-curated annotations, but their performance on actual disease phenotypes in a real-world setting remains unexplored. Here, we apply three state-of-the-art pathogenicity prediction models to classify hereditary breast cancer gene variants in the UK Biobank. Predicted pathogenic variants in BRCA1, BRCA2 and PALB2, but not in ATM or CHEK2, were associated with increased breast cancer risk. We explored gene-specific score thresholds for variant pathogenicity, finding that they could improve model performance. However, when specifically tasked with classifying variants of uncertain significance, the deep learning models were generally of limited clinical utility.
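To illustrate what a gene-specific score threshold means in practice, the following is a minimal Python sketch. The data, the function names (`best_threshold`, `gene_specific_thresholds`), and the selection criterion (Youden's J statistic) are all assumptions chosen for illustration; the abstract does not state how the study derived its thresholds.

```python
from collections import defaultdict


def best_threshold(scores, labels):
    """Return the cutoff that maximizes Youden's J (sensitivity + specificity - 1).

    A variant is called pathogenic when its score is >= the cutoff.
    The choice of Youden's J is an illustrative assumption.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("both classes are required to pick a threshold")
    best_j, best_cut = -1.0, None
    for cut in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= cut and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s < cut and y == 0)
        j = tp / pos + tn / neg - 1.0
        if j > best_j:  # strict comparison keeps the lowest cutoff on ties
            best_j, best_cut = j, cut
    return best_cut


def gene_specific_thresholds(records):
    """Group (gene, score, label) records by gene and fit one cutoff per gene."""
    by_gene = defaultdict(lambda: ([], []))
    for gene, score, label in records:
        by_gene[gene][0].append(score)
        by_gene[gene][1].append(label)
    return {g: best_threshold(s, y) for g, (s, y) in by_gene.items()}


# Hypothetical toy records: (gene, model score, 1 = affected carrier, 0 = unaffected carrier)
records = [
    ("BRCA1", 0.2, 0), ("BRCA1", 0.4, 0), ("BRCA1", 0.6, 1), ("BRCA1", 0.9, 1),
    ("PALB2", 0.5, 0), ("PALB2", 0.7, 0), ("PALB2", 0.8, 1), ("PALB2", 0.95, 1),
]
thresholds = gene_specific_thresholds(records)
```

In this toy example the two genes end up with different cutoffs, which is the point of a gene-specific approach: a single global cutoff would misclassify some variants in at least one of the genes.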
Competing Interest Statement
RDC reports no competing interests. RBP has received grants from the National Institutes of Health, Department of Defense, Prostate Cancer Foundation, National Palliative Care Research Center, NCCN Foundation, Conquer Cancer Foundation, Humana, Emerson Collective, Schmidt Futures, Arnold Ventures, Mendel.ai, and Veterans Health Administration; personal fees and equity from GNS Healthcare, Thyme Care, and Onc.AI; personal fees from ConcertAI, Cancer Study Group, Biofourmis, Genetic Chemistry Therapeutics, Credit Suisse, G1 Therapeutics, Humana, and Nanology; honoraria from Flatiron and Medscape; has board membership (unpaid) at the Coalition to Transform Advanced Care and American Cancer Society; and serves on a leadership consortium (unpaid) at the National Quality Forum, all outside the submitted work. KLN reports serving on a Scientific Advisory Board for Merck, unrelated to the current study.
Funding Statement
This study was funded by the NIH/NCI (5K08CA263541), awarded to RBP.
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
All UK Biobank participants provided written informed consent, which was approved by the North West Multicenter Research Ethics Committee. As the present study involved reanalysis of fully de-identified preexisting data, no additional approval was required.
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.
Yes