medRxiv

The Challenge Dataset – simple evaluation for safe, transparent healthcare AI deployment

James K. Sanayei, Mohamed Abdalla, Monish Ahluwalia, Laleh Seyyed-Kalantari, Simona Minotti, Benjamin A. Fine
doi: https://doi.org/10.1101/2022.12.15.22280619
James K. Sanayei
a Faculty of Medicine, University of Toronto, Toronto, Canada
For correspondence: james.sanayei{at}mail.utoronto.ca
Mohamed Abdalla
b Department of Computer Science, University of Toronto, Toronto, Canada
d Institute for Better Health, Trillium Health Partners, Mississauga, Canada
e Vector Institute for Artificial Intelligence, Toronto, Canada
Monish Ahluwalia
a Faculty of Medicine, University of Toronto, Toronto, Canada
Laleh Seyyed-Kalantari
c Department of Electrical Engineering and Computer Science, York University, Toronto, Canada
e Vector Institute for Artificial Intelligence, Toronto, Canada
Simona Minotti
d Institute for Better Health, Trillium Health Partners, Mississauga, Canada
Benjamin A. Fine
a Faculty of Medicine, University of Toronto, Toronto, Canada
d Institute for Better Health, Trillium Health Partners, Mississauga, Canada
e Vector Institute for Artificial Intelligence, Toronto, Canada

Abstract

In this paper, we demonstrate the use of a “Challenge Dataset”: a small, site-specific, manually curated dataset, enriched with uncommon, risk-exposing, and clinically important edge cases, that can facilitate pre-deployment evaluation and identification of clinically relevant AI performance deficits. The five major steps of the Challenge Dataset process are described in detail: defining use cases, edge case selection, dataset size determination, dataset compilation, and model evaluation. Evaluating the performance of four chest X-ray classifiers (one third-party developer model and three models trained on open-source datasets) on a small, manually curated dataset (410 images), we observe a generalization gap of 20.7% (13.5% - 29.1%) for sensitivity and 10.5% (4.3% - 18.3%) for specificity compared to developer-reported values. Performance decreases further when the models are evaluated against edge cases (critical findings: 43.4% [27.4% - 59.8%]; unusual findings: 45.9% [23.1% - 68.7%]; solitary findings: 45.9% [23.1% - 68.7%]). Expert manual audit revealed examples of critical model failure (e.g., missed pneumomediastinum) with potential for patient harm. As a measure of effort, we find that the minimum required number of Challenge Dataset cases is about 1% of the annual total for our site (approximately 400 of 40,000). Overall, we find that the Challenge Dataset process provides a method for local pre-deployment evaluation of medical imaging AI models, allowing imaging providers to identify both deficits in model generalizability and specific points of failure prior to clinical deployment.
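
To make the notion of a generalization gap concrete, the following is a minimal Python sketch, not the authors' code. It assumes the gap is defined as the developer-reported metric minus the locally measured metric, and it uses a percentile bootstrap for the interval; the abstract does not state which interval method the study used. The function names (`sensitivity_specificity`, `bootstrap_gap`), the reported value of 0.90, and the toy labels are all hypothetical and are not study data.

```python
import random


def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels (1 = finding present)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    sens = tp / (tp + fn) if (tp + fn) else float("nan")
    spec = tn / (tn + fp) if (tn + fp) else float("nan")
    return sens, spec


def bootstrap_gap(y_true, y_pred, reported, metric_index, n_boot=2000, seed=0):
    """Gap (reported - local metric) with a 95% percentile-bootstrap interval.

    metric_index: 0 for sensitivity, 1 for specificity.
    ASSUMPTION: the percentile bootstrap is an illustrative choice only.
    """
    rng = random.Random(seed)
    n = len(y_true)
    point = reported - sensitivity_specificity(y_true, y_pred)[metric_index]
    gaps = []
    for _ in range(n_boot):
        # Resample cases with replacement and recompute the local metric.
        idx = [rng.randrange(n) for _ in range(n)]
        metric = sensitivity_specificity(
            [y_true[i] for i in idx], [y_pred[i] for i in idx]
        )[metric_index]
        if metric == metric:  # skip NaN (resample without the needed class)
            gaps.append(reported - metric)
    gaps.sort()
    lower = gaps[int(0.025 * len(gaps))]
    upper = gaps[min(int(0.975 * len(gaps)), len(gaps) - 1)]
    return point, lower, upper


# Hypothetical toy labels: 10 positive and 10 negative cases.
toy_true = [1] * 10 + [0] * 10
toy_pred = [1] * 7 + [0] * 3 + [0] * 9 + [1] * 1

local_sens, local_spec = sensitivity_specificity(toy_true, toy_pred)
sens_gap, sens_lo, sens_hi = bootstrap_gap(toy_true, toy_pred, reported=0.90, metric_index=0)

# Effort estimate from the abstract: about 400 curated cases of 40,000 annual cases.
effort_fraction = 400 / 40_000

print(f"local sensitivity={local_sens:.2f}, specificity={local_spec:.2f}")
print(f"sensitivity gap={sens_gap:.2f} (95% interval {sens_lo:.2f} to {sens_hi:.2f})")
print(f"effort fraction={effort_fraction:.1%}")
```

The same functions could be applied to an edge-case subgroup by filtering the labels to that subgroup before calling them; with so few cases per subgroup the intervals widen, which is consistent with the wide ranges reported for the edge-case categories.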

Competing Interest Statement

BF is a shareholder of Pocket Health and Eva Center and has received consultant fees from Canon Medical. JS, MA, MA, LSK, and SM have no conflicts of interest to disclose.

Funding Statement

This work was funded by Canada's Digital Technology Supercluster. The funder and AI developer had no role in the study design or analysis.

Author Declarations

I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.

Yes

The details of the IRB/oversight body that provided approval or exemption for the research described are given below:

The research ethics board of Trillium Health Partners gave ethical approval for this work.

I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.

Yes

I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).

Yes

I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable.

Yes

Data Availability

The dataset from this study is held securely at Trillium Health Partners (THP). Coded or aggregate data can be made accessible on request (contact the senior author).

Copyright 
The copyright holder for this preprint is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. All rights reserved. No reuse allowed without permission.
Posted December 16, 2022.
Subject Area

  • Health Informatics