medRxiv

Evaluation of a Novel Large Language Model (LLM) Powered Chatbot for Oral-Boards Scenarios

Caitlin Silvestri, Joshua Roshal, Meghal Shah, Warren D. Widmann, Courtney Townsend, Riley Brian, Joseph C. L’Huillier, Sergio M. Navarro, Sarah Lund, Tejas S. Sathe
doi: https://doi.org/10.1101/2024.05.31.24308044
Caitlin Silvestri, MD
New York Presbyterian/Columbia University Irving Medical Center, Department of Surgery, New York, NY
Collaboration of Surgical Education Fellows (CoSEF)
For correspondence: cs4004{at}cumc.columbia.edu
Joshua Roshal, MD
Brigham and Women’s Hospital/Harvard Medical School, Boston, MA
University of Texas Medical Branch, Galveston, TX
Collaboration of Surgical Education Fellows (CoSEF)
Meghal Shah, MD
New York Presbyterian/Columbia University Irving Medical Center, Department of Surgery, New York, NY
Warren D. Widmann, MD
SUNY Downstate Health Sciences University, Brooklyn, NY
Courtney Townsend, MD
University of Texas Medical Branch, Galveston, TX
Riley Brian, MD, MAEd
University of California San Francisco, San Francisco, CA
Collaboration of Surgical Education Fellows (CoSEF)
Joseph C. L’Huillier, MD
University at Buffalo, Jacobs School of Medicine and Biomedical Sciences, Department of Surgery, Buffalo, NY
University at Buffalo, Department of Epidemiology and Environmental Health, Division of Health Services Policy and Practice, School of Public Health and Health Professions, Buffalo, NY
Collaboration of Surgical Education Fellows (CoSEF)
Sergio M. Navarro, MD
University of Minnesota, Department of Surgery, Minneapolis, MN
Mayo Clinic, Department of Surgery, Rochester, MN
Collaboration of Surgical Education Fellows (CoSEF)
Sarah Lund, MD
Mayo Clinic, Department of Surgery, Rochester, MN
Collaboration of Surgical Education Fellows (CoSEF)
Tejas S. Sathe, MD
New York Presbyterian/Columbia University Irving Medical Center, Department of Surgery, New York, NY
University of California San Francisco, San Francisco, CA
Collaboration of Surgical Education Fellows (CoSEF)

Abstract

Introduction While previous studies have demonstrated that generative artificial intelligence (AI) can pass medical licensing exams, AI’s role as an examiner in complex, interactive assessments remains unknown. AI-powered chatbots could serve as educational tools that simulate oral examination dialogues. Here, we present initial validity evidence for an AI-powered chatbot designed to help general surgery residents prepare for the American Board of Surgery (ABS) Certifying Exam (CE).

Methods We developed a chatbot using GPT-4 to simulate oral board scenarios. Scenarios were completed by general surgery residents from six different institutions. Two experienced surgeons evaluated the chatbot across five domains: inappropriate content, missing content, likelihood of harm, extent of harm, and hallucinations. We measured inter-rater reliability to determine evaluation consistency.

Results Seventeen residents completed a total of 20 scenarios. Commonly tested topics included small bowel obstruction (30%), diverticulitis (20%), and breast disease (15%). Across the two independent reviewers' evaluations, 11% to 25% of chatbot simulations contained no errors, and an additional 11% to 35% contained errors of minimal clinical significance. Chatbot limitations included incorrect management advice and critical omissions of information.

Conclusions This study demonstrates the potential of an AI-powered chatbot to enhance surgical education through oral board simulations. Despite challenges in accuracy and safety, the chatbot offers a novel approach to medical education, underscoring the need for further refinement and standardized evaluation frameworks. Incorporating domain-specific knowledge and expert insights is crucial for improving the efficacy of AI tools in medical education.

Competing Interest Statement

Joshua Roshal is a consultant for McGraw Hill, an American publishing company whose mission is the education of current and future healthcare professionals.

Funding Statement

This study did not receive any funding.

Author Declarations

I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.

Yes

The details of the IRB/oversight body that provided approval or exemption for the research described are given below:

The Institutional Review Board of Columbia University classified this project as exempt and waived a full review under Protocol AAAV2136.

I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.

Yes

I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).

Yes

I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.

Yes

Data Availability

All data produced in the present study are available upon reasonable request to the authors. The data analysis performed for this study is available at the link below.

https://github.com/tsathe/surgbot-data-analysis

Copyright 
The copyright holder for this preprint is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.
Posted June 01, 2024.

Subject Area

  • Medical Education