medRxiv

Can ChatGPT-4o really pass medical science exams? A pragmatic analysis using novel questions

Philip M. Newton, Christopher J. Summers, Uzman Zaheer, Maira Xiromeriti, Jemima R. Stokes, Jaskaran Singh Bhangu, Elis G. Roome, Alanna Roberts-Phillips, Darius Mazaheri-Asadi, Cameron D. Jones, Stuart Hughes, Dominic Gilbert, Ewan Jones, Keioni Essex, Emily C. Ellis, Ross Davey, Adrienne A. Cox, Jessica A. Bassett
doi: https://doi.org/10.1101/2024.06.29.24309595
Philip M. Newton
Swansea University Medical School, Swansea, Wales, United Kingdom, SA2 8PP
  • For correspondence: p.newton{at}swansea.ac.uk
Christopher J. Summers
Swansea University Medical School, Swansea, Wales, United Kingdom, SA2 8PP
Uzman Zaheer
Swansea University Medical School, Swansea, Wales, United Kingdom, SA2 8PP
Maira Xiromeriti
Swansea University Medical School, Swansea, Wales, United Kingdom, SA2 8PP
Jemima R. Stokes
Swansea University Medical School, Swansea, Wales, United Kingdom, SA2 8PP
Jaskaran Singh Bhangu
Swansea University Medical School, Swansea, Wales, United Kingdom, SA2 8PP
Elis G. Roome
Swansea University Medical School, Swansea, Wales, United Kingdom, SA2 8PP
Alanna Roberts-Phillips
Swansea University Medical School, Swansea, Wales, United Kingdom, SA2 8PP
Darius Mazaheri-Asadi
Swansea University Medical School, Swansea, Wales, United Kingdom, SA2 8PP
Cameron D. Jones
Swansea University Medical School, Swansea, Wales, United Kingdom, SA2 8PP
Stuart Hughes
Swansea University Medical School, Swansea, Wales, United Kingdom, SA2 8PP
Dominic Gilbert
Swansea University Medical School, Swansea, Wales, United Kingdom, SA2 8PP
Ewan Jones
Swansea University Medical School, Swansea, Wales, United Kingdom, SA2 8PP
Keioni Essex
Swansea University Medical School, Swansea, Wales, United Kingdom, SA2 8PP
Emily C. Ellis
Swansea University Medical School, Swansea, Wales, United Kingdom, SA2 8PP
Ross Davey
Swansea University Medical School, Swansea, Wales, United Kingdom, SA2 8PP
Adrienne A. Cox
Swansea University Medical School, Swansea, Wales, United Kingdom, SA2 8PP
Jessica A. Bassett
Swansea University Medical School, Swansea, Wales, United Kingdom, SA2 8PP

Abstract

ChatGPT apparently shows excellent performance on high-level professional exams such as those involved in medical assessment and licensing. This has raised concerns that ChatGPT could be used for academic misconduct, especially in unproctored online exams. However, ChatGPT has also shown weaker performance on questions with pictures, and there have been concerns that ChatGPT’s performance may be artificially inflated by the public nature of the sample questions tested, meaning they likely formed part of the training materials for ChatGPT. This led to suggestions that cheating could be mitigated by using novel questions for every sitting of an exam and making extensive use of picture-based questions. These approaches remain untested.

Here we tested the performance of ChatGPT-4o on existing medical licensing exams in the UK and USA, and on novel questions based on those exams.

ChatGPT-4o scored 94% on the United Kingdom Medical Licensing Exam Applied Knowledge Test, and 89.9% on the United States Medical Licensing Exam Step 1. Performance was not diminished when the questions were rewritten into novel versions, or on completely novel questions which were not based on any existing questions. ChatGPT did show slightly reduced performance on questions containing images, particularly when the answer options were added to an image as text labels.

These data demonstrate that the performance of ChatGPT continues to improve and that online unproctored exams are an invalid form of assessment of the foundational knowledge needed for higher-order learning.

Competing Interest Statement

The authors have declared no competing interest.

Funding Statement

This study did not receive any funding.

Author Declarations

I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.

Yes

I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.

Yes

I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).

Yes

I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.

Yes

Footnotes

  • Corrected typos throughout the manuscript.

Data Availability

All data produced in the present work are contained in the manuscript.

Copyright 
The copyright holder for this preprint is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.
Posted July 02, 2024.
medRxiv 2024.06.29.24309595; doi: https://doi.org/10.1101/2024.06.29.24309595
Subject Area

  • Medical Education