Skip to main content
medRxiv
  • Home
  • About
  • Submit
  • ALERTS / RSS
Advanced Search

Human versus Artificial Intelligence: ChatGPT-4 Outperforming Bing, Bard, ChatGPT-3.5, and Humans in Clinical Chemistry Multiple-Choice Questions

View ORCID ProfileMalik Sallam, Khaled Al-Salahat, Huda Eid, Jan Egger, View ORCID ProfileBehrus Puladi
doi: https://doi.org/10.1101/2024.01.08.24300995
Malik Sallam
1Department of Pathology, Microbiology and Forensic Medicine, School of Medicine, The University of Jordan, Amman, Jordan
2Department of Clinical Laboratories and Forensic Medicine, Jordan University Hospital, Amman, Jordan
3Scientific Approaches to Fight Epidemics of Infectious Diseases (SAFE-ID) Research Group, The University of Jordan, Amman, Jordan
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
  • ORCID record for Malik Sallam
  • For correspondence: malik.sallam{at}ju.edu.jo
Khaled Al-Salahat
1Department of Pathology, Microbiology and Forensic Medicine, School of Medicine, The University of Jordan, Amman, Jordan
3Scientific Approaches to Fight Epidemics of Infectious Diseases (SAFE-ID) Research Group, The University of Jordan, Amman, Jordan
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
Huda Eid
3Scientific Approaches to Fight Epidemics of Infectious Diseases (SAFE-ID) Research Group, The University of Jordan, Amman, Jordan
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
Jan Egger
4Institute for AI in Medicine (IKIM), University Medicine Essen (AöR), Essen, Germany
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
Behrus Puladi
5Institute of Medical Informatics, University Hospital RWTH Aachen, Aachen, Germany
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
  • ORCID record for Behrus Puladi
  • Abstract
  • Full Text
  • Info/History
  • Metrics
  • Data/Code
  • Preview PDF
Loading

Abstract

The advances in large language models (LLMs) are evolving rapidly. Artificial intelligence (AI) chatbots based on LLMs excel in language understanding and generation, with potential utility to transform healthcare education and practice. However, it is important to assess the performance of such AI models in various topics to highlight its strengths and possible limitations. Therefore, this study aimed to evaluate the performance of ChatGPT (GPT-3.5 and GPT-4), Bing, and Bard compared to human students at a postgraduate master’s (MSc) level in Medical Laboratory Sciences. The study design was based on the METRICS checklist for the design and reporting of AI-based studies in healthcare. The study utilized a dataset of 60 Clinical Chemistry multiple-choice questions (MCQs) initially conceived for assessment of 20 MSc students. The revised Bloom’s taxonomy was used as the framework for classifying the MCQs into four cognitive categories: Remember, Understand, Analyze, and Apply. A modified version of the CLEAR tool was used for assessment of the quality of AI-generated content, with Cohen’s κ for inter-rater agreement. Compared to the mean students’ score which was 40/60 (66.8%), GPT-4 scored 54/60 (90.0%), followed by Bing (46/60, 76.7%), GPT-3.5 (44/60, 73.3%), and Bard (40/60, 66.7%). Statistically significant better performance was noted in lower cognitive domains (Remember and Understand) in GPT-3.5, GPT-4, and Bard. The CLEAR scores indicated that ChatGPT-4 performance was “Excellent” compared to “Above average” performance of ChatGPT-3.5, Bing, and Bard. The findings indicated that ChatGPT-4 excelled in the Clinical Chemistry exam, while ChatGPT-3.5, Bing, and Bard were above-average. Given that the MCQs were directed to postgraduate students with a high degree of specialization, the performance of these AI chatbots was remarkable. Due to the risks of academic dishonesty and possible dependence on these AI models, the appropriateness of MCQs as an assessment tool in higher education should be re-evaluated.

Competing Interest Statement

The authors have declared no competing interest.

Funding Statement

The author(s) received no specific funding for this work.

Author Declarations

I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.

Yes

The details of the IRB/oversight body that provided approval or exemption for the research described are given below:

In conducting this study, we paid careful attention to ethical implications. Ethical clearance for this research was determined to be non-essential, given the nature of the data involved. The data utilized were entirely anonymized, ensuring no breach of confidentiality or personal privacy. Additionally, the university examination results, which formed part of our dataset, are publicly accessible and open for academic scrutiny. Moreover, the MCQs employed in the study were originally created by the first author. These questions are devoid of any copyright concerns, further reinforcing the ethical integrity of our research approach.

I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.

Yes

I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).

Yes

I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.

Yes

Data availability statement

The data that support the findings of this study are available on request from the corresponding author (M.S.). The data are not publicly available due to the confidentiality of the questions created for an exam purposes.

Copyright 
The copyright holder for this preprint is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.
Back to top
PreviousNext
Posted January 09, 2024.
Download PDF
Data/Code
Email

Thank you for your interest in spreading the word about medRxiv.

NOTE: Your email address is requested solely to identify you as the sender of this article.

Enter multiple addresses on separate lines or separate them with commas.
Human versus Artificial Intelligence: ChatGPT-4 Outperforming Bing, Bard, ChatGPT-3.5, and Humans in Clinical Chemistry Multiple-Choice Questions
(Your Name) has forwarded a page to you from medRxiv
(Your Name) thought you would like to see this page from the medRxiv website.
CAPTCHA
This question is for testing whether or not you are a human visitor and to prevent automated spam submissions.
Share
Human versus Artificial Intelligence: ChatGPT-4 Outperforming Bing, Bard, ChatGPT-3.5, and Humans in Clinical Chemistry Multiple-Choice Questions
Malik Sallam, Khaled Al-Salahat, Huda Eid, Jan Egger, Behrus Puladi
medRxiv 2024.01.08.24300995; doi: https://doi.org/10.1101/2024.01.08.24300995
Twitter logo Facebook logo LinkedIn logo Mendeley logo
Citation Tools
Human versus Artificial Intelligence: ChatGPT-4 Outperforming Bing, Bard, ChatGPT-3.5, and Humans in Clinical Chemistry Multiple-Choice Questions
Malik Sallam, Khaled Al-Salahat, Huda Eid, Jan Egger, Behrus Puladi
medRxiv 2024.01.08.24300995; doi: https://doi.org/10.1101/2024.01.08.24300995

Citation Manager Formats

  • BibTeX
  • Bookends
  • EasyBib
  • EndNote (tagged)
  • EndNote 8 (xml)
  • Medlars
  • Mendeley
  • Papers
  • RefWorks Tagged
  • Ref Manager
  • RIS
  • Zotero
  • Tweet Widget
  • Facebook Like
  • Google Plus One

Subject Area

  • Health Informatics
Subject Areas
All Articles
  • Addiction Medicine (349)
  • Allergy and Immunology (668)
  • Allergy and Immunology (668)
  • Anesthesia (181)
  • Cardiovascular Medicine (2648)
  • Dentistry and Oral Medicine (316)
  • Dermatology (223)
  • Emergency Medicine (399)
  • Endocrinology (including Diabetes Mellitus and Metabolic Disease) (942)
  • Epidemiology (12228)
  • Forensic Medicine (10)
  • Gastroenterology (759)
  • Genetic and Genomic Medicine (4103)
  • Geriatric Medicine (387)
  • Health Economics (680)
  • Health Informatics (2657)
  • Health Policy (1005)
  • Health Systems and Quality Improvement (985)
  • Hematology (363)
  • HIV/AIDS (851)
  • Infectious Diseases (except HIV/AIDS) (13695)
  • Intensive Care and Critical Care Medicine (797)
  • Medical Education (399)
  • Medical Ethics (109)
  • Nephrology (436)
  • Neurology (3882)
  • Nursing (209)
  • Nutrition (577)
  • Obstetrics and Gynecology (739)
  • Occupational and Environmental Health (695)
  • Oncology (2030)
  • Ophthalmology (585)
  • Orthopedics (240)
  • Otolaryngology (306)
  • Pain Medicine (250)
  • Palliative Medicine (75)
  • Pathology (473)
  • Pediatrics (1115)
  • Pharmacology and Therapeutics (466)
  • Primary Care Research (452)
  • Psychiatry and Clinical Psychology (3432)
  • Public and Global Health (6527)
  • Radiology and Imaging (1403)
  • Rehabilitation Medicine and Physical Therapy (814)
  • Respiratory Medicine (871)
  • Rheumatology (409)
  • Sexual and Reproductive Health (410)
  • Sports Medicine (342)
  • Surgery (448)
  • Toxicology (53)
  • Transplantation (185)
  • Urology (165)