medRxiv
Evaluation of the Clinical Utility of DxGPT, a GPT-4 Based Large Language Model, through an Analysis of Diagnostic Accuracy and User Experience

Marina Alvarez-Estape, Ivan Cano, Rosa Pino, Carla González Grado, Andrea Aldemira-Liz, Javier Gonzálvez-Ortuño, Juanjo do Olmo, Javier Logroño, Marcelo Martínez, Carlos Mascías, Julián Isla, Jordi Martínez Roldán, Cristian Launes, Francesc Garcia-Cuyas, Paula Esteller-Cucala
doi: https://doi.org/10.1101/2024.07.23.24310847
Marina Alvarez-Estape
1Data and Digital Strategy Department, Hospital Sant Joan de Déu, Barcelona, Spain
2Metabolomics and Biochemical Genetics, Institut de Recerca Sant Joan de Déu, Barcelona, Spain
Ivan Cano
3Pediatrics Department, Hospital Sant Joan de Déu, Barcelona, Spain
Rosa Pino
3Pediatrics Department, Hospital Sant Joan de Déu, Barcelona, Spain
Carla González Grado
3Pediatrics Department, Hospital Sant Joan de Déu, Barcelona, Spain
Andrea Aldemira-Liz
3Pediatrics Department, Hospital Sant Joan de Déu, Barcelona, Spain
4Environment Effects on Child/Adolescent Well-being, Institut de Recerca Sant Joan de Déu, Barcelona, Spain
5Departament de Cirurgia i Especialitats Medicoquirúrgiques, Facultat de Medicina i Ciències de la Salut, Universitat de Barcelona, Barcelona, Spain
Javier Gonzálvez-Ortuño
6Strategy and Planning Department, Hospital Sant Joan de Déu, Barcelona, Spain
Juanjo do Olmo
7Foundation 29, Madrid, Spain
Javier Logroño
7Foundation 29, Madrid, Spain
Marcelo Martínez
7Foundation 29, Madrid, Spain
Carlos Mascías
7Foundation 29, Madrid, Spain
Julián Isla
7Foundation 29, Madrid, Spain
8Microsoft, Industry Solution Delivery, Madrid, Spain
Jordi Martínez Roldán
1Data and Digital Strategy Department, Hospital Sant Joan de Déu, Barcelona, Spain
Cristian Launes
3Pediatrics Department, Hospital Sant Joan de Déu, Barcelona, Spain
5Departament de Cirurgia i Especialitats Medicoquirúrgiques, Facultat de Medicina i Ciències de la Salut, Universitat de Barcelona, Barcelona, Spain
9Pediatric Infectious Diseases and Microbiome Research Group, Institut de Recerca Sant Joan de Déu, Spain
10CIBER Epidemiology and Public Health (CIBERESP), Madrid, Spain
  • For correspondence: cristian.launes{at}sjd.es
Francesc Garcia-Cuyas
1Data and Digital Strategy Department, Hospital Sant Joan de Déu, Barcelona, Spain
Paula Esteller-Cucala
1Data and Digital Strategy Department, Hospital Sant Joan de Déu, Barcelona, Spain
2Metabolomics and Biochemical Genetics, Institut de Recerca Sant Joan de Déu, Barcelona, Spain
Abstract

Importance The time to accurately diagnose rare pediatric diseases often spans years. Assessing the diagnostic accuracy of an LLM-based tool on real pediatric cases can help reduce this time, providing quicker diagnoses for patients and their families.

Objective To evaluate the clinical utility of DxGPT as a support tool for differential diagnosis of both common and rare diseases.

Design Unicentric descriptive cross-sectional exploratory study. Anonymized data from 50 pediatric patients’ medical histories, covering common and rare pathologies, were used to generate clinical case notes. Each clinical case included essential data, with some expanded by complementary tests.

Setting This study was conducted at a reference pediatric hospital, Sant Joan de Déu Barcelona Children’s Hospital.

Participants A total of 50 clinical cases were diagnosed by 78 volunteer doctors (medical diagnostic team) with varying experience, each reviewing 3 clinical cases.

Interventions Each clinician listed up to five diagnoses per clinical case note. The same was done on the DxGPT web platform, obtaining the Top-5 diagnostic proposals. To evaluate DxGPT’s variability, each note was queried three times.

Main Outcome(s) and Measure(s) The study mainly focused on comparing diagnostic accuracy, defined as the percentage of cases with the correct diagnosis, between the medical diagnostic team and DxGPT. Other evaluation criteria included qualitative assessments. The medical diagnostic team also completed a survey on their user experience with DxGPT.

Results Top-5 diagnostic accuracy was 65% for clinicians and 60% for DxGPT, with no significant differences. Accuracies for common diseases were higher (Clinicians: 79%, DxGPT: 71%) than for rare diseases (Clinicians: 50%, DxGPT: 49%). Accuracy increased similarly in both groups with expanded information, but this increase was statistically significant only for clinicians (simple 52% vs. expanded 69%; p=0.03). DxGPT’s response variability affected less than 5% of clinical case notes. A survey of 48 clinicians rated the DxGPT platform 3.9/5 overall, 4.1/5 for usefulness, and 4.5/5 for usability.

Conclusions and Relevance DxGPT showed diagnostic accuracies similar to medical staff from a pediatric hospital, indicating its potential for supporting differential diagnosis in other settings. Clinicians praised its usability and simplicity. These tools could provide new insights for challenging diagnostic cases.

Question Is DxGPT, a large language model-based (LLM-based) tool, effective for differential diagnosis support, specifically in the context of a clinical pediatric setting?

Findings In this unicentric cross-sectional study, diagnostic accuracy, measured as the proportion of clinical cases where any of the five diagnostic options included the correct diagnosis, showed comparable results between clinicians and DxGPT. Top-5 accuracy was 65% for clinicians and 60% for DxGPT.
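The Top-5 accuracy measure described above can be illustrated with a minimal Python sketch. It assumes that a proposal counts as correct when it matches the reference diagnosis after trimming and lowercasing; this string match is only a stand-in for whatever correctness judgment the study applied. The function names and the example cases are hypothetical and are not taken from the study's data or code.

```python
from typing import List, Sequence, Tuple


def top_k_hit(candidates: Sequence[str], correct: str, k: int = 5) -> bool:
    """Return True if the correct diagnosis is among the first k proposals.

    Assumption: matching is a case-insensitive exact string comparison,
    used here only as a placeholder for a clinical correctness judgment.
    """
    normalized = [c.strip().lower() for c in candidates[:k]]
    return correct.strip().lower() in normalized


def top_k_accuracy(cases: List[Tuple[Sequence[str], str]], k: int = 5) -> float:
    """Proportion of cases whose first k proposals include the correct diagnosis."""
    if not cases:
        raise ValueError("at least one case is required")
    hits = sum(top_k_hit(candidates, correct, k) for candidates, correct in cases)
    return hits / len(cases)


# Hypothetical example cases: (ranked proposals, reference diagnosis).
example_cases = [
    (["asthma", "bronchiolitis", "pneumonia"], "Pneumonia"),    # hit
    (["migraine", "tension headache"], "brain tumor"),          # miss
    (["a", "b", "c", "d", "e", "kawasaki disease"], "Kawasaki disease"),  # 6th place: miss
    (["celiac disease"], "celiac disease"),                     # hit
]

accuracy = top_k_accuracy(example_cases, k=5)
```

With these four made-up cases, two of the four proposal lists contain the reference diagnosis within the first five entries, so `accuracy` evaluates to 0.5. The same function could be applied separately to the clinicians' lists and to the tool's lists to obtain the two figures being compared.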

Meaning These findings highlight the potential of LLM-based tools like DxGPT to support clinicians in making accurate and timely diagnoses, ultimately improving patient care.

Competing Interest Statement

do Olmo, Logroño and Martínez are employees of Foundation 29. Isla and Mascías are volunteers and board members of Foundation 29. Foundation 29 is a non-profit organization that developed DxGPT (publicly available at https://dxgpt.app/). Isla is a Microsoft employee. do Olmo, Logroño, Martínez, Isla and Mascías did not participate in data collection or in the analysis of DxGPT’s performance. DxGPT, a project by Foundation 29, is funded through unrestricted grants from pharmaceutical companies (Takeda, UCB, Kyowa Kirin, Italfarmaco, and Sanofi) with no specific deliverables required. The project uses GPT-4 models on Microsoft Azure’s European datacenters, with computational resources provided free by Microsoft’s AI for Good program. Google offers free advertisement services through their NGO support program. These collaborations ensure GDPR compliance and support the project’s non-profit objectives. No personal data is stored, and no data is used to train or feed commercial Large Language Models or any other commercial service. No other disclosures were reported.

Funding Statement

This study did not receive any funding.

Author Declarations

I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.

Yes

The details of the IRB/oversight body that provided approval or exemption for the research described are given below:

Ethics committee and Institutional Review Board of the Sant Joan de Déu Hospital gave ethical approval for this work (PIC-20-24).

I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.

Yes

I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).

Yes

I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.

Yes

Footnotes

  • Conflict Of Interest section updated.

Data Availability

Deidentified clinical cases, free of sensitive and confidential patient information, are available upon reasonable request to the corresponding author. Code to replicate analyses and figures and eTables 1, 5, 7 and 8 will be made public upon publication.

https://github.com/maralest/DxGPT_HSJD_Analysis

Copyright 
The copyright holder for this preprint is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-ND 4.0 International license.
Posted July 31, 2024.
Subject Area

  • Health Informatics