medRxiv

Evaluating Large Language Models in Extracting Cognitive Exam Dates and Scores

Hao Zhang, Neil Jethani, Simon Jones, Nicholas Genes, Vincent J. Major, Ian S. Jaffe, Anthony B. Cardillo, Noah Heilenbach, Nadia Fazal Ali, Luke J. Bonanni, Andrew J. Clayburn, Zain Khera, Erica C. Sadler, Jaideep Prasad, Jamie Schlacter, Kevin Liu, Benjamin Silva, Sophie Montgomery, Eric J. Kim, Jacob Lester, Theodore M. Hill, Alba Avoricani, Ethan Chervonski, James Davydov, William Small, Eesha Chakravartty, Himanshu Grover, John A. Dodson, Abraham A. Brody, Yindalon Aphinyanaphongs, Arjun Masurkar, Narges Razavian
doi: https://doi.org/10.1101/2023.07.10.23292373
1 NYU Grossman School of Medicine (all authors except Abraham A. Brody)
2 NYU Rory Meyers College of Nursing, NYU Grossman School of Medicine (Abraham A. Brody)
For correspondence: narges.razavian{at}nyulangone.org

Abstract

Importance Large language models (LLMs) are crucial for medical tasks. Ensuring their reliability is vital to avoid false results. Our study assesses two state-of-the-art LLMs (ChatGPT and LlaMA-2) for extracting clinical information, focusing on cognitive tests such as the Mini-Mental State Examination (MMSE) and the Clinical Dementia Rating (CDR).

Objective Evaluate ChatGPT and LlaMA-2 performance in extracting MMSE and CDR scores, including their associated dates.

Methods Our data consisted of 135,307 clinical notes (January 12, 2010 to May 24, 2023) mentioning MMSE, CDR, or MoCA. After applying inclusion criteria, 34,465 notes remained, of which 765 were processed by ChatGPT (GPT-4) and LlaMA-2; 22 experts reviewed the responses. ChatGPT successfully extracted MMSE and CDR instances with dates from 742 notes. We used 20 notes for fine-tuning and training the reviewers. The remaining 722 were assigned to reviewers, with 309 of them each assigned to two reviewers simultaneously. Inter-rater agreement (Fleiss’ Kappa), precision, recall, true/false negative rates, and accuracy were calculated. Our study follows TRIPOD reporting guidelines for model validation.
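
As an illustration of the inter-rater agreement statistic named above, the following is a minimal sketch of Fleiss' Kappa in plain Python. It is not the authors' code: the function name, the input layout (one row per note, one column per rating category), and the example counts are all assumptions made for illustration only.

```python
import math
from typing import Sequence


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Fleiss' kappa for a table where counts[i][j] is the number of
    raters who put item i into category j (hypothetical layout).
    Every item must receive the same number of ratings."""
    n_items = len(counts)
    if n_items == 0:
        raise ValueError("need at least one rated item")
    n_raters = sum(counts[0])
    if n_raters < 2:
        raise ValueError("need at least two raters per item")
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("every item must have the same number of ratings")
    n_categories = len(counts[0])

    # Observed agreement: for each item, the share of rater pairs that agree.
    per_item = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_observed = sum(per_item) / n_items

    # Chance agreement from the overall share of ratings in each category.
    total = n_items * n_raters
    shares = [sum(row[j] for row in counts) / total for j in range(n_categories)]
    p_chance = sum(s * s for s in shares)

    if math.isclose(p_chance, 1.0):
        return float("nan")  # undefined when all ratings fall in one category
    return (p_observed - p_chance) / (1.0 - p_chance)


# Made-up example: four notes, two reviewers each, categories
# ("correct", "incorrect"). These are NOT data from the study.
hypothetical_counts = [[2, 0], [0, 2], [1, 1], [1, 1]]
print(fleiss_kappa(hypothetical_counts))
```

With two reviewers per note, as for the double-reviewed notes described above, each row sums to 2. A value near 1 indicates agreement well beyond chance, and a value near 0 indicates agreement no better than chance.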

Results For MMSE information extraction, ChatGPT (vs. LlaMA-2) achieved accuracy of 83% (vs. 66.4%), sensitivity of 89.7% (vs. 69.9%), true-negative rate of 96% (vs. 60.0%), and precision of 82.7% (vs. 62.2%). For CDR the results were lower overall, with accuracy of 87.1% (vs. 74.5%), sensitivity of 84.3% (vs. 39.7%), true-negative rate of 99.8% (vs. 98.4%), and precision of 48.3% (vs. 16.1%). We qualitatively evaluated the MMSE errors of ChatGPT and LlaMA-2 on double-reviewed notes. LlaMA-2 errors included 27 cases of total hallucination, 19 cases of reporting other scores instead of MMSE, 25 missed scores, and 23 cases of reporting only the wrong date. In comparison, ChatGPT’s errors included only 3 cases of total hallucination, 17 cases of reporting the wrong test instead of MMSE, and 19 cases of reporting a wrong date.
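
For readers less familiar with the four metrics reported above, the sketch below shows their standard definitions in terms of confusion-matrix counts. It is illustrative only: the class and function names are hypothetical, the example counts are made up, and the abstract does not state exactly how a true positive was defined for a score-plus-date extraction.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Hypothetical container for reviewer-adjudicated outcomes."""
    tp: int  # correct extraction, information present in the note
    fp: int  # extraction reported, but wrong or not in the note
    tn: int  # nothing reported, and nothing present
    fn: int  # information present in the note, but missed


def _ratio(numerator: int, denominator: int) -> float:
    # Return NaN rather than raising when a denominator is empty.
    return numerator / denominator if denominator else float("nan")


def extraction_metrics(c: ConfusionCounts) -> dict:
    """Standard definitions of the metrics named in the Results."""
    total = c.tp + c.fp + c.tn + c.fn
    return {
        "accuracy": _ratio(c.tp + c.tn, total),
        "sensitivity": _ratio(c.tp, c.tp + c.fn),        # also called recall
        "true_negative_rate": _ratio(c.tn, c.tn + c.fp),  # also called specificity
        "precision": _ratio(c.tp, c.tp + c.fp),
    }


# Made-up counts for illustration; NOT taken from the study.
example = ConfusionCounts(tp=8, fp=2, tn=85, fn=5)
for name, value in extraction_metrics(example).items():
    print(f"{name}: {value:.3f}")
```

Because precision divides by true positives plus false positives, while the true-negative rate divides by true negatives plus false positives, the two can diverge when positive instances are rare relative to negatives.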

Conclusions In this diagnostic/prognostic study of ChatGPT and LlaMA-2 for extracting cognitive exam dates and scores from clinical notes, ChatGPT exhibited high accuracy and outperformed LlaMA-2. The use of LLMs could benefit dementia research and clinical care by identifying patients eligible for treatment initiation or clinical trial enrollment. Rigorous evaluation of LLMs is crucial to understanding their capabilities and limitations.

Competing Interest Statement

The authors have declared no competing interest.

Funding Statement

This study was funded by the NYU Langone Medical Center Information Technology (MCIT) center. Authors N.R. and A.M. are also funded by the National Institute on Aging of the National Institutes of Health under Award Numbers R01AG085617 and P30AG066512. Authors H.Z., S.J., V.J.M., J.A.D., A.A.B., Y.A., A.M., and N.R. are also supported by award number R01AG079175 from the National Institute on Aging of the National Institutes of Health.

Author Declarations

I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.

Yes

The details of the IRB/oversight body that provided approval or exemption for the research described are given below:

Ethics committee/IRB of New York University Langone Health gave ethical approval of this work.

I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.

Yes

I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).

Yes

I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.

Yes

Footnotes

  • We added results of the open source model (LlaMA-2) to the analysis.

Data Availability

The data contains private patient information and will not be made available.

Copyright 
The copyright holder for this preprint is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.
Posted February 13, 2024.
medRxiv 2023.07.10.23292373; doi: https://doi.org/10.1101/2023.07.10.23292373

Subject Area

  • Health Informatics