Generative Large Language Models in Electronic Health Records for Patient Care Since 2023: A Systematic Review

Xinsong Du, Zhengyang Zhou, Yifei Wang, Ya-Wen Chuang, Richard Yang, Wenyu Zhang, Xinyi Wang, Rui Zhang, Pengyu Hong, David W. Bates, Li Zhou
doi: https://doi.org/10.1101/2024.08.11.24311828
Xinsong Du
1Division of General Internal Medicine and Primary Care, Brigham and Women’s Hospital, Boston, Massachusetts 02115
2Department of Medicine, Harvard Medical School, Boston, Massachusetts 02115
3Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts 02115
  • For correspondence: xidu1{at}bwh.harvard.edu
Zhengyang Zhou
4Department of Computer Science, Brandeis University, Waltham, MA 02453
Yifei Wang
4Department of Computer Science, Brandeis University, Waltham, MA 02453
Ya-Wen Chuang
5Division of Nephrology, Department of Internal Medicine, Taichung Veterans General Hospital, Taichung, Taiwan, 407219
6Department of Post-Baccalaureate Medicine, College of Medicine, National Chung Hsing University, Taichung, Taiwan, 402202
7School of Medicine, College of Medicine, China Medical University, Taichung, Taiwan, 404328
Richard Yang
1Division of General Internal Medicine and Primary Care, Brigham and Women’s Hospital, Boston, Massachusetts 02115
2Department of Medicine, Harvard Medical School, Boston, Massachusetts 02115
Wenyu Zhang
1Division of General Internal Medicine and Primary Care, Brigham and Women’s Hospital, Boston, Massachusetts 02115
2Department of Medicine, Harvard Medical School, Boston, Massachusetts 02115
3Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts 02115
Xinyi Wang
1Division of General Internal Medicine and Primary Care, Brigham and Women’s Hospital, Boston, Massachusetts 02115
2Department of Medicine, Harvard Medical School, Boston, Massachusetts 02115
3Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts 02115
Rui Zhang
8Division of Computational Health Sciences, University of Minnesota, Minneapolis, MN 55455
Pengyu Hong
4Department of Computer Science, Brandeis University, Waltham, MA 02453
David W. Bates
1Division of General Internal Medicine and Primary Care, Brigham and Women’s Hospital, Boston, Massachusetts 02115
2Department of Medicine, Harvard Medical School, Boston, Massachusetts 02115
9Department of Health Policy and Management, Harvard T.H. Chan School of Public Health, Boston, MA 02115
Li Zhou
1Division of General Internal Medicine and Primary Care, Brigham and Women’s Hospital, Boston, Massachusetts 02115
2Department of Medicine, Harvard Medical School, Boston, Massachusetts 02115
3Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts 02115

Abstract

Background Generative large language models (LLMs) represent a significant advancement in natural language processing, achieving state-of-the-art performance across various tasks. However, their application in clinical settings using real electronic health records (EHRs) remains rare and presents numerous challenges.

Objective This study aims to systematically review the use of generative LLMs, and the effectiveness of relevant techniques, in patient care-related topics involving EHRs; to summarize the challenges faced; and to suggest future directions.

Methods A Boolean search for peer-reviewed articles was conducted on May 19, 2024, using PubMed and Web of Science, to identify research articles published from 2023 onward, beginning one month after the release of ChatGPT. The search results were deduplicated. Multiple reviewers, including biomedical informaticians, computer scientists, and a physician, screened the publications for eligibility and conducted data extraction. Only studies utilizing generative LLMs to analyze real EHR data were included. We summarized the use of prompt engineering, fine-tuning, multimodal EHR data, and evaluation metrics. Additionally, we identified current challenges in applying LLMs in clinical settings as reported by the included studies and proposed future directions.
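Merging results from two databases usually means the same article appears twice. The abstract does not say how deduplication was done, so the following is only a minimal Python sketch of one common approach, assuming each record is a dictionary with `doi` and `title` fields; the function names and the sample records are hypothetical. Records are keyed by a normalized DOI when one exists and by a normalized title otherwise.

```python
import re
from typing import Dict, List, Optional

Record = Dict[str, Optional[str]]  # hypothetical record shape: {"doi": ..., "title": ...}


def normalize_title(title: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    title = re.sub(r"[^a-z0-9 ]", " ", title.lower())
    return " ".join(title.split())


def dedup_key(record: Record) -> str:
    """Prefer a normalized DOI; fall back to the normalized title when no DOI is present."""
    doi = (record.get("doi") or "").strip().lower()
    if doi:
        return "doi:" + doi
    return "title:" + normalize_title(record.get("title") or "")


def deduplicate(*sources: List[Record]) -> List[Record]:
    """Merge records from several databases, keeping the first occurrence of each key."""
    seen = set()
    unique: List[Record] = []
    for source in sources:
        for record in source:
            key = dedup_key(record)
            if key not in seen:
                seen.add(key)
                unique.append(record)
    return unique


if __name__ == "__main__":
    # Illustrative, made-up records; not results from the review.
    pubmed = [{"doi": "10.1000/abc", "title": "LLMs in EHRs"},
              {"doi": None, "title": "Prompting GPT-4: a study"}]
    wos = [{"doi": "10.1000/ABC", "title": "LLMs in EHRs."},
           {"doi": "", "title": "Prompting GPT-4 - a study"}]
    print(len(deduplicate(pubmed, wos)))  # prints 2
```

One limitation of this simple keying: a record that has a DOI in one database but lacks it in the other will not be matched, so in practice a second title-based pass or manual review is often added.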

Results The initial search identified 6,328 unique studies, with 76 studies included after eligibility screening. Of these, 67 studies (88.2%) employed zero-shot prompting; five of these reported 100% accuracy on five specific clinical tasks. Nine studies used advanced prompting strategies; four tested these strategies experimentally and found that prompt engineering improved performance, with one study noting a non-linear relationship between the number of examples in a prompt and the resulting performance improvement. Eight studies explored fine-tuning generative LLMs; all reported performance improvements on specific tasks, but three noted potential performance degradation on certain tasks after fine-tuning. Only two studies utilized multimodal data, which improved LLM-based decision-making and enabled accurate rare disease diagnosis and prognosis. The studies employed 55 different evaluation metrics for 22 purposes, such as correctness, completeness, and conciseness. Two studies investigated LLM bias: one detected no bias, and the other found that male patients received more appropriate clinical decision-making suggestions. Six studies identified hallucinations, such as fabricated patient names in structured thyroid ultrasound reports. Additional challenges included, among others, the impersonal tone of LLM consultations, which made patients uncomfortable, and the difficulty patients had in understanding LLM responses.
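The results contrast zero-shot prompting, where the prompt contains only an instruction and the input, with strategies that add worked examples, whose number one study found to affect performance non-linearly. The following minimal Python sketch illustrates that structural difference under a plain-text prompt format. The function `build_prompt`, the instruction wording, and the note snippets are hypothetical and synthetic; they are not prompts or EHR text from any reviewed study.

```python
from typing import Sequence, Tuple


def build_prompt(instruction: str, note: str,
                 examples: Sequence[Tuple[str, str]] = ()) -> str:
    """Assemble a text prompt: zero-shot when `examples` is empty, few-shot otherwise.

    Each example is a (note, answer) pair shown to the model before the target note.
    """
    parts = [instruction.strip()]
    for i, (example_note, example_answer) in enumerate(examples, start=1):
        parts.append(f"Example {i}:\nNote: {example_note}\nAnswer: {example_answer}")
    # The target note comes last, leaving the answer for the model to complete.
    parts.append(f"Note: {note}\nAnswer:")
    return "\n\n".join(parts)


if __name__ == "__main__":
    task = "Does the note mention a fall? Answer yes or no."
    zero_shot = build_prompt(task, "Patient slipped in the bathroom.")
    few_shot = build_prompt(
        task,
        "Patient slipped in the bathroom.",
        examples=[("No acute events overnight.", "no"),
                  ("Found on the floor by daughter.", "yes")],
    )
    print(zero_shot)
    print("---")
    print(few_shot)
```

Varying the length of `examples` is one simple way to probe how the number of in-prompt examples relates to performance on a labeled evaluation set.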

Conclusion Our review indicates that few studies have employed advanced computational techniques to enhance LLM performance. The diverse evaluation metrics used highlight the need for standardization. LLMs currently cannot replace physicians due to challenges such as bias, hallucinations, and impersonal responses.

Competing Interest Statement

The authors have declared no competing interest.

Funding Statement

This study was funded by NIH-NIA R44AG081006, NIH-NLM 1R01LM014239, and NIH-NIA R01AG080429.

Author Declarations

I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.

Yes

I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.

Yes

I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).

Yes

I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.

Yes

Footnotes

  • Writing improvements: 1) make the section order consistent between methods and results, and 2) improve tables and figures.

Data Availability

All data produced in the present work are contained in the manuscript.

Copyright 
The copyright holder for this preprint is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.
Posted August 19, 2024.
Subject Area

  • Health Informatics