medRxiv

LCD Benchmark: Long Clinical Document Benchmark on Mortality Prediction for Language Models

WonJin Yoon, Shan Chen, Yanjun Gao, Zhanzhan Zhao, Dmitriy Dligach, Danielle S. Bitterman, Majid Afshar, Timothy Miller
doi: https://doi.org/10.1101/2024.03.26.24304920
WonJin Yoon
1Computational Health Informatics Program, Boston Children’s Hospital, MA, USA
2Department of Pediatrics, Harvard Medical School, MA, USA
PhD
  • For correspondence: Wonjin.Yoon{at}childrens.harvard.edu
Shan Chen
1Computational Health Informatics Program, Boston Children’s Hospital, MA, USA
2Department of Pediatrics, Harvard Medical School, MA, USA
3Artificial Intelligence in Medicine (AIM) Program, Mass General Brigham, Harvard Medical School, Boston, MA, USA
4Department of Radiation Oncology, Brigham and Women’s Hospital/Dana-Farber Cancer Institute, Boston, MA, USA
MS
Yanjun Gao
5Department of Medicine, University of Wisconsin - Madison, Madison, WI, USA
PhD
Zhanzhan Zhao
1Computational Health Informatics Program, Boston Children’s Hospital, MA, USA
2Department of Pediatrics, Harvard Medical School, MA, USA
PhD
Dmitriy Dligach
6Department of Computer Science, Loyola University Chicago, Chicago, IL, USA
PhD
Danielle S. Bitterman
1Computational Health Informatics Program, Boston Children’s Hospital, MA, USA
2Department of Pediatrics, Harvard Medical School, MA, USA
3Artificial Intelligence in Medicine (AIM) Program, Mass General Brigham, Harvard Medical School, Boston, MA, USA
4Department of Radiation Oncology, Brigham and Women’s Hospital/Dana-Farber Cancer Institute, Boston, MA, USA
MD
Majid Afshar
5Department of Medicine, University of Wisconsin - Madison, Madison, WI, USA
MD MSCR
Timothy Miller
1Computational Health Informatics Program, Boston Children’s Hospital, MA, USA
2Department of Pediatrics, Harvard Medical School, MA, USA
PhD

ABSTRACT

Objective The application of Natural Language Processing (NLP) in the clinical domain is important because clinical documents contain rich unstructured information that is often unavailable in structured data. When NLP methods are applied to a specific domain, benchmark datasets are crucial: they not only guide the selection of the best-performing models but also enable assessment of the reliability of the generated outputs. Despite the recent availability of language models (LMs) capable of handling longer contexts, benchmark datasets targeting long clinical document classification tasks are absent.

Materials and Methods To address this issue, we propose the LCD benchmark, a benchmark for the task of predicting 30-day out-of-hospital mortality using discharge notes from MIMIC-IV and statewide death data. We evaluated baseline models on this benchmark dataset, ranging from bag-of-words and CNN models to instruction-tuned large language models. Additionally, we provide a comprehensive analysis of the model outputs, including manual review and visualization of model weights, to offer insights into their predictive capabilities and limitations.
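
The labeling step implied by the task definition above can be illustrated with a minimal sketch. The abstract does not give the exact labeling rules, so the function name `label_30_day_mortality`, the inclusive 30-day boundary, and the assumption that in-hospital deaths were filtered out beforehand are all hypothetical choices made for illustration, not the authors' procedure.

```python
from datetime import date
from typing import Optional


def label_30_day_mortality(
    discharge: date,
    death: Optional[date],
    window_days: int = 30,
) -> int:
    """Return 1 if a death is recorded within `window_days` after discharge, else 0.

    Assumptions (not taken from the source): the boundary is inclusive,
    a missing death date means no recorded death, and admissions ending
    in an in-hospital death were removed before this function is called.
    """
    if death is None:
        return 0
    days_after_discharge = (death - discharge).days
    # A death dated before discharge is outside the out-of-hospital window.
    return int(0 <= days_after_discharge <= window_days)


# Hypothetical dates, used only to exercise the function.
positive = label_30_day_mortality(date(2020, 1, 1), date(2020, 1, 31))
negative = label_30_day_mortality(date(2020, 1, 1), date(2020, 2, 1))
```

Keeping the window as a parameter makes it easy to check how sensitive the class balance is to the chosen horizon.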

Results and Discussion Baseline models achieved F1 scores of 28.9% for the best-performing supervised model and 32.2% for GPT-4. Notes in our dataset have a median word count of 1687. Our analysis of the model outputs showed that the dataset is challenging for both models and human experts, but that the models can find meaningful signals in the text.
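
For readers less familiar with the metric, the following sketch shows how an F1 score is computed for a binary task such as this one. It assumes F1 is measured on the positive (mortality) class, which the abstract does not state explicitly; the function name and the toy labels are illustrative only and are not drawn from the benchmark.

```python
from typing import Sequence


def binary_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 score of the positive class (label 1) for binary predictions."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        # No true positives: precision or recall is zero (or undefined),
        # so F1 is reported as 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy example: one true positive, one false positive, one false negative.
toy_true = [1, 0, 1, 0, 0]
toy_pred = [1, 1, 0, 0, 0]
toy_f1 = binary_f1(toy_true, toy_pred)  # precision = recall = 0.5, so F1 = 0.5
```

Because F1 ignores true negatives, it can stay low on a task with rare positive cases even when overall accuracy is high.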

Conclusion We expect our LCD benchmark to be a resource for the development of advanced supervised models, or prompting methods, tailored for clinical text.

The benchmark dataset is available at https://github.com/Machine-Learning-for-Medical-Language/long-clinical-doc

Competing Interest Statement

The authors have declared no competing interest.

Funding Statement

Research reported in this publication was supported by the National Library Of Medicine of the National Institutes of Health under Award Number R01LM012973, and by the National Institute Of Mental Health of the National Institutes of Health under Award Number R01MH126977.

Author Declarations

I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.

Yes

The details of the IRB/oversight body that provided approval or exemption for the research described are given below:

The IRB of Boston Children's Hospital gave ethical approval for this work (IRB number: IRB-P00028617). All authors are granted access to the database under the PhysioNet Credentialed Health Data Use Agreement 1.5.0, the data use agreement for MIMIC-IV (v2.2) and for MIMIC-IV-Note: Deidentified free-text clinical notes (v2.2).

I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.

Yes

I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).

Yes

I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.

Yes

Footnotes

  • I discovered that the displayed PDF version of our manuscript differs from our original submission. Specifically, Figure 3-(b) is missing, and some mathematical expressions on page 9 are missing. I suspect these issues arose during the conversion of our source file (Word) to PDF. To correct this, we will replace the current version with a PDF converted on our own PC. In this version, we integrated the Appendix into the main manuscript; the Appendix appears after the Reference section. The content remains unchanged.

Copyright 
The copyright holder for this preprint is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. All rights reserved. No reuse allowed without permission.
Posted July 02, 2024.
LCD Benchmark: Long Clinical Document Benchmark on Mortality Prediction for Language Models
WonJin Yoon, Shan Chen, Yanjun Gao, Zhanzhan Zhao, Dmitriy Dligach, Danielle S. Bitterman, Majid Afshar, Timothy Miller
medRxiv 2024.03.26.24304920; doi: https://doi.org/10.1101/2024.03.26.24304920
Subject Area

  • Health Informatics