Skip to main content
medRxiv
  • Home
  • About
  • Submit
  • ALERTS / RSS
Advanced Search

Use of unstructured text in prognostic clinical prediction models: a systematic review

View ORCID ProfileTom M. Seinen, View ORCID ProfileEgill Fridgeirsson, View ORCID ProfileSolomon Ioannou, View ORCID ProfileDaniel Jeannetot, Luis H. John, View ORCID ProfileJan A. Kors, View ORCID ProfileAniek F. Markus, View ORCID ProfileVictor Pera, View ORCID ProfileAlexandros Rekkas, View ORCID ProfileRoss D. Williams, Cynthia Yang, View ORCID ProfileErik van Mulligen, View ORCID ProfilePeter R. Rijnbeek
doi: https://doi.org/10.1101/2022.01.17.22269400
Tom M. Seinen
1Department of Medical Informatics, Erasmus University Medical Center, Rotterdam, The Netherlands
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
  • ORCID record for Tom M. Seinen
  • For correspondence: t.seinen{at}erasmusmc.nl
Egill Fridgeirsson
1Department of Medical Informatics, Erasmus University Medical Center, Rotterdam, The Netherlands
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
  • ORCID record for Egill Fridgeirsson
Solomon Ioannou
1Department of Medical Informatics, Erasmus University Medical Center, Rotterdam, The Netherlands
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
  • ORCID record for Solomon Ioannou
Daniel Jeannetot
1Department of Medical Informatics, Erasmus University Medical Center, Rotterdam, The Netherlands
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
  • ORCID record for Daniel Jeannetot
Luis H. John
1Department of Medical Informatics, Erasmus University Medical Center, Rotterdam, The Netherlands
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
Jan A. Kors
1Department of Medical Informatics, Erasmus University Medical Center, Rotterdam, The Netherlands
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
  • ORCID record for Jan A. Kors
Aniek F. Markus
1Department of Medical Informatics, Erasmus University Medical Center, Rotterdam, The Netherlands
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
  • ORCID record for Aniek F. Markus
Victor Pera
1Department of Medical Informatics, Erasmus University Medical Center, Rotterdam, The Netherlands
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
  • ORCID record for Victor Pera
Alexandros Rekkas
1Department of Medical Informatics, Erasmus University Medical Center, Rotterdam, The Netherlands
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
  • ORCID record for Alexandros Rekkas
Ross D. Williams
1Department of Medical Informatics, Erasmus University Medical Center, Rotterdam, The Netherlands
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
  • ORCID record for Ross D. Williams
Cynthia Yang
1Department of Medical Informatics, Erasmus University Medical Center, Rotterdam, The Netherlands
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
Erik van Mulligen
1Department of Medical Informatics, Erasmus University Medical Center, Rotterdam, The Netherlands
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
  • ORCID record for Erik van Mulligen
Peter R. Rijnbeek
1Department of Medical Informatics, Erasmus University Medical Center, Rotterdam, The Netherlands
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
  • ORCID record for Peter R. Rijnbeek
  • Abstract
  • Full Text
  • Info/History
  • Metrics
  • Supplementary material
  • Data/Code
  • Preview PDF
Loading

ABSTRACT

Objective This systematic review aims to assess how information from unstructured clinical text is used to develop and validate prognostic risk prediction models. We summarize the prediction problems and methodological landscape and assess whether using unstructured clinical text data in addition to more commonly used structured data improves the prediction performance.

Materials and Methods We searched Embase, MEDLINE, Web of Science, and Google Scholar to identify studies that developed prognostic risk prediction models using unstructured clinical text data published in the period from January 2005 to March 2021. Data items were extracted, analyzed, and a meta-analysis of the model performance was carried out to assess the added value of text to structured-data models.

Results We identified 126 studies that described 145 clinical prediction problems. Combining text and structured data improved model performance, compared to using only text or only structured data. In these studies, a wide variety of dense and sparse numeric text representations were combined with both deep learning and more traditional machine learning methods. External validation, public availability, and explainability of the developed models was limited.

Conclusion Overall, the use of unstructured clinical text data in the development of prognostic prediction models has been found beneficial in addition to structured data in most studies. The EHR text data is a source of valuable information for prediction model development and should not be neglected. We suggest a future focus on explainability and external validation of the developed models, promoting robust and trustworthy prediction models in clinical practice.

Competing Interest Statement

The authors have declared no competing interest.

Clinical Protocols

https://osf.io/gw628

Funding Statement

This work has received support from the European Health Data and Evidence Network (EHDEN) project. EHDEN has received funding from the Innovative Medicines Initiative 2 Joint Undertaking (JU) under grant agreement No 806968. The JU receives support from the European Union's Horizon 2020 research and innovation programme and EFPIA.

Author Declarations

I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.

Yes

I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.

Yes

I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).

Yes

I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable.

Yes

Data Availability

The data underlying this work are available as supplementary material.

Copyright 
The copyright holder for this preprint is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.
Back to top
PreviousNext
Posted January 18, 2022.
Download PDF

Supplementary Material

Data/Code
Email

Thank you for your interest in spreading the word about medRxiv.

NOTE: Your email address is requested solely to identify you as the sender of this article.

Enter multiple addresses on separate lines or separate them with commas.
Use of unstructured text in prognostic clinical prediction models: a systematic review
(Your Name) has forwarded a page to you from medRxiv
(Your Name) thought you would like to see this page from the medRxiv website.
CAPTCHA
This question is for testing whether or not you are a human visitor and to prevent automated spam submissions.
Share
Use of unstructured text in prognostic clinical prediction models: a systematic review
Tom M. Seinen, Egill Fridgeirsson, Solomon Ioannou, Daniel Jeannetot, Luis H. John, Jan A. Kors, Aniek F. Markus, Victor Pera, Alexandros Rekkas, Ross D. Williams, Cynthia Yang, Erik van Mulligen, Peter R. Rijnbeek
medRxiv 2022.01.17.22269400; doi: https://doi.org/10.1101/2022.01.17.22269400
Twitter logo Facebook logo LinkedIn logo Mendeley logo
Citation Tools
Use of unstructured text in prognostic clinical prediction models: a systematic review
Tom M. Seinen, Egill Fridgeirsson, Solomon Ioannou, Daniel Jeannetot, Luis H. John, Jan A. Kors, Aniek F. Markus, Victor Pera, Alexandros Rekkas, Ross D. Williams, Cynthia Yang, Erik van Mulligen, Peter R. Rijnbeek
medRxiv 2022.01.17.22269400; doi: https://doi.org/10.1101/2022.01.17.22269400

Citation Manager Formats

  • BibTeX
  • Bookends
  • EasyBib
  • EndNote (tagged)
  • EndNote 8 (xml)
  • Medlars
  • Mendeley
  • Papers
  • RefWorks Tagged
  • Ref Manager
  • RIS
  • Zotero
  • Tweet Widget
  • Facebook Like
  • Google Plus One

Subject Area

  • Health Informatics
Subject Areas
All Articles
  • Addiction Medicine (349)
  • Allergy and Immunology (668)
  • Allergy and Immunology (668)
  • Anesthesia (181)
  • Cardiovascular Medicine (2648)
  • Dentistry and Oral Medicine (316)
  • Dermatology (223)
  • Emergency Medicine (399)
  • Endocrinology (including Diabetes Mellitus and Metabolic Disease) (942)
  • Epidemiology (12228)
  • Forensic Medicine (10)
  • Gastroenterology (759)
  • Genetic and Genomic Medicine (4103)
  • Geriatric Medicine (387)
  • Health Economics (680)
  • Health Informatics (2657)
  • Health Policy (1005)
  • Health Systems and Quality Improvement (985)
  • Hematology (363)
  • HIV/AIDS (851)
  • Infectious Diseases (except HIV/AIDS) (13695)
  • Intensive Care and Critical Care Medicine (797)
  • Medical Education (399)
  • Medical Ethics (109)
  • Nephrology (436)
  • Neurology (3882)
  • Nursing (209)
  • Nutrition (577)
  • Obstetrics and Gynecology (739)
  • Occupational and Environmental Health (695)
  • Oncology (2030)
  • Ophthalmology (585)
  • Orthopedics (240)
  • Otolaryngology (306)
  • Pain Medicine (250)
  • Palliative Medicine (75)
  • Pathology (473)
  • Pediatrics (1115)
  • Pharmacology and Therapeutics (466)
  • Primary Care Research (452)
  • Psychiatry and Clinical Psychology (3432)
  • Public and Global Health (6527)
  • Radiology and Imaging (1403)
  • Rehabilitation Medicine and Physical Therapy (814)
  • Respiratory Medicine (871)
  • Rheumatology (409)
  • Sexual and Reproductive Health (410)
  • Sports Medicine (342)
  • Surgery (448)
  • Toxicology (53)
  • Transplantation (185)
  • Urology (165)