medRxiv

Scalable Incident Detection via Natural Language Processing and Probabilistic Language Models

Colin G. Walsh, Drew Wilimitis, Qingxia Chen, Aileen Wright, Jhansi Kolli, Katelyn Robinson, Michael A. Ripperger, Kevin B. Johnson, David Carrell, Rishi J. Desai, Andrew Mosholder, Sai Dharmarajan, Sruthi Adimadhyam, Daniel Fabbri, Danijela Stojanovic, Michael E. Matheny, Cosmin A. Bejan
doi: https://doi.org/10.1101/2023.11.30.23299249
Colin G. Walsh
1Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA
2Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
3Department of Psychiatry and Behavioral Sciences, Vanderbilt University Medical Center, Nashville, TN, USA
  • For correspondence: colin.walsh@vumc.org
Drew Wilimitis
1Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA
Qingxia Chen
1Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA
2Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
Aileen Wright
1Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA
Jhansi Kolli
1Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA
Katelyn Robinson
1Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA
Michael A. Ripperger
1Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA
Kevin B. Johnson
6Department of Biostatistics, Epidemiology and Informatics, and Pediatrics, University of Pennsylvania
7Department of Computer and Information Science, Bioengineering, University of Pennsylvania
8Department of Science Communication, University of Pennsylvania
David Carrell
9Washington Health Research Institute, Kaiser Permanente Washington
Rishi J. Desai
10Division of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham and Women’s Hospital/Harvard Medical School
Andrew Mosholder
4Center for Drug Evaluation and Research, Food and Drug Administration
5Office of Surveillance and Epidemiology, Food and Drug Administration
Sai Dharmarajan
4Center for Drug Evaluation and Research, Food and Drug Administration
12Office of Translational Science, Food and Drug Administration
Sruthi Adimadhyam
11Department of Population Medicine, Harvard Medical School and Harvard Pilgrim Health Care Institute
Daniel Fabbri
1Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA
Danijela Stojanovic
4Center for Drug Evaluation and Research, Food and Drug Administration
5Office of Surveillance and Epidemiology, Food and Drug Administration
Michael E. Matheny
1Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA
Cosmin A. Bejan
1Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA

Abstract

Post-marketing safety surveillance depends in part on the ability to detect concerning clinical events at scale. Spontaneous reporting might be an effective component of safety surveillance, but it requires awareness and understanding among healthcare professionals to achieve its potential. Reliance on readily available structured data such as diagnostic codes risks under-coding and imprecision. Clinical textual data might bridge these gaps, and natural language processing (NLP) has been shown to aid in scalable phenotyping across healthcare records in multiple clinical domains. In this study, we developed and validated a novel incident phenotyping approach using unstructured clinical textual data that is agnostic to Electronic Health Record (EHR) and note type. It is based on a published, validated approach (PheRe) used to ascertain social determinants of health and suicidality across entire healthcare records. To demonstrate generalizability, we validated this approach on two separate phenotypes that share common challenges with respect to accurate ascertainment: 1) suicide attempt; 2) sleep-related behaviors. With samples of 89,428 records and 35,863 records for suicide attempt and sleep-related behaviors, respectively, we conducted silver standard (diagnostic coding) and gold standard (manual chart review) validation. We showed an Area Under the Precision-Recall Curve (AUPR) of ∼0.77 (95% CI 0.75-0.78) for suicide attempt and an AUPR of ∼0.31 (95% CI 0.28-0.34) for sleep-related behaviors. We also evaluated performance by coded race and demonstrated that differences in performance by race were dissimilar across phenotypes; these differences require algorithmovigilance and debiasing prior to implementation.
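
For readers less familiar with the headline metric, the following is a minimal sketch of how an AUPR and a 95% confidence interval could be computed from model scores and reference labels. The abstract does not state how the AUPR or its intervals were estimated, so the step-wise (average precision) estimator and the percentile bootstrap used here are assumptions, as are the function names `average_precision` and `bootstrap_ci`; this is not the authors' code.

```python
import random
from typing import List, Sequence, Tuple


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Step-wise area under the precision-recall curve (average precision).

    Records with tied scores are treated as a single threshold, so the
    result does not depend on the order of tied records.
    """
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("at least one positive label is required")
    # Rank records from highest to lowest score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    tp = fp = 0
    prev_recall = 0.0
    area = 0.0
    i = 0
    while i < len(ranked):
        j = i
        # Consume every record that shares the current score.
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            if ranked[j][1] == 1:
                tp += 1
            else:
                fp += 1
            j += 1
        recall = tp / n_pos
        precision = tp / (tp + fp)
        # Add the rectangle between the previous and the current recall.
        area += (recall - prev_recall) * precision
        prev_recall = recall
        i = j
    return area


def bootstrap_ci(
    labels: Sequence[int],
    scores: Sequence[float],
    n_boot: int = 1000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float]:
    """Percentile bootstrap interval for the AUPR (assumed method)."""
    if sum(labels) == 0:
        raise ValueError("at least one positive label is required")
    rng = random.Random(seed)
    n = len(labels)
    stats: List[float] = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        resampled_labels = [labels[k] for k in idx]
        if sum(resampled_labels) == 0:
            continue  # AUPR is undefined without positives; draw again
        resampled_scores = [scores[k] for k in idx]
        stats.append(average_precision(resampled_labels, resampled_scores))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper
```

Because the AUPR baseline equals the prevalence of the positive class, values such as 0.77 and 0.31 are best read against how common each phenotype is in its validation sample rather than against a fixed 0.5 reference.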

Competing Interest Statement

The authors have declared no competing interest.

Funding Statement

This study was funded by the U.S. Food and Drug Administration's Sentinel Initiative. All investigators were supported on FDA WO2006. Dr. Walsh is also supported in part by NIMH R01MH121455 and R01MH116269.

Author Declarations

I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.

Yes

The details of the IRB/oversight body that provided approval or exemption for the research described are given below:

The IRB of Vanderbilt University Medical Center gave ethical approval for this work.

I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.

Yes

I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).

Yes

I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.

Yes

Data Availability

Because data include sensitive PHI, data are not available for dissemination outside the study team.

Copyright 
The copyright holder for this preprint is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. All rights reserved. No reuse allowed without permission.
Posted December 01, 2023.

Subject Area

  • Health Informatics