medRxiv

Classifying progression status statements from radiology exams among non-small cell lung cancer patients using natural language processing

Anahita Davoudi, Shun Yu, Abigail Doucette, Peter Gabriel, Mark Miller, Heather Williams, Heena Desai, Anh Le, Christian J Stoeckert, Kara Maxwell, Danielle L. Mowery
doi: https://doi.org/10.1101/2021.11.20.21266642
Anahita Davoudi, PhD
1 Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania, Philadelphia, PA, USA

Shun Yu, MD
2 New York University Langone Health, New York, NY, USA
3 Division of Hematology/Oncology, Department of Medicine, Perelman School of Medicine at the University of Pennsylvania, Philadelphia, PA, USA

Abigail Doucette, MP
4 Department of Radiation Oncology, University of Pennsylvania, Philadelphia, PA, USA

Peter Gabriel, MD
4 Department of Radiation Oncology, University of Pennsylvania, Philadelphia, PA, USA
5 Abramson Cancer Center, University of Pennsylvania, Philadelphia, PA, USA

Mark Miller, MS
1 Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania, Philadelphia, PA, USA
7 Department of Genetics, Perelman School of Medicine at the University of Pennsylvania, Philadelphia, PA, USA

Heather Williams, MS
1 Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania, Philadelphia, PA, USA
7 Department of Genetics, Perelman School of Medicine at the University of Pennsylvania, Philadelphia, PA, USA

Heena Desai, MS
3 Division of Hematology/Oncology, Department of Medicine, Perelman School of Medicine at the University of Pennsylvania, Philadelphia, PA, USA

Anh Le, BS
3 Division of Hematology/Oncology, Department of Medicine, Perelman School of Medicine at the University of Pennsylvania, Philadelphia, PA, USA

Christian J Stoeckert, PhD
6 Institute for Biomedical Informatics, University of Pennsylvania, Philadelphia, PA, USA
7 Department of Genetics, Perelman School of Medicine at the University of Pennsylvania, Philadelphia, PA, USA

Kara Maxwell, PhD MD
3 Division of Hematology/Oncology, Department of Medicine, Perelman School of Medicine at the University of Pennsylvania, Philadelphia, PA, USA
5 Abramson Cancer Center, University of Pennsylvania, Philadelphia, PA, USA

Danielle L. Mowery, PhD
1 Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania, Philadelphia, PA, USA
5 Abramson Cancer Center, University of Pennsylvania, Philadelphia, PA, USA
6 Institute for Biomedical Informatics, University of Pennsylvania, Philadelphia, PA, USA
For correspondence: dlmowery{at}pennmedicine.upenn.edu

Abstract

Although natural language processing (NLP) has been used to support cancer research more broadly, the development of NLP algorithms that extract evidence of progression from clinical notes for lung cancer research is still in its infancy. In this study, we trained supervised machine learning classifiers with rich semantic features to detect and classify statements of progression status in radiology exams. Our progression status classifier achieves F1-scores of 0.80 for progression, 0.82 for stable, and 0.92 for not relevant sentences. We are actively integrating these extractions with structured electronic health record data using ontologies to instantiate a longitudinal model of progression among non-small cell lung cancer patients.
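
For readers unfamiliar with the metric, per-class F1-scores such as those reported above can be computed one-vs-rest from paired gold and predicted sentence labels. The following is a minimal sketch only: the function name `per_class_f1`, the label strings, and the toy label lists are assumptions made for illustration, and none of it is the authors' code or data.

```python
from collections import Counter

# Label names mirror the three classes named in the abstract; the exact
# strings are an assumption for this sketch.
LABELS = ("progression", "stable", "not relevant")


def per_class_f1(gold, predicted, labels=LABELS):
    """Return {label: F1}, scoring each label one-vs-rest."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        if g == p:
            tp[g] += 1   # correct prediction for class g
        else:
            fp[p] += 1   # p was predicted but is wrong
            fn[g] += 1   # g was the true class and was missed
    scores = {}
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when the class never occurs.
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


# Toy, made-up labels for illustration only (not data from the study).
gold = ["progression", "progression", "stable", "stable",
        "not relevant", "not relevant"]
pred = ["progression", "stable", "stable", "stable",
        "not relevant", "progression"]
scores = per_class_f1(gold, pred)
```

Reporting F1 per class, rather than a single accuracy figure, keeps a frequent class such as not relevant from masking weaker performance on the rarer progression and stable sentences.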

Competing Interest Statement

The authors have declared no competing interest.

Funding Statement

We extend our gratitude to Abramson Cancer Center and the Institute for Biomedical Informatics for the emerging Cancer Informatics Center of Excellence (eCICE) grant that supported this work.

Author Declarations

I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.

Yes

The details of the IRB/oversight body that provided approval or exemption for the research described are given below:

The Institutional Review Board of the University of Pennsylvania gave ethical approval for this work (#834430).

I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.

Yes

I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).

Yes

I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable.

Yes

Footnotes

  • Financial Support: We express our deepest gratitude to Abramson Cancer Center and the Institute for Biomedical Informatics for the emerging Cancer Informatics Center of Excellence award that funded this important work.

Data Availability

The classifiers produced in the present study will be openly available.

Copyright 
The copyright holder for this preprint is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.
Posted November 21, 2021.
Subject Area

  • Health Informatics