Skip to main content
medRxiv
  • Home
  • About
  • Submit
  • ALERTS / RSS
Advanced Search

Retrieval Augmented Generation Enabled Generative Pre-Trained Transformer 4 (GPT-4) Performance for Clinical Trial Screening

Ozan Unlu, Jiyeon Shin, Charlotte J Mailly, Michael F Oates, Michela R Tucci, Matthew Varugheese, Kavishwar Wagholikar, Fei Wang, Benjamin M Scirica, Alexander J Blood, Samuel J Aronson
doi: https://doi.org/10.1101/2024.02.08.24302376
Ozan Unlu
1Accelerator for Clinical Transformation, Brigham and Women’s Hospital, Boston, MA
2Division of Cardiovascular Medicine, Brigham and Women’s Hospital, Boston, MA
3Department of Biomedical Informatics, Harvard Medical School, Boston, MA
6Harvard Medical School, Boston, MA
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
Jiyeon Shin
1Accelerator for Clinical Transformation, Brigham and Women’s Hospital, Boston, MA
4Mass General Brigham Personalized Medicine, Cambridge, MA
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
Charlotte J Mailly
1Accelerator for Clinical Transformation, Brigham and Women’s Hospital, Boston, MA
4Mass General Brigham Personalized Medicine, Cambridge, MA
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
Michael F Oates
1Accelerator for Clinical Transformation, Brigham and Women’s Hospital, Boston, MA
4Mass General Brigham Personalized Medicine, Cambridge, MA
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
Michela R Tucci
1Accelerator for Clinical Transformation, Brigham and Women’s Hospital, Boston, MA
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
Matthew Varugheese
1Accelerator for Clinical Transformation, Brigham and Women’s Hospital, Boston, MA
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
Kavishwar Wagholikar
1Accelerator for Clinical Transformation, Brigham and Women’s Hospital, Boston, MA
5Research Information Science and Computing, Mass General Brigham, Somerville, MA
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
Fei Wang
1Accelerator for Clinical Transformation, Brigham and Women’s Hospital, Boston, MA
4Mass General Brigham Personalized Medicine, Cambridge, MA
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
Benjamin M Scirica
1Accelerator for Clinical Transformation, Brigham and Women’s Hospital, Boston, MA
2Division of Cardiovascular Medicine, Brigham and Women’s Hospital, Boston, MA
6Harvard Medical School, Boston, MA
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
Alexander J Blood
1Accelerator for Clinical Transformation, Brigham and Women’s Hospital, Boston, MA
2Division of Cardiovascular Medicine, Brigham and Women’s Hospital, Boston, MA
6Harvard Medical School, Boston, MA
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
  • For correspondence: Ablood{at}bwh.harvard.edu
Samuel J Aronson
1Accelerator for Clinical Transformation, Brigham and Women’s Hospital, Boston, MA
4Mass General Brigham Personalized Medicine, Cambridge, MA
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
  • Abstract
  • Full Text
  • Info/History
  • Metrics
  • Supplementary material
  • Data/Code
  • Preview PDF
Loading

ABSTRACT

Background Subject screening is a key aspect of all clinical trials; however, traditionally, it is a labor-intensive and error-prone task, demanding significant time and resources. With the advent of large language models (LLMs) and related technologies, a paradigm shift in natural language processing capabilities offers a promising avenue for increasing both quality and efficiency of screening efforts. This study aimed to test the Retrieval-Augmented Generation (RAG) process enabled Generative Pretrained Transformer Version 4 (GPT-4) to accurately identify and report on inclusion and exclusion criteria for a clinical trial.

Methods The Co-Operative Program for Implementation of Optimal Therapy in Heart Failure (COPILOT-HF) trial aims to recruit patients with symptomatic heart failure. As part of the screening process, a list of potentially eligible patients is created through an electronic health record (EHR) query. Currently, structured data in the EHR can only be used to determine 5 out of 6 inclusion and 5 out of 17 exclusion criteria. Trained, but non-licensed, study staff complete manual chart review to determine patient eligibility and record their assessment of the inclusion and exclusion criteria. We obtained the structured assessments completed by the study staff and clinical notes for the past two years and developed a workflow of clinical note-based question answering system powered by RAG architecture and GPT-4 that we named RECTIFIER (RAG-Enabled Clinical Trial Infrastructure for Inclusion Exclusion Review). We used notes from 100 patients as a development dataset, 282 patients as a validation dataset, and 1894 patients as a test set. An expert clinician completed a blinded review of patients’ charts to answer the eligibility questions and determine the “gold standard” answers. We calculated the sensitivity, specificity, accuracy, and Matthews correlation coefficient (MCC) for each question and screening method. We also performed bootstrapping to calculate the confidence intervals for each statistic.

Results Both RECTIFIER and study staff answers closely aligned with the expert clinician answers across criteria with accuracy ranging between 97.9% and 100% (MCC 0.837 and 1) for RECTIFIER and 91.7% and 100% (MCC 0.644 and 1) for study staff. RECTIFIER performed better than study staff to determine the inclusion criteria of “symptomatic heart failure” with an accuracy of 97.9% vs 91.7% and an MCC of 0.924 vs 0.721, respectively. Overall, the sensitivity and specificity of determining eligibility for the RECTIFIER was 92.3% (CI) and 93.9% (CI), and study staff was 90.1% (CI) and 83.6% (CI), respectively.

Conclusion GPT-4 based solutions have the potential to improve efficiency and reduce costs in clinical trial screening. When incorporating new tools such as RECTIFIER, it is important to consider the potential hazards of automating the screening process and set up appropriate mitigation strategies such as final clinician review before patient engagement.

Competing Interest Statement

For this study, complimentary access to Azure OpenAI GPT-4V was provided by Microsoft. Microsoft had no access to the data used and had no involvement in the analysis, interpretation of data, or writing of our study. Samuel J Aronson, Alexander J Blood, Charlotte J Mailly, Michael F Oates, Benjamin M Scirica, Jiyeon Shin Michela R Tucci, Ozan Unlu, Matthew Varugheese, and Fei Wang report Research Grants and related funding via Brigham and Women's Hospital: Better Therapeutics, Boehringer Ingelheim, Eli Lilly, Milestone Pharmaceuticals and NovoNordisk. Samuel J Aronson reports consulting to Nest Genomics. Samuel Aronson, Charlotte J Mailly, Michael F Oates, Michela Tucci and Fei Wang also report unrelated NIH and PCORI support. Alexander J. Blood reports consulting income from Walgreens Health, Color Health, Novo Nordisk, Medscape, and Arsenal Capital Partners, and equity holdings in Knownwell health.Benjamin M Scirica reports consulting fees from Abbvie (DSMB), AstraZeneca (DSMB), Boehringer Ingelheim (DSMB), Better Therapeutics, Elsevier Practice Update Cardiology, Esperion, Hanmi (DSMB), Lexeo (DSMB), and NovoNordisk; and equity in Health [at] Scale. Ozan Unlu receives funding from the National Heart Lung and Blood Institute under award number T32HL007604.

Funding Statement

For this study, complimentary access to Azure OpenAI GPT-4V was provided by Microsoft. This work was conducted with support from Harvard Catalyst, The Harvard Clinical and Translational Science Center (National Center for Advancing Translational Sciences, National Institutes of Health Awards UL1 TR002541 and R01HL151643, and financial contributions from Harvard University and its affiliated academic healthcare centers. Dr. Unlu receives funding from the National Heart Lung and Blood Institute under award number T32HL007604.

Author Declarations

I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.

Yes

The details of the IRB/oversight body that provided approval or exemption for the research described are given below:

Institutional Review Broad of Mass General Brigham gave ethical approval for this work which was performed as part of the COPILOT-HF Study.

I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.

Yes

I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).

Yes

I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.

Yes

Data Availability

All data produced in the present study are available upon reasonable request to the authors. Any data shared will be in strict compliance with the Health Insurance Portability and Accountability Act (HIPAA) and applicable data use agreements to ensure the protection of privacy and confidentiality of any potentially identifiable information.

Copyright 
The copyright holder for this preprint is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.
Back to top
PreviousNext
Posted February 08, 2024.
Download PDF

Supplementary Material

Data/Code
Email

Thank you for your interest in spreading the word about medRxiv.

NOTE: Your email address is requested solely to identify you as the sender of this article.

Enter multiple addresses on separate lines or separate them with commas.
Retrieval Augmented Generation Enabled Generative Pre-Trained Transformer 4 (GPT-4) Performance for Clinical Trial Screening
(Your Name) has forwarded a page to you from medRxiv
(Your Name) thought you would like to see this page from the medRxiv website.
CAPTCHA
This question is for testing whether or not you are a human visitor and to prevent automated spam submissions.
Share
Retrieval Augmented Generation Enabled Generative Pre-Trained Transformer 4 (GPT-4) Performance for Clinical Trial Screening
Ozan Unlu, Jiyeon Shin, Charlotte J Mailly, Michael F Oates, Michela R Tucci, Matthew Varugheese, Kavishwar Wagholikar, Fei Wang, Benjamin M Scirica, Alexander J Blood, Samuel J Aronson
medRxiv 2024.02.08.24302376; doi: https://doi.org/10.1101/2024.02.08.24302376
Twitter logo Facebook logo LinkedIn logo Mendeley logo
Citation Tools
Retrieval Augmented Generation Enabled Generative Pre-Trained Transformer 4 (GPT-4) Performance for Clinical Trial Screening
Ozan Unlu, Jiyeon Shin, Charlotte J Mailly, Michael F Oates, Michela R Tucci, Matthew Varugheese, Kavishwar Wagholikar, Fei Wang, Benjamin M Scirica, Alexander J Blood, Samuel J Aronson
medRxiv 2024.02.08.24302376; doi: https://doi.org/10.1101/2024.02.08.24302376

Citation Manager Formats

  • BibTeX
  • Bookends
  • EasyBib
  • EndNote (tagged)
  • EndNote 8 (xml)
  • Medlars
  • Mendeley
  • Papers
  • RefWorks Tagged
  • Ref Manager
  • RIS
  • Zotero
  • Tweet Widget
  • Facebook Like
  • Google Plus One

Subject Area

  • Health Informatics
Subject Areas
All Articles
  • Addiction Medicine (349)
  • Allergy and Immunology (668)
  • Allergy and Immunology (668)
  • Anesthesia (181)
  • Cardiovascular Medicine (2648)
  • Dentistry and Oral Medicine (316)
  • Dermatology (223)
  • Emergency Medicine (399)
  • Endocrinology (including Diabetes Mellitus and Metabolic Disease) (942)
  • Epidemiology (12228)
  • Forensic Medicine (10)
  • Gastroenterology (759)
  • Genetic and Genomic Medicine (4103)
  • Geriatric Medicine (387)
  • Health Economics (680)
  • Health Informatics (2657)
  • Health Policy (1005)
  • Health Systems and Quality Improvement (985)
  • Hematology (363)
  • HIV/AIDS (851)
  • Infectious Diseases (except HIV/AIDS) (13695)
  • Intensive Care and Critical Care Medicine (797)
  • Medical Education (399)
  • Medical Ethics (109)
  • Nephrology (436)
  • Neurology (3882)
  • Nursing (209)
  • Nutrition (577)
  • Obstetrics and Gynecology (739)
  • Occupational and Environmental Health (695)
  • Oncology (2030)
  • Ophthalmology (585)
  • Orthopedics (240)
  • Otolaryngology (306)
  • Pain Medicine (250)
  • Palliative Medicine (75)
  • Pathology (473)
  • Pediatrics (1115)
  • Pharmacology and Therapeutics (466)
  • Primary Care Research (452)
  • Psychiatry and Clinical Psychology (3432)
  • Public and Global Health (6527)
  • Radiology and Imaging (1403)
  • Rehabilitation Medicine and Physical Therapy (814)
  • Respiratory Medicine (871)
  • Rheumatology (409)
  • Sexual and Reproductive Health (410)
  • Sports Medicine (342)
  • Surgery (448)
  • Toxicology (53)
  • Transplantation (185)
  • Urology (165)