medRxiv

Exploring the potential of Claude 2 for risk of bias assessment: Using a large language model to assess randomized controlled trials with RoB 2

Angelika Eisele-Metzger, Judith-Lisa Lieberum, Markus Toews, Waldemar Siemens, Felix Heilmeyer, Christian Haverkamp, Daniel Boehringer, Joerg J Meerpohl
doi: https://doi.org/10.1101/2024.07.16.24310483
Angelika Eisele-Metzger
1 Institute for Evidence in Medicine, Medical Center – University of Freiburg, Faculty of Medicine, University of Freiburg, Germany
2 Cochrane Germany, Cochrane Germany Foundation, Freiburg, Germany
For correspondence: angelika.eisele-metzger{at}uniklinik-freiburg.de
Judith-Lisa Lieberum
3 Eye Center, Medical Center – University of Freiburg, Faculty of Medicine, University of Freiburg, Germany
Markus Toews
1 Institute for Evidence in Medicine, Medical Center – University of Freiburg, Faculty of Medicine, University of Freiburg, Germany
Waldemar Siemens
1 Institute for Evidence in Medicine, Medical Center – University of Freiburg, Faculty of Medicine, University of Freiburg, Germany
Felix Heilmeyer
4 Institute for Digitalization in Medicine, Medical Center – University of Freiburg, Faculty of Medicine, University of Freiburg, Germany
Christian Haverkamp
4 Institute for Digitalization in Medicine, Medical Center – University of Freiburg, Faculty of Medicine, University of Freiburg, Germany
Daniel Boehringer
3 Eye Center, Medical Center – University of Freiburg, Faculty of Medicine, University of Freiburg, Germany
Joerg J Meerpohl
1 Institute for Evidence in Medicine, Medical Center – University of Freiburg, Faculty of Medicine, University of Freiburg, Germany
2 Cochrane Germany, Cochrane Germany Foundation, Freiburg, Germany

ABSTRACT

Systematic reviews are essential for evidence-based healthcare, but conducting them is time- and resource-consuming. Efforts have already been made to accelerate and (semi-)automate various steps of systematic reviews using artificial intelligence, and the emergence of large language models (LLMs) promises further opportunities. One crucial but complex task in conducting a systematic review is assessing the risk of bias of the included studies. The aim of this study was therefore to test the LLM Claude 2 for risk of bias assessment of 100 randomized controlled trials using the revised Cochrane risk of bias tool (“RoB 2”), which involves judgements for five specific domains and an overall judgement. We assessed the agreement between Claude’s risk of bias judgements and the human judgements published in Cochrane Reviews. The observed agreement between Claude and Cochrane authors ranged from 41% for the overall judgement to 71% for domain 4 (“outcome measurement”). Cohen’s κ was lowest for domain 5 (“selective reporting”; 0.10, 95% confidence interval (CI): −0.10 to 0.31) and highest for domain 3 (“missing data”; 0.31, 95% CI: 0.10 to 0.52), indicating slight to fair agreement. Agreement for the overall judgement was fair (Cohen’s κ: 0.22, 95% CI: 0.06 to 0.38). Sensitivity analyses using alternative prompting techniques or the more recent version Claude 3 did not result in substantial changes. Currently, Claude’s RoB 2 judgements cannot replace human risk of bias assessment. However, the potential of LLMs to support risk of bias assessment should be further explored.
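The abstract reports two agreement measures: observed agreement (the share of trials on which Claude and the Cochrane authors gave the same judgement) and Cohen’s κ, which corrects that share for the agreement expected by chance. The following is a minimal Python sketch of both, not the authors’ analysis (their R code is on OSF). It rests on several assumptions. It uses unweighted κ over the three RoB 2 judgement levels (“low”, “some concerns”, “high”). It uses a simple large-sample standard error for the 95% CI, which may differ from the study’s method. The function names `observed_agreement`, `cohens_kappa` and `landis_koch_label` are hypothetical, as is the six-trial example data.

```python
import math
from collections import Counter


def observed_agreement(rater_a, rater_b):
    """Proportion of trials on which both raters gave the same judgement."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty lists of equal length")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)


def cohens_kappa(rater_a, rater_b, z=1.96):
    """Unweighted Cohen's kappa with an approximate 95% CI.

    kappa = (p_o - p_e) / (1 - p_e), where p_e is the agreement expected by
    chance from each rater's marginal category frequencies.
    """
    n = len(rater_a)
    p_o = observed_agreement(rater_a, rater_b)
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    categories = set(counts_a) | set(counts_b)
    # Chance agreement: sum over categories of the product of marginal proportions
    p_e = sum(counts_a[c] * counts_b[c] for c in categories) / (n * n)
    if p_e == 1:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    kappa = (p_o - p_e) / (1 - p_e)
    # Simple large-sample standard error (assumption: the study's own
    # confidence interval method may differ).
    se = math.sqrt(p_o * (1 - p_o) / (n * (1 - p_e) ** 2))
    return kappa, (kappa - z * se, kappa + z * se)


def landis_koch_label(kappa):
    """Verbal label using the commonly cited Landis and Koch bands."""
    if kappa < 0:
        return "poor"
    bands = ((0.20, "slight"), (0.40, "fair"), (0.60, "moderate"), (0.80, "substantial"))
    for upper, label in bands:
        if kappa <= upper:
            return label
    return "almost perfect"


# Hypothetical RoB 2 judgements for six trials (invented for illustration only)
llm = ["low", "high", "some concerns", "low", "high", "low"]
human = ["low", "some concerns", "some concerns", "high", "high", "low"]

kappa, (ci_low, ci_high) = cohens_kappa(llm, human)
print(f"observed agreement: {observed_agreement(llm, human):.0%}")
print(f"kappa: {kappa:.2f} (95% CI {ci_low:.2f} to {ci_high:.2f}), {landis_koch_label(kappa)}")
```

In this invented example, the raters agree on 4 of 6 trials (p_o = 2/3). Their marginal frequencies give a chance agreement of p_e = 1/3, so κ = (2/3 − 1/3)/(1 − 1/3) = 0.5. With only six trials, the interval is very wide. Under the same Landis and Koch bands, the κ values reported in the abstract (0.10, 0.22 and 0.31) fall into the “slight” and “fair” categories.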

Competing Interest Statement

The authors have declared no competing interest.

Clinical Protocols

https://osf.io/42dnb

Funding Statement

This work was supported by the Research Commission at the Faculty of Medicine, University of Freiburg, Freiburg, Germany (grant no. EIS2244/23).

Author Declarations

I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.

Yes

I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.

Yes

I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).

Yes

I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.

Yes

DATA AVAILABILITY STATEMENT

Prompt templates, the R code used for analysis, model responses, extracted data and the full sample of Cochrane Reviews assessed for eligibility are stored on the Open Science Framework (OSF) and can be accessed at https://osf.io/2phyt. The source code and documentation for our custom program (Patchbay) are available at https://github.com/daboe01/LLMPatchbay.

Copyright 
The copyright holder for this preprint is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. All rights reserved. No reuse allowed without permission.
Posted July 16, 2024.
Subject Area

  • Health Informatics