Exploring the potential of Claude 2 for risk of bias assessment: Using a large language model to assess randomized controlled trials with RoB 2

Angelika Eisele-Metzger; Judith-Lisa Lieberum; Markus Toews; Waldemar Siemens; Felix Heilmeyer; Christian Haverkamp; Daniel Boehringer; Joerg J Meerpohl

doi:10.1101/2024.07.16.24310483

ABSTRACT

Systematic reviews are essential for evidence based healthcare, but conducting them is time and resource consuming. To date, efforts have been made to accelerate and (semi-) automate various steps of systematic reviews through the use of artificial intelligence and the emergence of large language models (LLMs) promises further opportunities. One crucial but complex task within systematic review conduct is assessing the risk of bias of included studies. Therefore, the aim of this study was to test the LLM Claude 2 for risk of bias assessment of 100 randomized controlled trials using the revised Cochrane risk of bias tool (“RoB 2”; involving judgements for five specific domains and an overall judgement). We assessed the agreement of risk of bias judgements by Claude with human judgements published in Cochrane Reviews. The observed agreement between Claude and Cochrane authors ranged from 41% for the overall judgement to 71% for domain 4 (“outcome measurement”). Cohen’s κ was lowest for domain 5 (“selective reporting”; 0.10 (95% confidence interval (CI): −0.10-0.31)) and highest for domain 3 (“missing data”; 0.31 (95% CI: 0.10-0.52)), indicating slight to fair agreement. Fair agreement was found for the overall judgement (Cohen’s κ: 0.22 (95% CI: 0.06-0.38)). Sensitivity analyses using alternative prompting techniques or the more recent version Claude 3 did not result in substantial changes. Currently, Claude’s RoB 2 judgements cannot replace human risk of bias assessment. However, the potential of LLMs to support risk of bias assessment should be further explored.

Competing Interest Statement

The authors have declared no competing interest.

Clinical Protocols

https://osf.io/42dnb

Funding Statement

This work was supported by the Research Commission at the Faculty of Medicine, University of Freiburg, Freiburg, Germany (grant no. EIS2244/23).

Author Declarations

I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.

Yes

I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.

Yes

I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).

Yes

I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.

Yes

DATA AVAILABILITY STATEMENT

Prompt templates, the R code used for analysis, model responses, extracted data and the full sample of Cochrane Reviews assessed for eligibility are stored at open science framework (OSF) and can be accessed via the following link: https://osf.io/2phyt. The source code and documentation for our custom program (Patchbay) are available at https://github.com/daboe01/LLMPatchbay.