Skip to main content
medRxiv
  • Home
  • About
  • Submit
  • ALERTS / RSS
Advanced Search

Toward Assessing Clinical Trial Publications for Reporting Transparency

View ORCID ProfileHalil Kilicoglu, Graciela Rosemblat, Linh Hoang, Sahil Wadhwa, Zeshan Peng, View ORCID ProfileMario Malički, Jodi Schneider, View ORCID ProfileGerben ter Riet
doi: https://doi.org/10.1101/2021.01.12.21249695
Halil Kilicoglu
aSchool of Information Sciences, University of Illinois at Urbana-Champaign, Champaign, IL, USA
bU.S. National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
  • ORCID record for Halil Kilicoglu
  • For correspondence: halil{at}illinois.edu
Graciela Rosemblat
bU.S. National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
Linh Hoang
aSchool of Information Sciences, University of Illinois at Urbana-Champaign, Champaign, IL, USA
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
Sahil Wadhwa
cDepartment of Statistics, University of Illinois at Urbana-Champaign, Champaign, IL, USA
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
Zeshan Peng
bU.S. National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
Mario Malički
dMeta-Research Innovation Center at Stanford (METRICS), Stanford University, Stanford, CA, USA
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
  • ORCID record for Mario Malički
Jodi Schneider
aSchool of Information Sciences, University of Illinois at Urbana-Champaign, Champaign, IL, USA
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
Gerben ter Riet
eUrban Vitality Center of Expertise, Faculty of Health, Amsterdam University of Applied Sciences, Amsterdam, the Netherlands
fDepartment of Cardiology Heart Center, Amsterdam UMC, University of Amsterdam, the Netherlands
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
  • ORCID record for Gerben ter Riet
  • Abstract
  • Full Text
  • Info/History
  • Metrics
  • Supplementary material
  • Data/Code
  • Preview PDF
Loading

Abstract

Objective To annotate a corpus of randomized controlled trial (RCT) publications with the checklist items of CONSORT reporting guidelines and using the corpus to develop text mining methods for RCT appraisal.

Methods We annotated a corpus of 50 RCT articles at the sentence level using 37 fine-grained CONSORT checklist items. A subset (31 articles) was double-annotated and adjudicated, while 19 were annotated by a single annotator and reconciled by another. We calculated inter-annotator agreement at the article and section level using MASI (Measuring Agreement on Set-Valued Items) and at the CONSORT item level using Krippendorff’s α. We experimented with two rule-based methods (phrase-based and section header-based) and two supervised learning approaches (support vector machine and BioBERT-based neural network classifiers), for recognizing 17 methodology-related items in the RCT Methods sections.

Results We created CONSORT-TM consisting of 10,709 sentences, 4,845 (45%) of which were annotated with 5,246 labels. A median of 28 CONSORT items (out of possible 37) were annotated per article. Agreement was moderate at the article and section levels (average MASI: 0.60 and 0.64, respectively). Agreement varied considerably among individual checklist items (Krippendorff’s α= 0.06-0.96). The model based on BioBERT performed best overall for recognizing methodology-related items (micro-precision: 0.82, micro-recall: 0.63, micro-F1: 0.71). Combining models using majority vote and label aggregation further improved precision and recall, respectively.

Conclusion Our annotated corpus, CONSORT-TM, contains more fine-grained information than earlier RCT corpora. Low frequency of some CONSORT items made it difficult to train effective text mining models to recognize them. For the items commonly reported, CONSORT-TM can serve as a testbed for text mining methods that assess RCT transparency, rigor, and reliability, and support methods for peer review and authoring assistance. Minor modifications to the annotation scheme and a larger corpus could facilitate improved text mining models. CONSORT-TM is publicly available at https://github.com/kilicogluh/CONSORT-TM.

Figure
  • Download figure
  • Open in new tab

Highlights

  • We constructed a corpus of RCT publications annotated with CONSORT checklist items.

  • We developed text mining methods to identify methodology-related check-list items.

  • A BioBERT-based model performs best in recognizing adequately reported items.

  • A phrase-based method performs best in recognizing infrequently reported items.

  • The corpus and the text mining methods can be used to address reporting transparency.

Competing Interest Statement

The authors have declared no competing interest.

Funding Statement

This work was supported in part by the intramural research program at the U.S. National Library of Medicine, National Institutes of Health.

Author Declarations

I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.

Yes

The details of the IRB/oversight body that provided approval or exemption for the research described are given below:

Not a human study.

All necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived.

Yes

I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).

Yes

I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable.

Yes

Data Availability

CONSORT-TM is publicly available at https://github.com/kilicogluh/CONSORT-TM.

Copyright 
The copyright holder for this preprint is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. This article is a US Government work. It is not subject to copyright under 17 USC 105 and is also made available for use under a CC0 license.
Back to top
PreviousNext
Posted January 15, 2021.
Download PDF

Supplementary Material

Data/Code
Email

Thank you for your interest in spreading the word about medRxiv.

NOTE: Your email address is requested solely to identify you as the sender of this article.

Enter multiple addresses on separate lines or separate them with commas.
Toward Assessing Clinical Trial Publications for Reporting Transparency
(Your Name) has forwarded a page to you from medRxiv
(Your Name) thought you would like to see this page from the medRxiv website.
CAPTCHA
This question is for testing whether or not you are a human visitor and to prevent automated spam submissions.
Share
Toward Assessing Clinical Trial Publications for Reporting Transparency
Halil Kilicoglu, Graciela Rosemblat, Linh Hoang, Sahil Wadhwa, Zeshan Peng, Mario Malički, Jodi Schneider, Gerben ter Riet
medRxiv 2021.01.12.21249695; doi: https://doi.org/10.1101/2021.01.12.21249695
Twitter logo Facebook logo LinkedIn logo Mendeley logo
Citation Tools
Toward Assessing Clinical Trial Publications for Reporting Transparency
Halil Kilicoglu, Graciela Rosemblat, Linh Hoang, Sahil Wadhwa, Zeshan Peng, Mario Malički, Jodi Schneider, Gerben ter Riet
medRxiv 2021.01.12.21249695; doi: https://doi.org/10.1101/2021.01.12.21249695

Citation Manager Formats

  • BibTeX
  • Bookends
  • EasyBib
  • EndNote (tagged)
  • EndNote 8 (xml)
  • Medlars
  • Mendeley
  • Papers
  • RefWorks Tagged
  • Ref Manager
  • RIS
  • Zotero
  • Tweet Widget
  • Facebook Like
  • Google Plus One

Subject Area

  • Health Informatics
Subject Areas
All Articles
  • Addiction Medicine (349)
  • Allergy and Immunology (668)
  • Allergy and Immunology (668)
  • Anesthesia (181)
  • Cardiovascular Medicine (2648)
  • Dentistry and Oral Medicine (316)
  • Dermatology (223)
  • Emergency Medicine (399)
  • Endocrinology (including Diabetes Mellitus and Metabolic Disease) (942)
  • Epidemiology (12228)
  • Forensic Medicine (10)
  • Gastroenterology (759)
  • Genetic and Genomic Medicine (4103)
  • Geriatric Medicine (387)
  • Health Economics (680)
  • Health Informatics (2657)
  • Health Policy (1005)
  • Health Systems and Quality Improvement (985)
  • Hematology (363)
  • HIV/AIDS (851)
  • Infectious Diseases (except HIV/AIDS) (13695)
  • Intensive Care and Critical Care Medicine (797)
  • Medical Education (399)
  • Medical Ethics (109)
  • Nephrology (436)
  • Neurology (3882)
  • Nursing (209)
  • Nutrition (577)
  • Obstetrics and Gynecology (739)
  • Occupational and Environmental Health (695)
  • Oncology (2030)
  • Ophthalmology (585)
  • Orthopedics (240)
  • Otolaryngology (306)
  • Pain Medicine (250)
  • Palliative Medicine (75)
  • Pathology (473)
  • Pediatrics (1115)
  • Pharmacology and Therapeutics (466)
  • Primary Care Research (452)
  • Psychiatry and Clinical Psychology (3432)
  • Public and Global Health (6527)
  • Radiology and Imaging (1403)
  • Rehabilitation Medicine and Physical Therapy (814)
  • Respiratory Medicine (871)
  • Rheumatology (409)
  • Sexual and Reproductive Health (410)
  • Sports Medicine (342)
  • Surgery (448)
  • Toxicology (53)
  • Transplantation (185)
  • Urology (165)