medRxiv

Comparing human vs. machine-assisted analysis to develop a new approach for Big Qualitative Data Analysis

Sam Martin, Emma Beecham, Emira Kursumovic, Richard A. Armstrong, Tim M. Cook, Noémie Déom, Andrew D. Kane, Sophie Moniz, Jasmeet Soar, Cecilia Vindrola-Padros, collaborators
doi: https://doi.org/10.1101/2024.07.16.24310275
Sam Martin
1Rapid Research Evaluation and Appraisal Lab (RREAL), Department of Targeted Intervention, University College London (UCL), London, UK; Oxford Vaccine Group, Department of Paediatrics, Oxford University, Oxford, UK
Emma Beecham
2Rapid Research Evaluation and Appraisal Lab (RREAL), Department of Targeted Intervention, University College London (UCL), London, UK
  • For correspondence: e.beecham{at}ucl.ac.uk
Emira Kursumovic
3Department of Anaesthesia, Royal United Hospitals Bath NHS Foundation Trust, Bath, UK; Health Services Research Centre, Royal College of Anaesthetists, London, UK
Richard A. Armstrong
4Department of Anaesthesia, Severn Deanery, Bristol, UK; Health Services Research Centre, Royal College of Anaesthetists, London, UK
Tim M. Cook
5Department of Anaesthesia, Royal United Hospital, Bath; University of Bristol School of Medicine, UK
Noémie Déom
2Rapid Research Evaluation and Appraisal Lab (RREAL), Department of Targeted Intervention, University College London (UCL), London, UK
Andrew D. Kane
6Department of Anaesthesia, James Cook University Hospital, South Tees NHS Foundation Trust, Middlesbrough, UK; Health Services Research Centre, Royal College of Anaesthetists, London, UK
Sophie Moniz
2Rapid Research Evaluation and Appraisal Lab (RREAL), Department of Targeted Intervention, University College London (UCL), London, UK
Jasmeet Soar
7Department of Anaesthesia, Southmead Hospital, North Bristol NHS Trust, Bristol, UK
Cecilia Vindrola-Padros
2Rapid Research Evaluation and Appraisal Lab (RREAL), Department of Targeted Intervention, University College London (UCL), London, UK

Abstract

Background Analysing large qualitative datasets can present significant challenges, including the time and resources required for manual analysis and the potential for missing nuanced insights. This paper aims to address these challenges by exploring the application of Big Qualitative (Big Qual) and artificial intelligence (AI) methods to efficiently analyse Big Qual data while retaining the depth and complexity of human understanding. The free-text responses from the Royal College of Anaesthetists’ 7th National Audit Project (NAP7) baseline survey on peri-operative cardiac arrest experiences serve as a case study to test and validate this approach.

Methodology/Principal Findings Quantitative analysis segmented the data and identified keywords using AI methods. In-depth sentiment and thematic analysis combined natural language processing (NLP) and machine learning (ML) with human input: researchers assigned topic/theme labels and sentiments to responses, while discourse analysis explored sub-topics and thematic diversity. Human annotation refined the machine-generated sentiments, leading to an additional “ambiguous” category to capture nuanced, mixed responses. Comparative analysis was used to evaluate the concordance between human and machine-assisted sentiment labelling. While ML reduced analysis time significantly, human input was crucial for refining sentiment categories and capturing nuances.
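
The abstract does not say which statistic was used to measure concordance between human and machine-assisted sentiment labels. As a minimal sketch only, assuming two common choices (raw percent agreement and Cohen's kappa), the comparison could look like the following. The function names and the toy labels are hypothetical and are not taken from the study's data or code.

```python
from collections import Counter


def percent_agreement(human, machine):
    """Share of responses on which the two label sources agree."""
    if len(human) != len(machine) or not human:
        raise ValueError("label lists must be non-empty and of equal length")
    matches = sum(h == m for h, m in zip(human, machine))
    return matches / len(human)


def cohens_kappa(human, machine):
    """Chance-corrected agreement (Cohen's kappa) between two label sources."""
    n = len(human)
    observed = percent_agreement(human, machine)
    human_counts = Counter(human)
    machine_counts = Counter(machine)
    labels = set(human_counts) | set(machine_counts)
    # Agreement expected by chance, from each source's marginal label frequencies.
    expected = sum(human_counts[l] * machine_counts[l] for l in labels) / (n * n)
    if expected == 1.0:
        # Both sources used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical toy labels; the machine never emits "ambiguous",
# mirroring the three-category output described in the abstract.
human = ["negative", "ambiguous", "positive", "neutral", "negative", "ambiguous"]
machine = ["negative", "negative", "positive", "neutral", "negative", "neutral"]

print(f"percent agreement: {percent_agreement(human, machine):.3f}")
print(f"Cohen's kappa:     {cohens_kappa(human, machine):.3f}")
```

Because the human-only "ambiguous" category can never match a machine label in this toy setup, every response a human marks as ambiguous counts as a disagreement, which is one reason a chance-corrected statistic may be more informative here than raw agreement.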

Conclusions/Significance The application of AI-assisted data analysis tools, combined with human expertise, offers a powerful approach to efficiently analyse large-scale qualitative datasets while preserving the nuance and complexity of the data. This study demonstrates the potential of this novel methodology to streamline the analysis process, reduce resource requirements, and generate meaningful insights from Big Qual data. The integration of NLP, ML, and human input allows for a more comprehensive understanding of the themes, sentiments, and experiences captured in free-text responses. This study underscores the importance of continued interdisciplinary collaboration among domain experts, data scientists, and AI specialists to optimise these methods, ensuring their reliability, validity, and ethical application in real-world contexts.

Author Summary The use of artificial intelligence (AI) in health research has grown over recent years. However, using AI to analyse large qualitative datasets in public health, known as Big Qualitative Data, is a relatively new area of research. Here, we use novel machine learning and natural language processing techniques, in which computers learn how to handle and interpret human language, to analyse a large national survey. The Royal College of Anaesthetists’ 7th National Audit Project is a large UK-wide initiative examining peri-operative cardiac arrest. We use the free-text data from this survey to test and validate our novel methods and to compare analysing the data by hand (human) with human-machine learning, also known as ‘machine-assisted’ analysis. Using two AI tools to conduct the analysis, we found that the machine-assisted analysis significantly reduced the time needed to analyse the dataset. Extra human input, however, was required to provide topic expertise and nuance to the analysis. The AI tools reduced the sentiment analysis to positive, negative or neutral, but the human input introduced a fourth ‘ambiguous’ category. The insights gained from this approach present ways that AI can help inform targeted interventions and quality improvement initiatives to enhance patient safety, in this case, in peri-operative cardiac arrest management.

Competing Interest Statement

The authors have declared no competing interest.

Funding Statement

The project infrastructure was supported financially and with staffing from the Royal College of Anaesthetists. The NAP7 fellows’ salaries were supported by: South Tees Hospitals NHS Foundation Trust (AK); Royal United Hospitals Bath NHS Foundation Trust (EK); NIHR Academic Clinical Fellowship (RA). JS’s and TC’s employers receive backfill for their time on the project (4 hours per week). NAP7 panel members were not paid for their role. EB, SM and CVP were supported by the NIHR Central London Patient Safety Research Collaboration (CL PSRC), reference number NIHR204297.

Author Declarations

I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.

Yes

The details of the IRB/oversight body that provided approval or exemption for the research described are given below:

All parts of the NAP7 are classified as a service evaluation, as there is no intervention, no randomisation of patients and no change to standard patient care or treatment. The project is observational and does not require research ethics committee approval, in line with the Health Research Agency’s decision tools.

I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.

Yes

I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).

Yes

I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.

Yes

Data Availability

All data for our study are based on secondary analysis of the data available in the online Supporting Information of the following publication: Kane et al. (2022) Methods of the 7th National Audit Project (NAP7) of the Royal College of Anaesthetists: peri-operative cardiac arrest. Anaesthesia 77: 1376-1385.

Copyright 
The copyright holder for this preprint is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. All rights reserved. No reuse allowed without permission.
Posted July 17, 2024.

Subject Area

  • Public and Global Health