medRxiv

Evaluating Text-to-Image Generated Photorealistic Images of Human Anatomy

Paula Muhr, Yating Pan, Charlotte Tumescheit, Ann-Kathrin Kübler, Hatice Kübra Parmaksiz, Cheng Chen, Pablo Sebastián Bolaños Orozco, Soeren S. Lienkamp, Janna Hastings
doi: https://doi.org/10.1101/2024.08.21.24312353
Paula Muhr
1Institute for Implementation Science in Health Care, Faculty of Medicine, University of Zurich, Switzerland
2Social Studies of Science and Technology, Technische Universität Berlin, Germany
Yating Pan
3Digital Society Initiative, University of Zurich, Switzerland
Charlotte Tumescheit
1Institute for Implementation Science in Health Care, Faculty of Medicine, University of Zurich, Switzerland
6Swiss Institute of Bioinformatics, Switzerland
Ann-Kathrin Kübler
3Digital Society Initiative, University of Zurich, Switzerland
Hatice Kübra Parmaksiz
3Digital Society Initiative, University of Zurich, Switzerland
Cheng Chen
3Digital Society Initiative, University of Zurich, Switzerland
Pablo Sebastián Bolaños Orozco
3Digital Society Initiative, University of Zurich, Switzerland
Soeren S. Lienkamp
4Institute for Anatomy, Faculty of Medicine, University of Zurich, Switzerland
Janna Hastings
1Institute for Implementation Science in Health Care, Faculty of Medicine, University of Zurich, Switzerland
5School of Medicine, University of St. Gallen, Switzerland
6Swiss Institute of Bioinformatics, Switzerland
For correspondence: janna.hastings{at}uzh.ch

Abstract

Background Generative AI models that can produce photorealistic images from text descriptions have many applications in medicine, including medical education and synthetic data generation. However, their heterogeneous outputs can be challenging to evaluate and compare, so a systematic approach enabling image and model comparisons is needed.

Methods We develop an error classification system for annotating errors in AI-generated photorealistic images of humans and apply it to a corpus of 240 images generated with three models (DALL-E 3, Stable Diffusion XL and Stable Cascade) from 10 prompts, with 8 images per prompt for each model. The system distinguishes five error types at three severity levels across five anatomical regions, and specifies an associated quantitative scoring method based on aggregated proportions of errors relative to the expected count of anatomical components in the generated image. We assess inter-rater agreement by double-annotating 25% of the images and calculating Krippendorff’s alpha, and we compare results quantitatively across the three models and ten prompts using a cumulative score per image.
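The scoring step can be illustrated with a short Python sketch. The abstract does not give the exact formula, so the aggregation below is an assumption: for each anatomical region, the number of annotated errors is divided by the expected number of components of that region (for example, two hands per depicted person), and these proportions are summed into one cumulative score per image. The `RegionAnnotation` structure, the `cumulative_score` function and the region names are hypothetical; severity levels and error types, which the actual scheme records, are not weighted here, and the scripts in the project repository may aggregate differently.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class RegionAnnotation:
    """Errors found in one anatomical region of one image (hypothetical structure)."""

    error_count: int  # number of errors annotated in this region
    expected_components: int  # e.g. 2 hands per depicted person


def cumulative_score(regions: dict[str, RegionAnnotation]) -> float:
    """Sum per-region error proportions into a single per-image score.

    Assumed aggregation: errors / expected components for each region,
    summed over regions. Severity and error type are ignored here.
    """
    score = 0.0
    for annotation in regions.values():
        if annotation.expected_components <= 0:
            # Region not expected in this image (e.g. cropped out): skip it.
            continue
        score += annotation.error_count / annotation.expected_components
    return score


# Illustrative image with two people: 4 hands and 2 faces are expected.
example = {
    "hands": RegionAnnotation(error_count=1, expected_components=4),
    "face": RegionAnnotation(error_count=1, expected_components=2),
}
print(cumulative_score(example))  # 0.75
```

Normalising by the expected component count keeps a crowd scene comparable with a single portrait: the same absolute number of hand errors yields a smaller proportion when more hands are expected, and a higher score always means more errors relative to what the image should contain.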

Findings The error classification system, accompanying training manual, generated image collection, annotations, and all associated scripts are available from our GitHub repository at https://github.com/hastingslab-org/ai-human-images. Inter-rater agreement was relatively poor, reflecting the subjectivity of the error classification task. Model comparisons revealed that DALL-E 3 performed consistently better than the Stable Diffusion models; however, the latter generated images with more diversity in personal attributes. Images of groups of people were more challenging for all models than images of individuals or pairs, and some prompts were challenging for all models.
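Inter-rater agreement was quantified with Krippendorff’s alpha, and a small from-scratch Python sketch makes the statistic concrete. It assumes nominal labels (for instance, whether a given region of a double-annotated image was marked as erroneous); the study’s exact unit of analysis, label encoding and distance metric are not stated here and may differ, and `krippendorff_alpha_nominal` is a hypothetical helper rather than the authors’ code. Alpha is computed as `1 - D_o / D_e` from a coincidence matrix of paired ratings, where `D_o` is the observed and `D_e` the chance-expected disagreement; missing ratings are passed as `None`.

```python
from __future__ import annotations

from collections import Counter
from itertools import permutations


def krippendorff_alpha_nominal(units: list[list[str | None]]) -> float:
    """Krippendorff's alpha for nominal labels (illustrative sketch).

    Each inner list holds the labels that different raters gave to one unit;
    None marks a missing rating. Units with fewer than two ratings are
    unpairable and are ignored.
    """
    # Coincidence matrix: each ordered pair of ratings within a unit
    # contributes 1 / (m_u - 1), where m_u is the number of ratings in the unit.
    coincidence = Counter()
    for ratings in units:
        present = [r for r in ratings if r is not None]
        m = len(present)
        if m < 2:
            continue
        for a, b in permutations(present, 2):
            coincidence[(a, b)] += 1.0 / (m - 1)

    # Marginal totals n_c per label and overall number of pairable values n.
    totals = Counter()
    for (a, _), weight in coincidence.items():
        totals[a] += weight
    n = sum(totals.values())

    # Nominal metric: any pair of different labels counts as disagreement.
    observed = sum(w for (a, b), w in coincidence.items() if a != b)
    expected = sum(totals[a] * totals[b] for a in totals for b in totals if a != b)
    if expected == 0:
        raise ValueError("alpha is undefined when only one label is used")
    return 1.0 - (n - 1) * observed / expected


# Hypothetical labels from two raters for three double-annotated units.
units = [["ok", "ok"], ["ok", "error"], ["error", "error"]]
print(round(krippendorff_alpha_nominal(units), 3))  # 0.444
```

A value of 1 indicates perfect agreement, 0 indicates agreement no better than chance, and negative values indicate systematic disagreement, which gives a scale for reading the relatively poor agreement reported above.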

Interpretation Our method enables systematic comparison of AI-generated photorealistic images of humans; our results can serve to catalyse improvements in these models for medical applications.

Funding This study received support from the University of Zurich’s Digital Society Initiative, and the Swiss National Science Foundation under grant agreement 209510.

Evidence before this study The authors searched PubMed and Google Scholar for publications evaluating text-to-image model outputs for medical applications published between 2014 (when generative adversarial networks first became available) and 2024. While most evaluations focused on task-specific networks generating a single type of medical image, a few evaluations have emerged that explore the novel general-purpose text-to-image diffusion models more broadly for applications in medical education and synthetic data generation. However, no previous work had attempted to develop a systematic approach for evaluating these models’ representations of human anatomy.

Added value of this study We present an anatomical error classification system, the first systematic approach to evaluating AI-generated images of humans that enables model and prompt comparisons. We apply our method to a corpus of generated images to compare the state-of-the-art large-scale model DALL-E 3 with two models from the Stable Diffusion family (Stable Diffusion XL and Stable Cascade).

Implications of all the available evidence While our approach enables systematic comparisons, it remains limited by subjectivity and is labour-intensive for images depicting many figures. Future research should explore automating parts of the evaluation with coupled segmentation and classification models.

Competing Interest Statement

The authors have declared no competing interest.

Author Declarations

I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.

Yes

I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.

Yes

I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).

Yes

I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.

Yes

Data Availability

All data provided in the manuscript are available online at https://github.com/hastingslab-org/ai-human-images.


Copyright 
The copyright holder for this preprint is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.
Posted August 21, 2024.
medRxiv 2024.08.21.24312353; doi: https://doi.org/10.1101/2024.08.21.24312353

Subject Area

  • Health Informatics