medRxiv

Evaluating Prognostic Bias of Critical Illness Severity Scores Based on Age, Gender, and Primary Language in the USA: A Retrospective Multicenter Study

Xiaoli Liu, Max Shen, Margaret Lie, Zhongheng Zhang, Deyu Li, Chao Liu, Roger Mark, Zhengbo Zhang, Leo Anthony Celi
doi: https://doi.org/10.1101/2022.08.01.22277736
Xiaoli Liu
1Center for Artificial Intelligence in Medicine, The General Hospital of PLA, Beijing, China
2School of Biological Science and Medical Engineering, Beihang University, Beijing, China
3Laboratory for Computational Physiology, Institute for Medical Engineering and Science, Massachusetts Institute of Technology, Cambridge, United States of America
Max Shen
4Department of Medicine, Beth Israel Deaconess Medical Center, Boston, United States of America
Margaret Lie
4Department of Medicine, Beth Israel Deaconess Medical Center, Boston, United States of America
Zhongheng Zhang
5Department of Emergency Medicine, Sir Run Run Shaw Hospital, Zhejiang University School of Medicine, Hangzhou, China
Deyu Li
2School of Biological Science and Medical Engineering, Beihang University, Beijing, China
Chao Liu
6Department of Critical Care Medicine, The First Medical Center, The General Hospital of PLA, Beijing, China
Roger Mark
3Laboratory for Computational Physiology, Institute for Medical Engineering and Science, Massachusetts Institute of Technology, Cambridge, United States of America
Zhengbo Zhang
1Center for Artificial Intelligence in Medicine, The General Hospital of PLA, Beijing, China
  • For correspondence: zhangzhengbo{at}301hospital.com.cn
Leo Anthony Celi
3Laboratory for Computational Physiology, Institute for Medical Engineering and Science, Massachusetts Institute of Technology, Cambridge, United States of America
4Department of Medicine, Beth Israel Deaconess Medical Center, Boston, United States of America
7Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, United States of America

Summary

Background Although severity scoring systems are used to support decision making and assess ICU performance, the likelihood of bias based on age, gender, and primary language has not been studied. We aimed to identify potential bias in scores such as the Sequential Organ Failure Assessment (SOFA) and the Acute Physiology and Chronic Health Evaluation IVa (APACHE IVa) by evaluating their prediction of hospital mortality across subgroups defined by age, gender, and primary language in two large intensive care unit (ICU) databases.

Methods This multicenter, retrospective study was conducted using data from the Medical Information Mart for Intensive Care (MIMIC, 2001-2019) database and the electronic ICU Collaborative Research Database (eICU-CRD, 2014-2015). SOFA and APACHE IVa scores were obtained from the first 24 hours of ICU admission. Hospital mortality was the primary outcome. Patients were stratified by age (16-44, 45-64, 65-79, and 80-), gender (female and male), and primary language (English and non-English), and the scores were then assessed for discrimination and calibration in all subgroups. To evaluate discrimination, the area under the receiver operating characteristic curve (AUROC) and the area under the precision-recall curve (AUPRC) were used. The standardized mortality ratio (SMR) and the calibration belt plot were used to evaluate calibration.
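The stratified discrimination analysis described in the Methods can be illustrated with a minimal sketch. This is not the authors' code: the function names and the record layout are hypothetical, the age boundaries follow the 65-79 stratum given in the "Added value of this study" paragraph, and AUROC is computed with the pairwise (Mann-Whitney) definition rather than any particular statistics package.

```python
from collections import defaultdict


def age_group(age):
    """Map an age in years to one of the four study strata (hypothetical helper)."""
    if age < 16:
        raise ValueError("the study cohort starts at age 16")
    if age <= 44:
        return "16-44"
    if age <= 64:
        return "45-64"
    if age <= 79:
        return "65-79"
    return "80-"


def auroc(scores, died):
    """AUROC as the probability that a randomly chosen non-survivor has a
    higher score than a randomly chosen survivor; ties count one half."""
    pos = [s for s, d in zip(scores, died) if d]
    neg = [s for s, d in zip(scores, died) if not d]
    if not pos or not neg:
        raise ValueError("need at least one death and one survivor")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def auroc_by_age_group(records):
    """records: iterable of (age, score, died) tuples -> {stratum: AUROC}."""
    grouped = defaultdict(lambda: ([], []))
    for age, score, died in records:
        scores, labels = grouped[age_group(age)]
        scores.append(score)
        labels.append(died)
    return {group: auroc(s, d) for group, (s, d) in grouped.items()}
```

The pairwise form is quadratic in the subgroup size, so it suits illustration only; for cohorts of the size reported here a rank-based implementation would be the practical choice. The same grouping pattern applies to gender and primary language by swapping the key function.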

Findings A total of 173,930 patient encounters (78,550 MIMIC and 95,380 eICU-CRD) were studied. Discrimination was best for the youngest age range and worsened with increasing age (AUROC ranging from 0.812 to 0.673 for SOFA and 0.882 to 0.754 for APACHE IVa, p <0.001). There was a significant difference in discrimination between male and female patients, with the scores performing worse for female patients. With MIMIC data, the scores performed worse for patients whose primary language was not English than for English-speaking patients (AUROC ranging 0.771 to 0.709 [p <0.001] for SOFA). Measurements of calibration applied to SOFA showed a statistically significant overestimation of mortality in the youngest patients (SMR 0.55-0.6) and underestimation of mortality in the oldest patients (SMR 1.54-1.57). With SOFA, mortality was overestimated for male patients (SMR 0.92-0.97) and underestimated for female patients (SMR 1.05-1.11), while mortality was overestimated for English-speaking patients (SMR 0.85) and greatly underestimated for non-English-speaking patients (SMR 1.4). In contrast, calibration applied to APACHE-IVa showed underestimation of mortality for all age groups and genders.
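To make the direction of the SMR findings concrete, the ratio can be sketched as observed deaths divided by expected deaths, so a value below 1 means mortality was overestimated and a value above 1 means it was underestimated. The sketch below assumes each patient already has a predicted mortality probability; how the study mapped SOFA scores to such probabilities is not stated in this summary, and the function names and toy numbers are hypothetical.

```python
def standardized_mortality_ratio(died, predicted_risk):
    """SMR = observed deaths / expected deaths.

    died: sequence of 0/1 outcomes.
    predicted_risk: per-patient predicted mortality probabilities (assumed given).
    """
    if len(died) != len(predicted_risk):
        raise ValueError("outcomes and predictions must have the same length")
    expected = sum(predicted_risk)
    if expected <= 0:
        raise ValueError("expected deaths must be positive")
    return sum(died) / expected


def calibration_direction(smr):
    """Describe what an SMR implies about the score's mortality prediction."""
    if smr < 1:
        return "mortality overestimated"
    if smr > 1:
        return "mortality underestimated"
    return "observed equals expected"


# Toy subgroup: 1 observed death against 2.0 expected deaths.
toy_smr = standardized_mortality_ratio([1, 0, 0, 0], [0.5, 0.5, 0.5, 0.5])
print(toy_smr, calibration_direction(toy_smr))
```

A point estimate alone does not establish statistical significance; a confidence interval around the SMR would be needed for that, which this sketch omits.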

Interpretation The differences in discrimination and calibration for older patients, female patients, and non-English-speaking patients suggest that illness severity scores are prone to bias in their mortality predictions. Caution must be taken when using these illness severity scores for quality benchmarking across ICUs and for clinical decision-making in a diverse population.

Funding Z.B.Z was funded by the National Natural Science Foundation of China (62171471).

Evidence before this study We searched PubMed, arXiv, and medRxiv from database inception to July 10, 2022, for articles published without language restrictions. The search terms were (illness severity score OR SOFA OR APACHE-II OR APACHE-IV OR SAPS) AND (evaluation OR performance OR bias) AND ((age OR older OR elderly OR 65 years old OR 80 years old OR subgroup) OR (gender OR Female OR male) OR (language speaking OR English speaking)). Multiple studies have examined score performance in specific subgroups, such as patients over 80, older patients with sepsis, and surgical patients, but with limited numbers of patients and hospitals. Although a small number of studies have reported score performance by age group, they have not systematically examined the differences and bias between younger and older patients. Few articles analyzed the differences between men and women. No study has compared score performance between non-English and English speakers. We found no study that comprehensively reported the potential bias of clinical scores across subgroups defined by age, gender, and primary language.

Added value of this study To the best of our knowledge, this is the first systematic bias analysis of the SOFA and APACHE-IVa scores for assessing in-hospital outcomes across age (16-44, 45-64, 65-79, and 80-), gender (male and female), and English-speaking (yes and no) subgroups, using multicenter data from 189 U.S. hospitals and 173,930 patient episodes. The assessment covered discrimination (AUROC and AUPRC) and calibration (SMR and calibration belt plot). We found that the AUROCs of both scores decreased significantly with age. The SOFA score underestimated mortality for the oldest patients and seriously overestimated it for the youngest patients. Both scores demonstrated slightly better AUROCs for males. For non-English-speaking patients, SOFA showed a large reduction in AUROC and a marked underestimation of mortality compared with English speakers. Furthermore, at the same SOFA score, observed mortality was higher for older patients, females, and non-English speakers than for their respective comparison subgroups.

Implications of all the available evidence The ICU population is aging, with especially rapid growth in the number of patients over 80 years old. These patients have more comorbidities, greater frailty, worse prognosis, and a need for more humanistic care, which has become a serious challenge for early clinical triage, diagnosis, and treatment. Females are more likely to withhold pain and less likely to be transferred to the ICU for treatment, which may lead to female patients being more severely ill than males at ICU admission. SOFA and APACHE-IVa scores are an important basis for early ICU assessment of illness severity and for decision-making. Although these general phenomena have been noticed in clinical practice for the subgroups mentioned, there is a lack of clear and detailed quantitative analysis of bias in the use of these scores, which is needed to protect these vulnerable populations and prevent unintentional harm. The U.S. is a multicultural and racially integrated country, and the number of non-English speakers is rising every year, which reflects greater socioeconomic and ethnic disparities. Limited communication can also affect patient assessment and treatment. However, the use of the SOFA score to evaluate this group of patients has not been reported to date. In this study, we used multicenter data with a large sample size to identify potential bias in the SOFA and APACHE-IVa scores for all of these patient groups.

Competing Interest Statement

The authors have declared no competing interest.

Funding Statement

Z.B.Z was funded by the National Natural Science Foundation of China (62171471).

Author Declarations

I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.

Yes

The details of the IRB/oversight body that provided approval or exemption for the research described are given below:

The de-identification and anonymization were both strictly implemented in the MIMIC and eICU-CRD databases. Our retrospective study was exempted by the ethical review committee of the US.

I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.

Yes

I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).

Yes

I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable.

Yes

Footnotes

  • * co-first author

  • author affiliations updated

Data Availability

All data produced in the present study are available upon reasonable request to the authors.

https://mimic.mit.edu/

https://eicu-crd.mit.edu/

Copyright 
The copyright holder for this preprint is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC 4.0 International license.
Posted August 09, 2022.

Subject Area

  • Health Informatics