medRxiv

Measuring the missing: greater racial and ethnic disparities in COVID-19 burden after accounting for missing race/ethnicity data

Katie Labgold, Sarah Hamid, Sarita Shah, Neel R. Gandhi, Allison Chamberlain, Fazle Khan, Shamimul Khan, Sasha Smith, Steve Williams, Timothy L. Lash, Lindsay J. Collin
doi: https://doi.org/10.1101/2020.09.30.20203315
Katie Labgold (1); Sarah Hamid (1); Sarita Shah (1, 2, 3); Neel R. Gandhi (1, 2, 3); Allison Chamberlain (1); Fazle Khan (4); Shamimul Khan (4); Sasha Smith (4); Steve Williams (4); Timothy L. Lash (1); Lindsay J. Collin (1, 5)

1 Department of Epidemiology, Rollins School of Public Health, Emory University
2 Department of Global Health, Rollins School of Public Health, Emory University
3 Division of Infectious Diseases, Emory School of Medicine, Emory University
4 Fulton County Board of Health
5 Department of Population Health Sciences, Huntsman Cancer Institute, University of Utah

For correspondence: lindsay.collin{at}hci.utah.edu

Abstract

Black, Hispanic, and Indigenous persons in the United States have an increased risk of SARS-CoV-2 infection and death from COVID-19, due to persistent social inequities. The magnitude of the disparity is unclear, however, because race/ethnicity information is often missing in surveillance data. In this study, we quantified the burden of SARS-CoV-2 infection, hospitalization, and case fatality rates in an urban county by racial/ethnic group using combined race/ethnicity imputation and quantitative bias-adjustment for misclassification. After bias-adjustment, the magnitude of the absolute racial/ethnic disparity, measured as the difference in infection rates between classified Black and Hispanic persons compared to classified White persons, increased 1.3-fold and 1.6-fold, respectively. These results highlight that complete case analyses may underestimate absolute disparities in infection rates. Collecting race/ethnicity information at time of testing is optimal. However, when data are missing, combined imputation and bias-adjustment improves estimates of the racial/ethnic disparities in the COVID-19 burden.

Introduction

In the United States, early surveillance reports highlight that persons of Hispanic, Black, and American Indian/Alaskan Native race and ethnicity are disproportionately affected by the COVID-19 pandemic.1 These disparities arise from historical and contemporary social and health inequities that result from systemic racism.2–4 Racial capitalism in particular produces structurally unequal exposure to (and protection from) SARS-CoV-2 infection in key places of transmission (e.g. workplace).3

The role of systemic racism in the pandemic motivates the need for accurate surveillance of racial/ethnic disparities in SARS-CoV-2 infection and death. However, there are challenges in estimating COVID-19 racial/ethnic disparities.5,6 Although reports highlight the unequal burden across racial/ethnic groups, the magnitude of disparities is uncertain because of the large proportion of missing race/ethnicity information in surveillance data. In recent reports, race/ethnicity information was missing in 56% of confirmed infections nationally and in 36% in Georgia.7,8 Current surveillance estimates are reported as complete case analyses, which exclude cases with missing race/ethnicity.1,5,8,9 Complete case analyses will bias racial/ethnic disparity estimates if race/ethnicity information is not missing completely at random.10

The Department of Health and Human Services issued COVID-19 reporting guidelines in June requiring all labs to report race/ethnicity beginning August 2020.11 These guidelines seek to address missing data moving forward, but fail to address missing information for case-patients identified before August.

Collecting race/ethnicity information at time of testing is optimal, especially in surveillance of racial/ethnic health disparities. Until this becomes routine, imputation of missing race/ethnicity combined with quantitative bias-adjustment to account for misclassification of the imputed race/ethnicity can improve estimates of the COVID-19 burden among racial/ethnic groups when race/ethnicity data are missing.12 In this study, we calculate SARS-CoV-2 infection, hospitalizations, and case fatality rates by race/ethnicity group and report the absolute racial/ethnic disparities in SARS-CoV-2 infection rates in Fulton County, Georgia after accounting for missing race/ethnicity information.

Methods

Fulton County, Georgia, includes the city of Atlanta; its residents identify as Black (44%), White (40%), Hispanic (7%), Asian (7%), and other races/ethnicities (2%).13 Between 29 February 2020 and 18 August 2020, 19,637 cases of SARS-CoV-2 infection were reported among Fulton County residents. Case reports included the patient’s residential address, full name, race/ethnicity, hospitalization (yes/no/unknown), and death (yes/no/unknown). Fulton County Board of Health staff geocoded case-patients’ addresses to census block groups. For this analysis, we categorized reported race/ethnicity as Black, Hispanic, Asian, White, or Other.

We accounted for missing race/ethnicity information using a three-step approach: 1) imputation of race/ethnicity for all case-patients, 2) validation of the race/ethnicity imputation by calculating the accuracy of imputation among case-patients with reported race/ethnicity information, and 3) bias-adjustment of race/ethnicity estimates to account for misclassification of imputation among case-patients missing reported race/ethnicity information. Hereafter, we refer to race/ethnicity as reported when provided in case-patient records, imputed when referring to the imputed case-patient race/ethnicity, and classified when referring to the combined reported and imputed race/ethnicity after bias-adjustment.

First, for all case-patients we predicted their racial/ethnic group using the Bayesian Improved Surname Geocoding method.14 This method estimates the probability of a person being classified as Black, Hispanic, Asian, White or Other race/ethnic group based on the case-patient’s surname and residential census block group, and the population distribution of race/ethnicity for census block groups and surnames. Imputation was performed using the R package “wru,” which includes the 2010 surname census list with corresponding race/ethnicity distribution. Geographic distribution of race/ethnicity came from the 2018 5-year American Community Survey.15,16 For the 546 (2.8%) case-patients who could not be geocoded, race/ethnicity was imputed using surname only.
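The Bayes' rule update behind this imputation step can be sketched in a few lines. This is a minimal Python illustration, not the "wru" implementation used in the study: the function name `bisg_posterior` and every probability below are hypothetical, chosen only to show how surname and block-group evidence combine.

```python
GROUPS = ["Black", "Hispanic", "Asian", "White", "Other"]


def bisg_posterior(p_race_given_surname, p_geo_given_race):
    """Combine surname and geography evidence with Bayes' rule.

    posterior(r) is proportional to P(r | surname) * P(block group | r).
    """
    unnormalized = {
        g: p_race_given_surname[g] * p_geo_given_race[g] for g in GROUPS
    }
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("no group has non-zero probability")
    return {g: v / total for g, v in unnormalized.items()}


# Hypothetical surname-based probabilities for one case-patient.
surname_prior = {
    "Black": 0.30, "Hispanic": 0.05, "Asian": 0.01, "White": 0.60, "Other": 0.04,
}
# Hypothetical share of each group's population living in this block group.
geo_likelihood = {
    "Black": 0.004, "Hispanic": 0.001, "Asian": 0.001, "White": 0.001, "Other": 0.002,
}

posterior = bisg_posterior(surname_prior, geo_likelihood)
imputed_group = max(posterior, key=posterior.get)  # most probable group

# Surname-only imputation (as for case-patients who could not be geocoded)
# corresponds to a flat geographic likelihood, which leaves the prior unchanged.
flat = {g: 1.0 for g in GROUPS}
surname_only = bisg_posterior(surname_prior, flat)
```

In this made-up example the surname alone favors White, but the block-group information shifts the most probable group to Black, which is the kind of correction the geographic component is meant to provide.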

Second, we validated the race/ethnicity imputation among case-patients whose race/ethnicity was available in the dataset (n=12,222, 64%). Predictive values (PV) were calculated for each imputed race/ethnic group. The PV is the probability that a person’s reported race/ethnicity group classification was correctly imputed.12
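As a rough illustration of this validation step, the sketch below computes a predictive value and a confidence interval for one imputed group from a cross-tabulation of imputed versus reported race/ethnicity. The counts are hypothetical, and the Wald (normal-approximation) interval is an assumption; the paper does not state which interval method was used.

```python
import math


def predictive_value(n_agree, n_imputed, z=1.96):
    """Return (PV, lower, upper) using a Wald interval (an assumed method).

    n_agree   -- case-patients whose reported group matches the imputed group
    n_imputed -- all case-patients imputed to that group (with reported data)
    """
    if n_imputed <= 0:
        raise ValueError("n_imputed must be positive")
    pv = n_agree / n_imputed
    se = math.sqrt(pv * (1 - pv) / n_imputed)
    return pv, max(0.0, pv - z * se), min(1.0, pv + z * se)


# Hypothetical validation table: imputed group -> counts by reported group.
validation = {
    "Black": {"Black": 900, "Hispanic": 20, "Asian": 5, "White": 60, "Other": 15},
    "White": {"Black": 300, "Hispanic": 40, "Asian": 20, "White": 600, "Other": 40},
}

results = {}
for imputed_group, reported_counts in validation.items():
    n_imputed = sum(reported_counts.values())
    n_agree = reported_counts[imputed_group]
    results[imputed_group] = predictive_value(n_agree, n_imputed)
```

Each row of the table also gives the full distribution of reported groups among those imputed to a given group, which is what the bias-adjustment step needs, not only the diagonal PV.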

Third, we used the PVs as bias parameters to quantitatively adjust for the expected misclassification of the imputed race/ethnicity groups. We assigned each race/ethnicity group PV from the validation to a Dirichlet distribution (Table 1). We then reclassified the imputed race/ethnicity probabilistically (100,000 iterations).12 The quantitative bias-adjustment mathematically accounts for inaccurate assignment of case-patients to a race/ethnicity group by the Bayesian Improved Surname Geocoding method. Sampling error was incorporated into the estimates using bootstrap approximation from a standard normal distribution.12
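The probabilistic reclassification can be sketched as a small Monte Carlo loop. The Python below is a simplified illustration under stated assumptions, not the authors' R code: all counts are hypothetical, the validation counts are used directly as Dirichlet parameters, far fewer iterations are run than the 100,000 in the study, and the bootstrap step for sampling error is omitted. Function names such as `bias_adjust` are invented for this example.

```python
import random
import statistics

GROUPS = ["Black", "Hispanic", "Asian", "White", "Other"]


def sample_dirichlet(alphas, rng):
    """Draw one probability vector from Dirichlet(alphas) via gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]


def reallocate(n_cases, probs, rng):
    """Multinomial draw: spread n_cases over GROUPS with probabilities probs."""
    counts = [0] * len(probs)
    for idx in rng.choices(range(len(probs)), weights=probs, k=n_cases):
        counts[idx] += 1
    return counts


def bias_adjust(imputed_missing, validation, iterations, seed=2020):
    """Return, per group, one simulated case count for each iteration."""
    rng = random.Random(seed)
    sims = {g: [] for g in GROUPS}
    for _ in range(iterations):
        totals = [0] * len(GROUPS)
        for imputed_group, n_cases in imputed_missing.items():
            # Validation counts (all must be > 0) act as Dirichlet parameters.
            alphas = [validation[imputed_group][g] for g in GROUPS]
            probs = sample_dirichlet(alphas, rng)
            for i, count in enumerate(reallocate(n_cases, probs, rng)):
                totals[i] += count
        for g, t in zip(GROUPS, totals):
            sims[g].append(t)
    return sims


def summarize(values):
    """Median with a 95% simulation interval (2.5th and 97.5th percentiles)."""
    ordered = sorted(values)
    last = len(ordered) - 1
    return (statistics.median(ordered),
            ordered[int(0.025 * last)],
            ordered[int(0.975 * last)])


# Hypothetical validation rows: imputed group -> counts by reported group.
rows = [
    [930, 10, 5, 40, 15],      # imputed Black
    [30, 840, 10, 100, 20],    # imputed Hispanic
    [40, 20, 690, 200, 50],    # imputed Asian
    [300, 40, 30, 550, 80],    # imputed White
    [200, 100, 50, 250, 400],  # imputed Other
]
validation = {g: dict(zip(GROUPS, row)) for g, row in zip(GROUPS, rows)}

# Hypothetical imputed counts among case-patients missing reported race/ethnicity.
imputed_missing = {"Black": 400, "Hispanic": 100, "Asian": 50, "White": 400, "Other": 50}

sims = bias_adjust(imputed_missing, validation, iterations=500)
summary = {g: summarize(sims[g]) for g in GROUPS}
```

The adjusted counts apply only to case-patients without a reported race/ethnicity; adding them to the reported counts would give the classified totals. With this hypothetical table, cases move out of the imputed White group and into the Black group after adjustment.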

Table 1:

Predictive values (PV) and 95% confidence intervals (CI) of the imputation by race/ethnicity based on residence and surname compared with reported race/ethnic group in the State Electronic Notifiable Disease Surveillance System

For both the complete case and bias-adjusted analyses, we calculated the SARS-CoV-2 infection rates (per 1,000 persons), hospitalization proportions (hospitalized cases/reported cases), and case fatality rates (deaths/reported cases) by race/ethnicity group. We reported 95% confidence intervals (CI) for the complete case analysis and medians with 95% simulation intervals (SI) for the bias-adjusted estimates. We evaluated how accounting for missing race/ethnicity information impacts measures of racial/ethnic disparities by calculating the differences in SARS-CoV-2 infection rates in each race/ethnicity group compared with persons of White race/ethnicity, among case-patients with reported race/ethnicity information, and among all case-patients after bias-adjustment. All analyses used R v3.6 (Vienna, Austria). The Georgia Department of Public Health determined this activity to be consistent with public health surveillance, so it did not require informed consent or IRB approval.
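The outcome measures defined above reduce to simple ratios. The following Python sketch shows the arithmetic with hypothetical counts and populations; the function names are illustrative and none of the numbers come from the study.

```python
def infection_rate_per_1000(cases, population):
    """Infections per 1,000 persons in the group's population."""
    return 1000 * cases / population


def proportion(events, cases):
    """Share of reported cases with the event (hospitalization or death)."""
    return events / cases


def rate_difference(rate_group, rate_reference):
    """Absolute disparity: group rate minus reference (White) rate."""
    return rate_group - rate_reference


# Hypothetical group: 500 cases, 50 hospitalized, 10 deaths, 100,000 residents.
group_rate = infection_rate_per_1000(500, 100_000)      # per 1,000 persons
hospitalization_proportion = proportion(50, 500)
case_fatality_rate = proportion(10, 500)

# Hypothetical reference group: 300 cases among 150,000 residents.
reference_rate = infection_rate_per_1000(300, 150_000)  # per 1,000 persons

absolute_disparity = rate_difference(group_rate, reference_rate)
```

Note that cases appear in the numerator of the infection rate but in the denominator of the hospitalization proportion and case fatality rate, so adding imputed cases pushes these measures in opposite directions.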

Results

Among the 19,637 cases reported in Fulton County from 29 February to 19 August 2020, 7,145 (36%) were missing race/ethnicity information in the case report. Data were more complete among the 1,840 hospitalized case-patients, where only 14 (3.5%) were missing race/ethnicity information. All deceased case-patients (n=456) had complete information on race/ethnicity.

Comparison of reported versus imputed race/ethnicity group showed that the algorithm’s imputation accuracy varied by imputed race/ethnicity group (Table 1). Of the 5,535 persons who were imputed as Black race/ethnicity, 93% (95%CI: 92%, 93%) were reported as Black in case reports (n=5,118). Among persons imputed as Hispanic ethnicity, 84% (95%CI: 82%, 85%) were reported as Hispanic. The algorithm was less accurate for case-patients with race/ethnicity imputed as Asian (PV=69%, 95%CI: 61%, 74%) and as White (PV=55%, 95%CI: 53%, 56%). The PV estimates for racial/ethnic groups changed over time, likely due to changes in the prevalence of demographic groups affected by the pandemic over time (Supplemental Table 1).

In both the complete case and bias-adjusted analyses, the SARS-CoV-2 infection rates were highest among those classified as Other, followed by Hispanic, Black, White, and Asian (Table 2a and 2b). Imputation and bias-adjustment yielded higher estimates of infection rates than complete case analysis because more case-patients were included in the numerator. Estimated infection rates increased 1.8-fold for persons classified as Asian, 1.7-fold for White, 1.7-fold for Hispanic, 1.6-fold for Other, and 1.5-fold for Black. Hospitalization proportions and case fatality rates decreased across all race/ethnicity groups with imputation and bias-adjustment compared with the complete case analyses, because more cases were included in the denominator. In both the complete case and bias-adjusted analyses, case-patients who were classified as Black race/ethnicity had the highest hospitalization proportions (complete case: 17%, 95%CI: 16%, 18%; bias-adjusted: 12%, 95%SI: 11%, 12%) and case fatality rates (complete case: 4.6%, 95%CI: 4.1%, 5.1%; bias-adjusted: 3.1%, 95%SI: 2.8%, 3.4%).

Table 2a:

Complete case estimates of SARS-CoV-2 infection rates, hospitalization proportions, and case fatality rates by race/ethnic group among 12,222 cases reported to Fulton County Board of Health, 29 February – 18 Aug 2020.

Table 2b:

Bias-adjusted estimates of SARS-CoV-2 infection rates, hospitalization proportions, and case fatality rates including 7,415 cases with imputed race/ethnicity, among 19,637 cases reported to Fulton County Board of Health before 18 Aug 2020.

The magnitude of the absolute disparity, defined as the difference in SARS-CoV-2 infection rates for case-patients classified in each race/ethnicity group compared with case-patients classified as White, increased in the bias-adjusted analysis relative to the complete case analysis for nearly all race/ethnicity groups (Table 3). When comparing bias-adjusted with complete case results, the absolute disparity in infection rates increased 1.3-fold among classified Black and 1.6-fold among classified Hispanic race/ethnicity groups in reference to case-patients classified as White.
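To make the fold-change in absolute disparity concrete, the short Python sketch below compares a rate difference before and after adjustment. The rates are hypothetical and the function name `disparity_fold_change` is invented for illustration; it only shows how such a ratio could be computed.

```python
def disparity_fold_change(group_cc, ref_cc, group_adj, ref_adj):
    """Ratio of the bias-adjusted to the complete case rate difference.

    All inputs are infection rates per 1,000 persons; 'ref' is the
    reference (White) group.
    """
    difference_cc = group_cc - ref_cc
    difference_adj = group_adj - ref_adj
    if difference_cc == 0:
        raise ValueError("complete case rate difference is zero")
    return difference_adj / difference_cc


# Hypothetical rates per 1,000 persons.
fold = disparity_fold_change(group_cc=5.0, ref_cc=2.0, group_adj=8.0, ref_adj=3.5)
```

Here both groups' rates rise after adjustment, yet the gap between them widens from 3.0 to 4.5 per 1,000, a 1.5-fold increase in the absolute disparity.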

Table 3:

Rate difference (RD) of SARS-CoV-2 infection rates among minority groups compared with non-Hispanic White persons among cases with complete information and after accounting for missing race/ethnicity among 4004 SARS-CoV-2 infected persons reported to Fulton County before 20 May 2020.

Discussion

In this study, accounting for missing race/ethnicity information revealed greater differences in SARS-CoV-2 infection rates comparing most racial/ethnic groups with case-patients of White race. These results suggest that national estimates, which exclude case-patients with missing race/ethnicity information, may underestimate the magnitude of absolute racial/ethnic disparities in COVID-19 morbidity and mortality.6,8

Our results underscore the need for imputation combined with bias-adjustment. In our study population, the PV estimates indicated that imputation alone overestimated infections among case-patients classified as White and underestimated infections among case-patients classified as Black. Therefore, imputation alone would have been insufficient.

Both the complete case analysis and the bias-adjusted estimates demonstrate important absolute racial/ethnic disparities in the infection rates. The bias-adjusted estimates do not change our understanding of the direction of racial/ethnic disparities in the COVID-19 pandemic; however, the magnitude of racial/ethnic disparities changed meaningfully after bias-adjustment. In contrast, the hospitalization proportion and case fatality rate decreased across all classified race/ethnicity groups after accounting for missing race/ethnicity information because few hospitalized or deceased case-patients were missing race/ethnicity information. These results highlight the need for more complete reporting so that health equity and racial justice efforts aimed at addressing these disparities operate on the most accurate data possible.

The imputation of race/ethnicity has limitations. The Bayesian Improved Surname Geocoding algorithm limits the racial/ethnic groups that can be imputed to Black, Hispanic, Asian, White, or Other. The reliance on an ‘Other’ category is problematic for identifying and addressing disparities in other racial/ethnic populations (e.g. Indigenous populations). Future studies should explore how accounting for missing race/ethnicity impacts other disease burden measures.

Our findings emphasize the importance of collecting complete race/ethnicity data at the time of testing, for the current pandemic and future outbreaks. When data are missing, Bayesian Improved Surname Geocoding combined with quantitative bias-adjustment provides better estimates of the racial/ethnic disparities in SARS-CoV-2 infection rates, hospitalization proportions, and case fatality rates.

Data Availability

Due to patient confidentiality, data are only available upon request from the Fulton County Board of Health and with IRB approval from the Georgia Department of Public Health. Example code used to perform the imputation and bias-adjustment is available on GitHub (https://github.com/lcolli5/Adaptive-Validation).

Appendix

Supplemental Table 1:

Positive predictive value (PPV) of the imputation by race/ethnicity based on residence and surname compared with the reported race/ethnic group in COVID-19 case report stratified by months (March through May and June through August) of diagnosis

Footnotes

  • * Co-First Authors

  • Conflicts of Interest: The authors have no conflicts of interest to declare.

  • Financial Support: This work was supported in part by the US National Institutes of Health F31CA239566 (PI L. J. Collin), R01LM013049 (PI T. L. Lash), and K24AI114444 (PI N. R. Gandhi). It was also supported by a grant from the Robert W. Woodruff foundation (PI A. Chamberlain). K. Labgold is supported in part by the Center for Reproductive Health Research in the Southeast (RISE) Doctoral Fellowship and an ARCS Foundation Award.


References

  1. Stokes EK, Zambrano LD, Anderson KN, et al. Coronavirus Disease 2019 Case Surveillance - United States, January 22-May 30, 2020. MMWR Morb Mortal Wkly Rep. 2020;69(24):759–765. doi:10.15585/mmwr.mm6924e2
  2. Health Equity Considerations & Racial & Ethnic Minority Groups. National Center for Immunization and Respiratory Diseases (NCIRD), Division of Viral Diseases. https://www.cdc.gov/coronavirus/2019-ncov/need-extra-precautions/racial-ethnic-minorities.html. Published 2020. Accessed July 17, 2020.
  3. McClure ES, Vasudevan P, Bailey Z, Patel S, Robinson WR. Racial Capitalism within Public Health: How Occupational Settings Drive COVID-19 Disparities. Am J Epidemiol. 2020:113–120.
  4. Egede LE, Walker RJ. Structural Racism, Social Risk Factors, and Covid-19 — A Dangerous Convergence for Black Americans. N Engl J Med. 2020;383(12):e77(1)–e77(3). doi:10.1056/NEJMp2023616
  5. Servik K. ‘Huge hole’ in COVID-19 testing data makes it harder to study racial disparities. Science. July 2020. doi:10.1126/science.abd7715
  6. Cowger TL, Davis BA, Etkins OS, et al. Comparison of Weighted and Unweighted Population Data to Assess Inequities in Coronavirus Disease 2019 Deaths by Race/Ethnicity Reported by the US Centers for Disease Control and Prevention. JAMA Netw Open. 2020;3(7):e2016933. doi:10.1001/jamanetworkopen.2020.16933
  7. Georgia Department of Public Health COVID-19 Daily Status Report. https://dph.georgia.gov/covid-19-daily-status-report. Published 2020. Accessed July 18, 2020.
  8. Oppel R, Gebelhoff R, Lai K, Wright W, Smith M. The Fullest Look Yet at the Racial Inequity of Coronavirus. New York Times. https://www.nytimes.com/interactive/2020/07/05/us/coronavirus-latinos-african-americans-cdc-data.html?campaign_id=2&emc=edit_th_20200706&instance_id=20039&nl=todaysheadlines&regi_id=71026656&segment_id=32674&user_id=c99fb3a6b3b754c. Published 2020. Accessed July 18, 2020.
  9. Wu SL, Mertens AN, Crider YS, et al. Substantial underestimation of SARS-CoV-2 infection in the United States. Nat Commun. 2020. doi:10.1038/s41467-020-18272-4
  10. Perkins NJ, Cole SR, Harel O, et al. Principled Approaches to Missing Data in Epidemiologic Studies. Am J Epidemiol. 2017;187(3):568–575. doi:10.1093/aje/kwx348
  11. The Coronavirus Aid, Relief, and Economic Security (CARES) Act. United States; 2020.
  12. Lash TL, Fox MP, Fink AK. Applying Quantitative Bias Analysis to Epidemiologic Data. (Gail M, Krickeberg K, Samet J, Tsiatis A, Wong W, eds.). New York: Springer; 2009. doi:10.1007/978-0-387-87959-8
  13. American Community Survey: Hispanic or Latino Origin by Race. The United States Census Bureau. https://data.census.gov/cedsci/table?t=RaceandEthnicity&g=0500000US13121&tid=ACSDT5Y2018.B03002&moe=false&hidePreview=true. Published 2020. Accessed August 19, 2020.
  14. Elliott MN, Fremont A, Morrison PA, Pantoja P, Lurie N. A new method for estimating race/ethnicity and associated disparities where administrative records lack self-reported race/ethnicity. Health Serv Res. 2008;43(5 P1):1722–1736. doi:10.1111/j.1475-6773.2008.00854.x
  15. About the American Community Survey. US Census Bureau. https://www.census.gov/programs-surveys/acs/about.html. Published 2020. Accessed July 8, 2020.
  16. Khanna K, Imai K. Package ‘wru’: Who are You? Bayesian Prediction of Racial Category Using Surname and Geolocation. 2019. https://cran.r-project.org/web/packages/wru/wru.pdf.
Posted October 02, 2020.