COVID-19 outcomes, risk factors and associations by race: a comprehensive analysis using electronic health records data in Michigan Medicine
============================================================================================================================================

* Tian Gu
* Jasmine A. Mack
* Maxwell Salvatore
* Swaraaj Prabhu Sankar
* Thomas S. Valley
* Karandeep Singh
* Brahmajee K. Nallamothu
* Sachin Kheterpal
* Lynda Lisabeth
* Lars G. Fritsche
* Bhramar Mukherjee

## Structured Abstract

**Importance** Blacks/African-Americans are overrepresented in the number of COVID-19 infections, hospitalizations and deaths. Reasons for this disparity have not been well-characterized but may be due to underlying comorbidities or sociodemographic factors.

**Objective** To systematically determine patient characteristics associated with racial/ethnic disparities in COVID-19 outcomes.

**Design** A retrospective cohort study with comparative control groups.

**Setting** Patients tested for COVID-19 at University of Michigan Medicine from March 10, 2020 to April 22, 2020.

**Participants** 5,698 tested patients and two sets of comparison groups who were not tested for COVID-19: randomly selected unmatched controls (n = 7,211) and frequency-matched controls by race, age, and sex (n = 13,351).

**Main Outcomes and Measures** We identified factors associated with testing and testing positive for COVID-19, being hospitalized, requiring intensive care unit (ICU) admission, and mortality (in/out-patient during the time frame). Factors included race/ethnicity, age, smoking, alcohol consumption, healthcare utilization, and residential-level socioeconomic characteristics (SES; i.e., education, unemployment, population density, and poverty rate). Medical comorbidities were defined from the International Classification of Diseases (ICD) codes, and were aggregated into a comorbidity score.

**Results** Of 5,698 patients (median age, 47 years; 38% male; mean BMI, 30.1), the majority were non-Hispanic Whites (NHW, 59.2%) and non-Hispanic Black/African-Americans (NHAA, 17.2%). Among the 1,119 diagnosed, 41.2% were NHW and 37.4% were NHAA; 44.8% were hospitalized, 20.6% were admitted to ICU, and 3.8% died. Adjusting for age, sex, and SES, NHAA were 1.66 times more likely to be hospitalized (95% CI, 1.09-2.52; *P*=.02) and 1.52 times more likely to enter ICU (95% CI, 0.92-2.52; *P*=.10). In addition to older age, male sex and obesity, living in a high population density neighborhood (OR, 1.27 associated with one SD change [95% CI, 1.20-1.76]; *P*=.02) was associated with hospitalization. Pre-existing kidney disease was associated with a 2.55 times higher risk of hospitalization (95% CI, 1.62-4.02; *P*<.001) in the overall population and an 11.9 times higher mortality risk in NHAA (95% CI, 2.2-64.7; *P*=.004).

**Conclusions and Relevance** Pre-existing type II diabetes/kidney diseases and living in high population density areas were associated with high risk for COVID-19 susceptibility and poor prognosis. Association of risk factors with COVID-19 outcomes differed by race. NHAA patients were disproportionately affected by obesity and kidney disease.

**Question** What are the sociodemographic and pre-existing health conditions associated with COVID-19 outcomes and how do they differ by race/ethnicity?

**Findings** In this retrospective cohort of 5,698 patients tested for COVID-19, high population density and comorbidities such as type II diabetes/kidney disease were associated with hospitalization, in addition to older age, male sex and obesity. Adjusting for covariates, non-Hispanic Blacks were 1.66 times more likely to be hospitalized and 1.52 times more likely to be admitted to ICUs than non-Hispanic Whites.

**Meaning** Targeted interventions to support vulnerable populations are needed.
Racial disparities in COVID-19 outcomes existed that could not be explained after controlling for age, sex, and socioeconomic status.

## Introduction

The COVID-19 pandemic, caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), has exposed racial disparities among those affected in the United States (US)1–13. In the state of Michigan in particular, there have been 64,998 confirmed COVID-19 cases and 5,943 deaths as of June 11, 2020, which makes Michigan one of the most affected states in the US14. While Blacks/African-Americans represent 14% of the Michigan population15, they account for 31% of COVID-19 cases and 40% of deaths attributed to COVID-1914. Similar trends are observed in New York9 and Illinois, where there is an overrepresentation of African-Americans and Latinos in COVID-19 cases and deaths16.

Overrepresentation of minority populations in poorer COVID-19 outcomes may be explained by a myriad of factors, such as weathering (early health deterioration due to the cumulative impact of socioeconomic disparity)17,18, higher comorbidity burden19, inadequate healthcare19, and socioeconomic differences related to unemployment, food insecurity, and housing instability17. Several studies have reported non-White race, male sex, older age, current smoking, and comorbid conditions as risk factors for COVID-19 susceptibility and hospitalization2,13,20–24. Racial/ethnic minorities who maintain their livelihood as essential workers are more likely to be exposed to the virus16, whereas living in high density areas1 and high rates of homelessness25 and incarceration26 add to the barriers to social distancing16. Although studies have reported many possible reasons for the overrepresentation of minority populations in poorer health outcomes, the evidence supporting the observed disparity in COVID-19 outcomes remains limited, and more data from diverse communities need to be analyzed.
In addition, experiences from COVID-19 highlight the need to not only identify risk factors but also to avoid spurious conclusions of racial/ethnic differences being explained by biology, which could further perpetuate racial/ethnic stereotypes17. Data on holistic clinical and sociodemographic factors contributing to racial/ethnic differences in COVID-19 outcomes are limited. Some previous studies have also compared those who tested positive for COVID-19 with those who tested negative rather than with population-based controls, a design in which selection bias is possible27,28.

The objective of this study is to determine sociodemographic and comorbid conditions that are associated with COVID-19 outcomes (e.g., testing positive, hospitalization, admission to ICU, and mortality), utilizing electronic health records (EHR) from the University of Michigan, which serves a large patient population in the US Midwest.

## Subjects and Methods

### Evaluation cohorts

#### COVID-19 cohort

We extracted the EHR data for patients tested for COVID-19 at the University of Michigan Medicine Health System, also known as Michigan Medicine (MM), from March 10, 2020 to April 22, 2020. Our study cohort of 5,698 patients comprises 5,500 patients (96.5%) who were tested at MM and 198 patients (3.5%) who were treated for COVID-19 in MM but tested elsewhere, of whom 1,119 were COVID-19 positive. For ease of notation, we refer to them as **the tested cohort (n=5,698)** and **the positive cohort (n=1,119)**. The tested cohort is a non-random sample of the population, since the testing protocol at MM focused on prioritized testing29 (e.g., testing symptomatic patients and those at the highest risk of exposure). This cohort also contained transfer patients from other hospitals.

#### Control selection

To understand how selection bias factored into our sample, in addition to comparing COVID-19 positive patients with those testing negative, we created two sets of controls from the MM database.
The first **unmatched control group (n=7,211)** is a similar-sized random sample of contemporaneous patients. The second 1:3 **frequency-matched control group (n=13,351)** is matched by race, sex and age (above or below 50). All controls were alive at the time of data extraction. Study protocols were reviewed and approved by the University of Michigan Medical School Institutional Review Board (IRB ID HUM00180294 and HUM00155849).

#### Description of variables

A summary data dictionary, eTable S2A, is available in the Supplement with the source and definition of each variable used in our analysis.

#### COVID-19 prognosis outcomes

Among the patients diagnosed with COVID-19, we considered various stages of progression of the disease that included hospitalization, admission to the ICU and death. Hospitalized patients were defined as inpatients with a COVID-19 diagnosis whose admission date fell within the time frame of the data extraction. ICU patients were defined as patients who were admitted to an ICU at any time during their COVID-19 related hospitalization. Mortality data, including inpatient and non-hospitalized deaths, were extracted from the EHR.

#### Classifying patients who were still in hospital and ICU

We categorized patients into non-hospitalized, hospitalized (includes ICU stays), and hospitalized with ICU stay based on the admission and discharge data. Some patients were still admitted in the hospital (non-ICU, n=53) or were still in an ICU (n=113) at the time of the data extraction. We performed a sensitivity analysis that excluded these patients, whose final prognostic outcome is unclear (eTable S4 in Supplement).

#### Generation of comorbidities from electronic health records

We constructed the comorbid conditions using available International Classification of Diseases (ICD; ninth and tenth editions) codes for 23,769 individuals (tested, n=5,225; unmatched controls, n=6,811; matched controls, n=11,733) from EHR.
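
As a rough illustration of the construction described in this subsection, the Python sketch below collapses one patient's time-stamped diagnoses into the seven ever/never indicators, applies the 14-day-prior restriction, and sums the indicators into the 0-7 score. The record layout, the category labels, the function names, and two boundary choices (a diagnosis first recorded exactly 14 days before the index date, or on or after it, is removed) are assumptions made for illustration and are not taken from the study's code.

```python
from datetime import date, timedelta

# Labels for the seven disease groups named in the text.
# The label strings themselves are hypothetical.
CATEGORIES = (
    "respiratory", "circulatory", "cancer", "type2_diabetes",
    "kidney", "liver", "autoimmune",
)


def comorbidity_indicators(diagnoses, index_date=None, window_days=14):
    """Collapse (category, date) diagnosis records into 0/1 indicators.

    diagnoses:  iterable of (category, date) pairs for one patient; the
                ICD-code-to-category mapping is assumed to be done already.
    index_date: first test or diagnosis date, whichever is earlier.
                None means no restriction is applied (e.g. for controls).
    """
    # Earliest recorded date per disease group.
    first_seen = {}
    for category, recorded_on in diagnoses:
        if category not in CATEGORIES:
            continue
        if category not in first_seen or recorded_on < first_seen[category]:
            first_seen[category] = recorded_on

    if index_date is None:
        return {c: int(c in first_seen) for c in CATEGORIES}

    # Assumption: keep a diagnosis only if it first appeared strictly
    # before the start of the 14-day window.
    cutoff = index_date - timedelta(days=window_days)
    return {
        c: int(c in first_seen and first_seen[c] < cutoff)
        for c in CATEGORIES
    }


def comorbidity_score(indicators):
    """Sum of the seven binary indicators (0 to 7)."""
    return sum(indicators.values())


# Hypothetical patient history, for illustration only.
history = [
    ("kidney", date(2019, 5, 1)),
    ("kidney", date(2020, 3, 20)),
    ("respiratory", date(2020, 3, 15)),
    ("liver", date(2018, 1, 1)),
]
tested = comorbidity_indicators(history, index_date=date(2020, 3, 25))
control = comorbidity_indicators(history)
print(comorbidity_score(tested), comorbidity_score(control))
```

In this made-up history the respiratory diagnosis first appears 10 days before the index date, so it counts toward the score only when no restriction is applied.
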
Longitudinal time-stamped diagnoses were recoded to indicator variables for whether a patient ever had a given diagnosis code recorded by MM. To differentiate *pre-existing* conditions from diagnoses related to COVID-19 testing/treatment, we applied a 14-day-prior restriction on the tested cohort by removing diagnoses that first appeared within the 14 days before the first test or diagnosis date, whichever was earlier (4,622 of the 5,225 tested individuals had diagnoses data after the 14-day-prior restriction). We focused on seven binary disease indicators that have been specifically mentioned in relation to COVID-19 outcomes: respiratory, circulatory, any cancer, type II diabetes, kidney, liver, and autoimmune diseases (ICD codes in eTable S2A in Supplement). We calculated a comorbidity score as the sum of these seven indicators, ranging from 0 to 7. For exploratory analysis, we defined prior medication use as at least one appearance of a given class of medication in the patient’s EHR.

#### Defining race/ethnicity groups, SES and other adjustment covariates

Variables such as self-reported sex, race/ethnicity, smoking status, alcohol consumption, body mass index (BMI), and age were extracted from the EHR. We classified a patient as seeking primary care in MM if they had an encounter in any of the primary care locations in MM since January 1, 2018. Measures of socioeconomic characteristics are defined by US census tract (based on residential address available in each patient’s EHR) for the year 2010. The boundaries for the census tracts were normalized to 2010 tract boundaries using the Longitudinal Tract Data Base30. We chose three SES indicators included in the National Neighborhood Data Archive (NaNDA)31: percentage of population with below high school (