Unsupervised Machine Learning Unveil Easily Identifiable Subphenotypes of COVID-19 With Differing Disease Trajectories ====================================================================================================================== * Jacky Chen * Jocelyn Hsu * Alexandra Szewc * Clotilde Balucini * Tej D. Azad * Kirby Gong * Han Kim * Robert D Stevens ## Abstract **Background** Given the clinical heterogeneity of COVID-19 infection, we hypothesize the existence of subphenotypes based on early inflammatory responses that are associated with mortality and additional complications. **Methods** For this cross-sectional study, we extracted electronic health data from adults hospitalized patients between March 1, 2020 and May 5, 2021, with confirmed primary diagnosis of COVID-19 across five Johns Hopkins Hospitals. We obtained all electronic health records from the first 24h of the patient’s hospitalization. Mortality was the primary endpoint explored while myocardial infarction (MI), pulmonary embolism (PE), deep vein thrombosis (DVT), stroke, delirium, length of stay (LOS), ICU admission and intubation status were secondary outcomes of interest. First, we employed clustering analysis to identify COVID-19 subphenotypes on admission with only biomarker data and assigned each patient to a subphenotype. We then performed Chi-Squared and Mann-Whitney-U tests to examine associations between COVID-19 subphenotype assignment and outcomes. In addition, correlations between subphenotype and pre-existing comorbidities were measured using Chi-Squared analysis. **Results** A total of 7076 patients were included. Analysis revealed three distinct subgroups by level of inflammation: hypoinflammatory, intermediate, and hyperinflammatory subphenotypes. More than 25% of patients in the hyperinflammatory subphenotype died compared to less than 3% hypoinflammatory subphenotype (p<0.05). Additional analysis found statistically significant increases in the rate of MI, DVT, PE, stroke, delirium and ICU admission as well as LOS in the hyperinflammatory subphenotype. **Conclusion** We identify three distinct inflammatory subphenotypes that predict a range of outcomes, including mortality, MI, DVT, PE, stroke, delirium, ICU admission and LOS. The three subphenotypes are easily identifiable and may aid in clinical decision making. ## Introduction Since the first coronavirus disease (COVID-19) case reported in December 2019 (1), it was clear that this disease has extraordinarily heterogenous patient presentations that lead to differing outcomes. The heterogeneity in the patient presentation makes it difficult for clinicians to predict which patients are likely to deteriorate, resulting in delayed treatment. A clearer understanding of patient heterogeneity and disease trajectory might help clinicians anticipate which patients are likely to require more aggressive and earlier treatments and which therapies will be most beneficial to a patient. One of the factors associated with worse outcomes is the presence of an overactive immune system such as a cytokine storm, a hyperinflammatory state secondary to excessive cytokine production (2,3). Additionally, research has shown that different types of immune-responses are associated with differing levels of disease severity (4). Although resolving this heterogeneity would allow for a more individualized treatment approach, it remains difficult to do so only using standard clinical features early on in the disease trajectory. Our primary objective was to discover distinct subphenotypes of COVID-19 that would be easily identifiable by a human practitioner early in the disease trajectory, providing order and structure to the heterogeneity mentioned above. Although there already exists limited literature on developing subphenotypes of COVID-19, previous studies either have low sample sizes, generate subphenotypes that are difficult to distinguish by clinicians, or are hard to understand in the context of immune responses (5–8). Our study aims to overcome these limitations and produce subphenotypes that can aid timely clinical decision-making and lead to better patient outcomes. ## Methods ### Data The data utilized for this study was taken from the JH-CROWN: The COVID PMAP Registry, which contains data from five Johns Hopkins Hospitals serving approximately 7 million people in the Maryland/Washington D.C. area and containing over 2,000 beds (9–12). The database was constructed directly from the clinical electronic health records and contains hospitalized adult patients (>18y) who were diagnosed with COVID-19 between March 5, 2020 and May 5, 2021. Diagnosis of COVID-19 was defined as the following: the detection of SARS-CoV-2 from any nucleic acid test of any specimen type with an Emergency Use Authorization from the U.S., patients who had been flagged as having COVID-19, or who had been diagnosed with suspected or confirmed COVID-19 at discharge (13). ### Patient Selection Within our study’s timeframe, there were 15,470 COVID-19 hospital stays from the JH-CROWN database that includes hospitalized patients with any COVID-19 diagnosis. We excluded patients who were hospitalized for other ailments but had COVID-19 as a secondary diagnosis (n = 8378). For example, a patient who was hospitalized for trauma following a car accident and contracted COVID-19 while hospitalized would be excluded. Then, we excluded patients who have no recorded laboratory tests within the first 24 hours of admission (n = 16), leaving 7076 patients to be included in our study. To summarize, the 7076 patients examined in this study have primary diagnosis of COVID 19 and have recorded laboratory tests within the first 24 hours of admission. Figure 1 contains a graphical representation of the patient selection process. ![Figure 1:](http://medrxiv.org/http://medrxiv.stage.highwire.org/content/medrxiv/early/2023/04/15/2023.04.07.23288152/F1.medium.gif) [Figure 1:](http://medrxiv.org/content/early/2023/04/15/2023.04.07.23288152/F1) Figure 1: Patient selection flowchart In the CROWN dataset database, data were labeled with an admission identification, which is specific to each patient’s hospitalization. If the patient were to be re-admitted, then they would receive a new admission identification. In this study, we treated each hospital admission as an independent sample. For patients with more than one hospitalization recorded, the mean time lapse between the discharge of a previous hospitalization and the start of the next hospitalization was 42.35 days, supporting our assumption to treat each hospitalization stay as an independent sample. ### Patient’s Data Data for each patient contained high dimensional and longitudinal data in three main categories of interest: demographic, comorbidities, and laboratory results. Demographic data included patient age, sex, and race. Laboratory data included a periodically recorded blood panel including specific biomarkers such as IL-6, prolactin, eosinophils, and others as listed in Supplemental Table 1. Only laboratory data gathered in the first 24 hours of admission were used in the clustering analysis. For laboratory data, we first omitted any data points that were marked as erroneous by the database. Then, we aggregated all the laboratory tests into summary statistics. We engineered features to include the minimum, maximum, mean, standard deviation, first, last, and total change of each of the laboratory tests over the first 24 hours of admission. We excluded any tests where more than 50% of the patient’s first 24h hospitalization did not have at least one value. Furthermore, for any two features with a Pearson’s correlation coefficient of greater than 90%, one feature was removed at random (Supplemental Table 1). Finally, we imputed any missing values for the remaining features using MissRanger (14), a random forest imputation package. Primary outcomes of interest measured was mortality. Secondary outcomes of interest include myocardial infarction (MI), pulmonary embolism (PE), deep vein thrombosis (DVT), delirium, stroke, length of stay (LOS), ICU admission, and respiratory features, such as Oxygen, High-flow Oxygen, ECMO and pressor use (Applicable ICD-10 codes in Supplemental Table 2). Comorbidities to examine were chosen based on the Elixhauser Comorbidity index (15). ### Consensus Clustering Clustering is an unsupervised machine learning tool aimed at separating heterogenous data into more homogenous subsets by grouping data according to their similarity. Simple clustering methods often assign data into groups randomly, compute a similarity index, and then aim to maximize this similarity index by rearranging patients into different groups. Then, the similarity index is recomputed with the new groups and the process is repeated. However, due to the random nature of the initial assignment, clustering algorithms are not guaranteed to reach a global optimum. To address this issue, we used ConsensusClusterPlus (16), a consensus clustering algorithm, to group our patients into distinct subphenotypes using only the aggregated laboratory results based on the first 24 hours of their admission. Consensus clustering is an algorithm run many times, and the outcome is decided by having each run “vote” on how the clusters should be assigned. By repeatedly clustering, we significantly reduced the effect of random chance on our results. We use a partition around medoids (PAM) clustering with a Gower distance due to both simplicity and ease of interpretability. Finally, to gain further insight into the clustering, we plotted the standardized values of each feature in each cluster compared to other clusters. ### Choosing the Optimal Number of Clusters We considered a variety of methods to determine the optimal number of clusters. Traditionally, methods such as the “elbow method” have been used to assess the optimal number of clusters. However, it has been shown that the elbow plot as well as many other traditional cluster evaluation strategies can be unreliable when used in consensus clustering algorithms (17). Instead, Șenbabaoğlu et al. propose a metric called the proportion of ambiguous clustering (PAC) to determine the optimal number of clusters (17). This method involves plotting the cumulative distribution function (CDF) curve of a consensus matrix and finding the curve that has the lowest number of ambiguous clusters. However, their study was done on low dimensional artificial data that had easy to observe clusters. When we applied the PAC method to our high dimensional data, the metric suggested that the optimal number of clusters should be the same as the number of patients (Supplemental Figure 1); that is, each patient should be its own cluster. Obviously, this would account for all heterogeneity but would not be clinically tractable. Because of this, we took this metric into account but ultimately also used clinical significance to determine the optimal number of clusters. ### Cluster Nomenclature We used C-Reactive Protein values, which are a general indicator of the systemic inflammation level (18), to classify the clusters generated into different subphenotypes. We designated the cluster with the lowest, middle, and highest C Reactive Protein values as the hypoinflammatory cluster, intermediate cluster, and the hyperinflammatory cluster respectively. ### Standardized Mean Variable Analysis To understand the differences in feature values between clusters identified by an unsupervised machine learning model, we computed the standardized mean differences (SMD) of each feature. The SMD is a measure of the difference in mean values between two groups, standardized by the pooled standard deviation of the groups (Supplemental Equation 1). It is useful for comparing the means of different features, as it is unitless and can be interpreted as a relative difference between the means. To compute the SMD, we first calculated the mean and standard deviation of each feature for each cluster and then used these values to standardize the feature values within each cluster. Next, we calculated the difference in standardized means between each pair of clusters and plotted these differences as SMD plots (Figure 2, Supplemental Figures 2-3). These plots provide a clear visual representation of the relative differences in feature values between the clusters and can help to identify which features are most important for distinguishing between the clusters. These results can also be used to interpret the clusters and gain insights on the underlying structure of data. ![Figure 2:](http://medrxiv.org/http://medrxiv.stage.highwire.org/content/medrxiv/early/2023/04/15/2023.04.07.23288152/F2.medium.gif) [Figure 2:](http://medrxiv.org/content/early/2023/04/15/2023.04.07.23288152/F2) Figure 2: Standardized Variable Plots of Hyper vs Hypoinflammatory Clusters. Standardized mean values are computed by taking the mean of a specific feature in a group and comparing to the mean of another group (solid lines) Faded lines represent comparisons of a cluster to all patients. ### Comorbidity and Outcome Exploration We explored how the prevalence of various outcomes differ among the clusters. Patient stay identifiers were matched with the patients from the clustering experiment to various other tables containing patient outcomes. We then counted the number of patients with the specific outcome in various cluster and conducted statistical tests to determine significance of clinical subphenotype against the outcomes of interest. We conducted Chi Squared tests for categorial variables and Mann-Whitney U tests for continuous variables. Outcomes examined are mortality, pulmonary embolism, deep vein thrombosis, myocardial infarction, and stroke. The ICD-10 codes used to identify each outcome is specified in the supplemental material (Supplemental Table 2). Using the same method as above, we explored the association of phenotype assignment with preexisting comorbidities. We chose to examine comorbidities that are listed in the Elixhauser Comorbidity Index. ## Results Starting with 15,470 adult COVID-19 positive patients, we followed the patient selection criteria as described above, which resulted in 7076 patients being included in the study (Figure 1). Following the cluster procedure above, we generated hypoinflammatory, intermediate, and hyperinflammatory subphenotypes with 2090, 3076, and 1910 patients in each subphenotype respectively. The hypoinflammatory patients had, on average, a younger mean age, a lower number of total comorbidities, lower rates of ICU admission, and a lower average hospitalization stay (Table 1). View this table: [Table 1:](http://medrxiv.org/content/early/2023/04/15/2023.04.07.23288152/T1) Table 1: Cluster descriptions To examine the features that drove the cluster decision algorithm, we plotted standardized variable values comparing the three clusters (Figure 2, Supplemental Figure 2 and Supplemental Figure 3). The hypoinflammatory clusters are characterized by high lymphocyte values but low neutrophil values, whereas the hyperinflammatory values are characterized by low lymphocyte values but high neutrophil values. Upon examining the levels of biomarkers, we discovered that many biomarkers have significant differences across the clusters (Table 2). View this table: [Table 2:](http://medrxiv.org/content/early/2023/04/15/2023.04.07.23288152/T2) Table 2: Results and Outcomes by cluster To further define the clusters, we examined a detailed breakdown of the clinical outcomes during hospitalization in each cluster (Table 3). We found that the hyperinflammatory class had worse outcomes than the intermediate cluster, which, in turn, had worse outcomes than the hypoinflammatory subphenotypes. Of note, this relationship was not noted for pulmonary embolism. View this table: [Table 3:](http://medrxiv.org/content/early/2023/04/15/2023.04.07.23288152/T3) Table 3: Mean of biomarker features used in clustering ## Discussion In this study, we clustered COVID-positive patients into three distinct subphenotypes based on inflammatory biomarkers collected within the first 24 hours of admittance to hospital. We found significant correlations between inflammatory subphenoypes and the occurrence of key complications during hospitalization (stroke, myocardial infarctions (MI), deep-vein thrombosis (DVT), delirium, and mortality). The hypoinflammatory subphenotypes are characterized by higher levels of lymphocytes and eosinophils despite having a lower C-reactive protein levels, which characterize overall inflammation. The hyperinflammatory clusters have the highest BUN (creatinine) ration as well as the highest neutrophil counts (Figure 2). ### Physiology of disease In characterizing the features associated with the discrete clusters, we identified features that appear consistent with current understandings of COVID-19 pathophysiology. We noted relative eosinophilia associated with the hypoinflammatory subphenotype. Notably, eosinopenia has been associated with poor outcomes in COVID-19 and recovery of eosinophil counts has been putatively associated with a favorable recovery trajectory (19). Moreover, we observed the hyperinflammatory group had higher BUN-Creatine ratios. A common clinical measure of renal function, increases in BUN and serum creatinine in patients with COVID-19 have been strongly associated with increased rates of adverse outcomes (20). The causal relationship between renal function, volume status, and COVID-19 remains to be elucidated, but it does appear this may be a hallmark of worse relative outcome. ### Clinical Utility Early recognition of COVID-19 inflammatory subphenotypes may facilitate clinical prognostication, enable better clinical decision-making, and manifest in better patient outcomes. By having only three subphenotypes that can be easily distinguished by biomarkers, the subphenotypes identified by our study may be easier to recognize than existing subphenotypes. For instance, Dubowski et al. produced 6 subphenotypes based on over 30 biomarkers as well as vital signs features (7). This is a significantly more complicated scheme than our subphenotypes, making it harder for providers to determine which subphenotype a given patient would belong in. On the contrary, a provider can easily determine whether a given patient would belong in our hyperinflammatory subphenotype by obtaining a blood panel and examining a few key features. In addition, our results suggest the possibility of a more personalized care routine for dexamethasone use. Severe COVID-19 patients may experience inflammatory organ injury due to strong host inflammatory responses, and dexamethasone is believed to mitigate this damage by attenuating the host inflammatory response (2,21,22). At the time of writing, current guidelines are based on the RECOVERY trial, which showed dexamethasone improved outcomes for hospitalized adult patients if they require oxygen therapy (23,24). However, literature has shown that dexamethasone can lead to a variety of potential side effects, such as dexamethasone-induced hypertension (25,26), osteonecrosis of the femoral head (27,28), ventricular hypertrophy (29), diabetes (30), and other complications. Our results show that although the 38.9% of hypoinflammatory patients receive oxygen therapy, these patients experience a different type of inflammatory response than the hyperinflammatory patients. Taken in conjunction with their better outcomes, it questions the current one-size-fits-all approach to glucocorticoids. More importantly, these results also suggest that any patient in the hyperinflammatory subphenotype may benefit from glucocorticoid treatments, even if they are not currently on oxygen. ### Clinical Trial Implications Our results carry significant impact for future design and implementation of anti-inflammatory and immune modulating clinical trials for COVID -19 patients. Even though these medications carry significant adverse effects (31), they continue to be prescribed for many hospitalized COVID-19 patients (32). Our results point to a personalized approach that could be used to select patients in clinical trials of COVID-19 therapies. The inflammatory subphenotypes suggest that anti-inflammatory and immune modulating therapies might be highly beneficial when treating patient in the hyperinflammatory subphenotype but ineffective or detrimental if used in patients in the hypoinflammatory subphenotype. Future studies are needed to verify this hypothesis. ### Strengths The COVID-CROWN database by the Precision Medicine Analytics Platform offers highly detailed information regarding the patients all throughout their hospital stay. The dataset contained data from 7076 study eligible patients who were treated in 5 different hospitals, which increases the generalizability of the results compared to single hospital studies. This highly detailed data allowed us to conduct an in-depth exploration of how the patient’s inflammatory status was related to their disease trajectory throughout their inpatient stay. We demonstrated that a patient’s outcome was tangibly related to their inflammation status and that the hyperinflammatory subgroups had significantly worse disease trajectories. In addition, our subphenotypes are easily identifiable using a few key biomarker characteristics, making them potentially suitable to guide clinical decision making. Unlike other subphenotyping exercises, our study examined patients spanning over a year, which contained several waves of COVID involving different strains of SARS-CoV-2. This suggests that our study can be more resistant to bias for one strain. However, additional work should be done to update our model in light of more recent strains. ### Limitations We acknowledge that our study has limitations. One potential limitation is the lack of external validation. We were unable to overcome this due to challenges in identifying large-scale open access databases that contain similar detailed information as in the COVID-CROWN database. We also acknowledge that the COVID-19 pandemic response moved at a very fast pace. Certain treatments that may not have been available at the beginning of the pandemic became widely used towards the later stages. These treatments were not considered in our modeling approach. Additionally, there are limitations towards picking the number of optimal clusters. In our methodology section, we present evidence as to why traditional methods such as elbow plots or silhouette scores are inappropriate for consensus clustering. Furthermore, scores based on sampling or bootstrapping the data (i.e. Jaccard Coefficient) are also inappropriate due to the repeated sampling procedure already in consensus clustering. ## Conclusion In conclusion, we find three subphenotypes of COVID-19 that are easily distinguishable within the first 24 hours of hospital admission. The hyperinflammatory subphenotypes have significant increases when adverse outcomes occurred, including mortality, pulmonary embolism, stroke, deep vein thrombosis, delirium and myocardial infarction. Additional studies are needed to confirm the external validity of these results. ## Supporting information Supplemental Information [[supplements/288152_file02.pdf]](pending:yes) ## Data Availability The data used were part of JH-CROWN: The COVID Precision Medicine Analytics Platform Registry.Data and analysis code will be made available upon researcher request as allowable by the terms of the registry and the Johns Hopkins Medicine Institutional Review Board (IRB) * Received April 7, 2023. * Revision received April 7, 2023. * Accepted April 15, 2023. * © 2023, Posted by Cold Spring Harbor Laboratory This pre-print is available under a Creative Commons License (Attribution-NonCommercial-NoDerivs 4.0 International), CC BY-NC-ND 4.0, as described at [http://creativecommons.org/licenses/by-nc-nd/4.0/](http://creativecommons.org/licenses/by-nc-nd/4.0/) ## References 1. 1.Worobey M. Dissecting the early COVID-19 cases in Wuhan. Science. 2021 Dec 3;374(6572):1202–4. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1126/science.abm4454&link_type=DOI) 2. 2.Moore JB, June CH. Cytokine release syndrome in severe COVID-19. Science. 2020 May;368(6490):473–4. [Abstract/FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiQUJTVCI7czoxMToiam91cm5hbENvZGUiO3M6Mzoic2NpIjtzOjU6InJlc2lkIjtzOjEyOiIzNjgvNjQ5MC80NzMiO3M6NDoiYXRvbSI7czo1MDoiL21lZHJ4aXYvZWFybHkvMjAyMy8wNC8xNS8yMDIzLjA0LjA3LjIzMjg4MTUyLmF0b20iO31zOjg6ImZyYWdtZW50IjtzOjA6IiI7fQ==) 3. 3.Zanza C, Romenskaya T, Manetti AC, Franceschi F, La Russa R, Bertozzi G, et al. Cytokine Storm in COVID-19: Immunopathogenesis and Therapy. Medicina (Mex). 2022 Jan 18;58(2):144. 4. 4.Mathew D, Giles JR, Baxter AE, Oldridge DA, Greenplate AR, Wu JE, et al. Deep immune profiling of COVID-19 patients reveals distinct immunotypes with therapeutic implications. Science. 2020 Sep 4;369(6508):eabc8511. [Abstract/FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiQUJTVCI7czoxMToiam91cm5hbENvZGUiO3M6Mzoic2NpIjtzOjU6InJlc2lkIjtzOjE3OiIzNjkvNjUwOC9lYWJjODUxMSI7czo0OiJhdG9tIjtzOjUwOiIvbWVkcnhpdi9lYXJseS8yMDIzLzA0LzE1LzIwMjMuMDQuMDcuMjMyODgxNTIuYXRvbSI7fXM6ODoiZnJhZ21lbnQiO3M6MDoiIjt9) 5. 5.Su C, Zhang Y, Flory JH, Weiner MG, Kaushal R, Schenck EJ, et al. Clinical subphenotypes in COVID-19: derivation, validation, prediction, temporal patterns, and interaction with social determinants of health. Npj Digit Med. 2021 Jul 14;4(1):1–13. 6. 6.Lu M, Drohan C, Bain W, Shah FA, Bittner M, Evankovich J, et al. Trajectories of hostresponse biomarkers and inflammatory subphenotypes in COVID-19 patients across the spectrum of respiratory support. medRxiv. 2022 Nov 29;2022.11.28.22282858. 7. 7.Dubowski K, Braganza GT, Bozack A, Colicino E, DeFelice N, McGuinn L, et al. COVID-19 subphenotypes at hospital admission are associated with mortality: a cross-sectional study. Ann Med. 2023 Dec;55(1):12–23. 8. 8.Lee EE, Song KH, Hwang W, Ham SY, Jeong H, Kim JH, et al. Pattern of inflammatory immune response determines the clinical course and outcome of COVID-19: unbiased clustering analysis. Sci Rep. 2021 Apr 13;11(1):8080. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/s41598-021-87668-z&link_type=DOI) 9. 9.Wongvibulsin S, Garibaldi BT, Antar AAR, Wen J, Wang MC, Gupta A, et al. Development of Severe COVID-19 Adaptive Risk Predictor (SCARP), a Calculator to Predict Severe Disease or Death in Hospitalized Patients With COVID-19. Ann Intern Med. 2021 Jun 15;174(6):777–85. [PubMed](http://medrxiv.org/lookup/external-ref?access_num=http://www.n&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2023%2F04%2F15%2F2023.04.07.23288152.atom) 10. 10.Annapragada AV, Greenstein JL, Bose SN, Winters BD, Sarma SV, Winslow RL. SWIFT: A deep learning approach to prediction of hypoxemic events in critically-Ill patients using SpO2 waveform prediction. PLOS Comput Biol. 2021 Dec 21;17(12):e1009712. 11. 11.Deal JA, Jiang K, Betz J, Clemens GD, Zhu J, Reed NS, et al. COVID-19 Clinical Outcomes by Patient Disability Status: A Retrospective Cohort Study. Disabil Health J. 2023 Jan 12;101441. 12. 12.Metkus TS, Sokoll LJ, Barth AS, Czarny MJ, Hays AG, Lowenstein CJ, et al. Myocardial Injury in Severe COVID-19 Compared With Non–COVID-19 Acute Respiratory Distress Syndrome. Circulation. 2021 Feb 9;143(6):553–65. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1161/CIRCULATIONAHA.120.050543&link_type=DOI) 13. 13.Avery RK, Chiang TPY, Marr KA, Brennan DC, Sait AS, Garibaldi BT, et al. Inpatient COVID-19 outcomes in solid organ transplant recipients compared to non-solid organ transplant patients: A retrospective cohort. Am J Transplant. 2021 Jul 1;21(7):2498–508. 14. 14.Mayer M. missRanger: Fast Imputation of Missing Values [Internet]. 2021 [cited 2022 Dec 16]. Available from: [https://CRAN.R-project.org/package=missRanger](https://CRAN.R-project.org/package=missRanger) 15. 15.Elixhauser A, Steiner C, Harris DR, Coffey RM. Comorbidity Measures for Use with Administrative Data. Med Care. 1998 Jan;36(1):8. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1097/00005650-199801000-00004&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=9431328&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2023%2F04%2F15%2F2023.04.07.23288152.atom) [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000071181500004&link_type=ISI) 16. 16.Wilkerson MD, Hayes DN. ConsensusClusterPlus: a class discovery tool with confidence assessments and item tracking. Bioinformatics. 2010 Jun 15;26(12):1572–3. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1093/bioinformatics/btq170&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=20427518&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2023%2F04%2F15%2F2023.04.07.23288152.atom) [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000278689000067&link_type=ISI) 17. 17.Șenbabaoğlu Y, Michailidis G, Li JZ. Critical limitations of consensus clustering in class discovery. Sci Rep. 2014 Aug 27;4(1):6207. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/srep06207&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=25158761&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2023%2F04%2F15%2F2023.04.07.23288152.atom) 18. 18.Sproston NR, Ashworth JJ. Role of C-Reactive Protein at Sites of Inflammation and Infection. Front Immunol. 2018 Apr 13;9:754. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.3389/fimmu.2018.00754&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=29706967&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2023%2F04%2F15%2F2023.04.07.23288152.atom) 19. 19.Cortés-Vieyra R, Gutiérrez-Castellanos S, Álvarez-Aguilar C, Baizabal-Aguirre VM, Nuñez-Anita RE, Rocha-López AG, et al. Behavior of Eosinophil Counts in Recovered and Deceased COVID-19 Patients over the Course of the Disease. Viruses. 2021 Sep;13(9):1675. 20. 20.Liu YM, Xie J, Chen MM, Zhang X, Cheng X, Li H, et al. Kidney Function Indicators Predict Adverse Outcomes of COVID-19. Med. 2021 Jan 15;2(1):38–48.e2. 21. 21.Huang C, Wang Y, Li X, Ren L, Zhao J, Hu Y, et al. Clinical features of patients infected with 2019 novel coronavirus in Wuhan, China. The Lancet. 2020 Feb 15;395(10223):497–506. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/S0146736(20)30183-5&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=http://www.n&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2023%2F04%2F15%2F2023.04.07.23288152.atom) 22. 22.Ruan Q, Yang K, Wang W, Jiang L, Song J. Clinical predictors of mortality due to COVID-19 based on an analysis of data of 150 patients from Wuhan, China. Intensive Care Med. 2020 May 1;46(5):846–8. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1007/s00134-020-05991-x&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=http://www.n&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2023%2F04%2F15%2F2023.04.07.23288152.atom) 23. 23. National Institutes of Health. COVID-19 Treatment Guidelines Panel. Coronavirus Disease 2019 (COVID-19) Treatment Guidelines [Internet]. [cited 2022 Jan 22]. Available from: Available at [https://www.covid19treatmentguidelines.nih.gov/](https://www.covid19treatmentguidelines.nih.gov/). 24. 24.Dexamethasone in Hospitalized Patients with Covid-19. N Engl J Med. 2021 Feb 25;384(8):693–704. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1056/NEJMoa2021436&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=http://www.n&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2023%2F04%2F15%2F2023.04.07.23288152.atom) 25. 25.Herrera NA, Jesus I, Shinohara AL, Dionísio TJ, Santos CF, Amaral SL. Exercise training attenuates dexamethasone-induced hypertension by improving autonomic balance to the heart, sympathetic vascular modulation and skeletal muscle microcirculation. J Hypertens. 2016 Oct;34(10):1967. 26. 26.Kornel L, Prancan AV, Kanamarlapudi N, Hynes J, Kuzianik E. Study on the mechanisms of glucocorticoid-induced hypertension: Glucocorticoids increase transmembrane Ca2+ influx in vascular smooth muscle in vivo. Endocr Res. 1995 Jan 1;21(1–2):203–10. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.3109/07435809509030436&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=7588382&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2023%2F04%2F15%2F2023.04.07.23288152.atom) [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=A1995QZ87200023&link_type=ISI) 27. 27.Wu X, Geng C, Sun W, Tan M. Incidence and Risk Factors of Osteonecrosis of Femoral Head in Multiple Myeloma Patients Undergoing Dexamethasone-Based Regimens. BioMed Res Int. 2020;2020:7126982. 28. 28.Zhang B, Zhang S. Corticosteroid-Induced Osteonecrosis in COVID-19: A Call For Caution. J Bone Miner Res Off J Am Soc Bone Miner Res. 2020 Sep;35(9):1828–9. 29. 29.Werner JC, Sicard RE, Hansen TW, Solomon E, Cowett RM, Oh W. Hypertrophic cardiomyopathy associated with dexamethasone therapy for bronchopulmonary dysplasia. J Pediatr. 1992 Feb;120(2 Pt 1):286–91. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/S0022-3476(05)80446-9&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=1735831&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2023%2F04%2F15%2F2023.04.07.23288152.atom) [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=A1992HC90100025&link_type=ISI) 30. 30.Jeong Y, Han HS, Lee HD, Yang J, Jeong J, Choi MK, et al. A Pilot Study Evaluating Steroid-Induced Diabetes after Antiemetic Dexamethasone Therapy in Chemotherapy-Treated Cancer Patients. Cancer Res Treat. 2016 Oct;48(4):1429–37. [PubMed](http://medrxiv.org/lookup/external-ref?access_num=http://www.n&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2023%2F04%2F15%2F2023.04.07.23288152.atom) 31. 31.van Raalte DH, Diamant M. Steroid diabetes: from mechanism to treatment? Neth J Med. 2014 Feb;72(2):62–72. [PubMed](http://medrxiv.org/lookup/external-ref?access_num=24659588&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2023%2F04%2F15%2F2023.04.07.23288152.atom) 32. 32.Akter F, Araf Y, Hosen MJ. Corticosteroids for COVID-19: worth it or not? Mol Biol Rep. 2022 Jan 1;49(1):567–76.