Abstract
Understanding COVID-19 and its risk factors in the Portuguese population is critical to the struggle against this infectious disease. To study the impact of multimorbidity in the population with COVID-19 infection, we performed a descriptive analysis of a dataset extracted from all reported confirmed cases of COVID-19 in Portugal until June 30, 2020. We observed a prevalence of multimorbidity in 6.77% of the 36,244 infected patients. These patients showed an increased risk of hospitalization, ICU admission, and mortality with OR 2.22 (CI 95%: 2.13-2.32) for every additional morbidity. Further studies should confirm these findings and special attention should be made on data collection, to ensure proper recording of patient comorbidities.
1 Introduction
COVID-19 is an infectious disease caused by SARS-CoV-2 that recently started challenging health systems. Its impact is global with 21,200,000 cases and 761,000 deaths by August 16, 2020 [1]. Early evidence from the pandemic suggested that older patients with chronic conditions were over-represented and may herald a poor clinical course. However, data supporting these analysis is still largely limited to China and Italy [2, 3]. Given that multimorbidity increases with age, the specific risk of different chronic conditions and their combination in terms of poor health outcomes needs to be adjusted for age, which is not routinely done [4]. Understanding which groups are at higher risk is important to better inform public health policies and resource allocation, and to advance knowledge about this novel condition.
DGS (Directionate-General of Health), the Portuguese public health authority, developed and operates SINAVE (National Epidemiological Surveillance System), i.e. the national public health surveillance system. SINAVE is used to collect, update, analyze, and disclose data related to infectious diseases with mandatory reporting, and other public health hazards [5]. Symptoms, previous medical history, disease course, and laboratory results are introduced by physicians in charge of COVID-19 patients. Although SINAVE was not specifically designed for this outbreak, it has been used since the beginning of the pandemic as the main official source of data about COVID-19 cases in Portugal.
In this work, we evaluate the prevalence of multimorbidity and age-adjusted risk of hospitalization, ICU admission, and death, in the Portuguese population from official data, based on a dataset2 extracted from SINAVE containing all confirmed cases of COVID-19 infection, in Portugal, by June 30, 2020. Our analysis is based on the DGS/SINAVE dataset spanning the full period (i.e., the June version of the dataset, which is the most recent at the time of this report), which updates an initial (April) version of the dataset based on cases reported until April 28, 2020, adding two more months and providing more data about the initial cases.
2 Methods
2.1 Sample
Our retrospective observational study used data provided by DGS after the required institutional and ethical approvals. The sample population consists of all the Portuguese population with SARS-CoV2 confirmed infection as notified by clinicians by June 30, 2020. A broad range of clinical and demographic variables are present in this dataset. In our study, we specifically used variables corresponding to age, gender, hospital admission, admission in intensive care unit, mortality, and patient’s underlying conditions.
2.2 Measures
Chronic conditions were originally provided in the dataset as categorical variables on the presence, absence, or unknown status of the following conditions: asthma, malignancy, chronic hematological disorder, diabetes, HIV/other immune deficiency, renal disease, liver disease, chronic lung disease, and neuromuscular disorder.
A field containing raw textual input from doctors was also taken into account, to better complement the cases where the chronic conditions were left as unknown. This alternative information was indeed very useful, particularly on what regards cardiac disorders (including hypertension and other cardiovascular diseases), which were not included in the dataset as a categorical variable and could, therefore, not be detected if not for the raw input.
A text mining script, using keywords associated with all the previously mentioned conditions, was used in order to better capture the prevalence of the diseases (i.e., in many cases, we noticed that although the categorical variables corresponding to chronic conditions were left with an unknown status, the textual inputs contained relevant mentions to chronic conditions) and to detect cases of cardiac disorders. The following keywords (in Portuguese) were used, in connection to each of the diseases that was considered in our study:
Asthma: asma
Malignancy: neo, cancro, carcinoma, linfoma
Cardiac disorder (including hypertension and other cardiovascular diseases): cardio, cárdio, miocar, cardía, cardia, hta, auricular, arterial, venosa
Chronic hematological disorder: hematológica
Diabetes: diabetes, DM
HIV/other immune deficiency: hiv, vih
Renal disease: renal
Liver disease: hepatomegalia
Chronic lung disease: dpoc, pulmonar
Neuromuscular disorder: alz, parkinson, epilepsia.
The keywords were chosen in order to cover different cases, considering misspellings or abbreviations. For instance, neoplasia was commonly abbreviated to neo and the latter was therefore preferred (and also matched as a prefix). Another example is the use of the prefix alz to detect Alzheimer’s disease, that was often misspelled as Alzaimer.
The contribution of using the raw textual input when, preparing the data, was also analyzed. All the statistics reported on this article were also measured in a version of the data that only considered the structured information. The main difference, as stated above, was the complete absence of any information on cardiac disorders, when not considering the textual information. There were also some differences in the prevalence of the other chronic conditions, although only reaching up to 0.16%.
We defined multimorbidity as the presence of two or more conditions in the same individual, following the definition used by other authors [6]. As will be further discussed on the next section, multimorbidity was present on 6.77% of the cases in the dataset, when considering the textual inputs, while this number would be reduced to 4.01% if only considering the structured information. Addressed outcomes were hospitalization, admission to ICU unit, and reported death. A composite outcome of any of these events was also analyzed.
2.3 Statistical analysis
We used Python 3.6 and packages Numpy, Pandas, Seaborn, and SciPy, in combination with Microsoft Excel to evaluate and plot the prevalence of multimorbidity. Additionally, IBM SPSS Statistics was used for obtaining age-adjusted risk of hospitalization, ICU admission, and death.
Categorical variables were presented to analysis as counts and percentages with 95% confidence intervals, and comparisons were made using using the χ2 test. Univariate regression analysis of each individual chronic condition was performed adjusting only for age. A multivariate logistic regression was performed adjusting for age and every other chronic condition with significant statistical association on univariate analysis. Results were considered statistically significant when P <0,05.
3 Results
The overall sample contained 36,244 adult patient cases, with women being more prevalent (56.66%). Among the cases, 18.79% had at least one chronic condition. Cardiac disorder was the most commonly reported condition, respectively in 43.33% of the patients with any morbidity. Table 1 shows the reported prevalence of different chronic conditions in the studied population. Multimorbidity, as previously defined, was present in 6.77% of the cases. Figures 1 and 2 plot the prevalence of multimorbidity by age group, respectively for (a) the COVID-19 infected general population, and for (b) the hospitalized population. To analyze the Odd Ratio and prevalence of co-occurring pairs of chronic diseases, people with unknown disease prevalence were excluded, which resulted in a population of 33,283 adult patients. The prevalence of co-occurring pairs of chronic health conditions, plotted in Figure 4, shows Cardiac disorders and Diabetes as the most common dyad of chronic diseases.
Percentage of COVID-19 infected total Portuguese population affected by each comorbidity.
Odds Ratio for the outcome (Death) for the Age variable and analyzed comorbidities. The real value of Age was used instead of an age group. Being a continuous variable, an Odds Ratio of 1.097 implies that, as the Age variable increases by 1, the probability of the patient having the outcome increases by 9.7%. B, S.E, and Wald are the unstandardized regression weight, how much the unstandardized regression weight can vary by, and test statistic for the individual predictor variable, respectively.
Odds Ratio for the outcome (Hospitalization) for the Age variable and analyzed comorbidities. The real value of Age was used instead of an age group. Being a continuous variable, an Odds Ratio of 1.056 implies that, as the Age variable increases by 1, the probability of the patient having the outcome increases by 5.6%. B, S.E, and Wald are the unstandardized regression weight, how much the unstandardized regression weight can vary by, and test statistic for the individual predictor variable, respectively.
Odds Ratio for the outcome (ICU stay) for the Age variable and analyzed comorbidities. The real value of Age was used instead of an age group. Being a continuous variable, an Odds Ratio of 1.052 implies that, as the Age variable increases by 1, the probability of the patient having the outcome increases by 5.2. B, S.E, and Wald are the unstandardized regression weight, how much the unstandardized regression weight can vary by, and test statistic for the individual predictor variable, respectively.
Odds Ratio for the outcome (Death + hospitalization + ICU stay) for the Age variable and analyzed comorbidities. The real value of Age was used instead of an age group. Being a continuous variable, an Odds Ratio of 1.061 implies that, as the Age variable increases by 1, the probability of the patient having the outcome increases by 6.1%. B, S.E, and Wald are the unstandardized regression weight, how much the unstandardized regression weight can vary by, and test statistic for the individual predictor variable, respectively.
Prevalence of multimorbidity by age group for the COVID-19 infected Portuguese population. The lighter shade of blue is representative of the absence of conditions and the black line represents the prevalence of multimorbidity.
Prevalence of multimorbidity by age group for the COVID-19 infected Portuguese hospitalized population. The lighter shade of blue is representative of the absence of conditions and the black line represents the prevalence of multimorbidity.
Prevalence of multimorbidity by age group using data from the Portuguese Fifth National Health Interview Survey (Laires et al. 2019). The lighter shade of blue is representative of the absence of conditions and the black line represents the prevalence of multimorbidity.
Prevalence of single (left) and co-occurring pairs (right) of chronic health conditions. Left: Prevalence of the disease in the Portuguese population with at least one disease. Right: Prevalence of the disease, rows, in the Portuguese population affected by another disease, columns. All values are presented in percentage.
Data regarding hospitalization and ICU admission was available for only 32,945 patients (90.90% of the overall study population). Within this population, hospitalization occurred in 12.89% of the patients, with a male predominance (50.66%), and ICU admission was required for 4.11% of the patients, with a female predominance (51.73%). Observed mortality was 3.19%. All chronic conditions, except for asthma, were associated with increased risk of mortality and hospitalization (Tables 6 and 7, respectively). Age, diabetes, renal disease, lung disease, and neuromuscular disorders, were all associated with increased risk of ICU admission (Table 8). Additionally, every additional chronic condition increases the risk for the patients of the composite outcome of death, hospitalization, or ICU admission, by 123.3% (OR 2.22; CI95%: 2.13-2.32).
Odds Ratio for the association between the age, categories of comorbidity and outcome (Death) in patients with COVID-19. The continuous value of Age was used instead of an age group. Being a continuous variable, an Odds Ratio of 1.093 implies that, as the Age variable increases by 1, the probability of the patient having the outcome increases by 9.3%. B, S.E, and Wald are the unstandardized regression weight, how much the unstandardized regression weight can vary by, and test statistic for the individual predictor variable, respectively.
Odds Ratio for the association between the age, categories of comorbidity and outcome (hospitalization) in patients with COVID-19. The continuous value of Age was used instead of an age group. Being a continuous variable, an Odds Ratio of 1.045 implies that, as the Age variable increases by 1, the probability of the patient having the outcome increases by 4.5%. B, S.E, and Wald are the unstandardized regression weight, how much the unstandardized regression weight can vary by, and test statistic for the individual predictor variable, respectively.
Odds Ratio for the association between the age, categories of comorbidity and outcome (ICU stay) in patients with COVID-19. The continuous value of Age was used instead of an age group. Being a continuous variable, an Odds Ratio of 1.050 implies that, as the Age variable increases by 1, the probability of the patient having the outcome increases by 5.0%. B, S.E, and Wald are the unstandardized regression weight, how much the unstandardized regression weight can vary by, and test statistic for the individual predictor variable, respectively.
Odds Ratio for the association between the age, categories of comorbidity and outcome (Death + hospitalization + ICU stay) in patients with COVID-19. The continuous value of Age was used instead of an age group. Being a continuous variable, an Odds Ratio of 1.050 implies that, as the Age variable increases by 1, the probability of the patient having the outcome increases by 5.0%. B, S.E, and Wald are the unstandardized regression weight, how much the unstandardized regression weight can vary by, and test statistic for the individual predictor variable, respectively.
Odds Ratio for the association between the age, number of comorbidities and outcome (Death + Hospitalization + ICU stay) in patients with COVID-19. Both Age and No of comorbidities are continuous variables, i.e, an Odds Ratio of 2.223 implies that, as the No of comorbidities increases by 1, the probability of the patient having the outcome increases by 122.3%. B, S.E, and Wald are the unstandardized regression weight, how much the unstandardized regression weight can vary by, and test statistic for the individual predictor variable, respectively.
When comparing the two versions of the dataset that were released by the DGS, it is possible to state that the June version is more complete and representative than the April one. This is easily verified by the prevalence of cardiac disorders in the total population, with a value of 0.28% in the April version and that rose to 8.14% in the most recent one (i.e., in the April version of the dataset all chronic diseases, including cardiac disorders, were encoded as a single categorical variable, and we believe that the information on cardiac disorders was also being extracted from textual notes by the DGS, given that the SINAVE system does not feature a structured field for collecting this information). Concerning chronic diseases and multimorbidity prevalence, the first version displayed values of 16.39% and 4.49%, respectively. With respect to Odd Ratios, the June version mainly showed differences in higher ratios for cardiac disorders. In the April version, due to the small sample size of patients with cardiac disorders, some Odd Ratios got as high as 91.72 (CI 95%: 28.61-294.06), as was the case for the composite outcome of death, hospitalization, or ICU admission (see Table 5).
As mentioned before, the way by which the structured information on chronic diseases was encoded changed amidst the two versions of the dataset. Besides the difference in terms of encoding the data as a single versus multiple categorical variables, the most recent version makes a distinction between present, not present, and unknown, while the April version listed the conditions that were present or not present. This difference can influence the veracity of the results, and we believe the more recent version can better depict the population affected by the different diseases and outcomes.
4 Discussion
Our study shows that multimorbidity is significantly associated with adverse outcomes for COVID-19 infection in the Portuguese population, independently from age. All chronic conditions, except asthma, lead to increased risk of hospitalization. However, only diabetes, chronic kidney disease, chronic respiratory diseases, and neuromuscular disorders, are associated with more severe cases requiring ICU admission. These results are in line with previous reports where chronic diseases are associated with poorer outcomes [3, 7]. Although the strength of association differs between diseases, every additional morbidity leads to an increased risk of the composite outcome of hospitalization, ICU admission, and mortality.
Multimorbidity was previously studied by Laires et al. for the general Portuguese population in 2014, using data from the fifth National Health Interview Survey (Inquérito National de Saude, INS) [6]. Prevalence of multimorbidity on the study from Laires at al. can be compared with our data. Although only individuals aged 25-79 were included in the study from 2014, the choice of individuals constitutes a robust sample for the study of morbidity prevalence in Portugal. We can observe in Figures 1 to 3 a rise in chronic health conditions with increasing age, as expected. However, multimorbidity is much less prevalent in our study population (6.77% vs 43.9%). Since the beginning of the COVID-19 pandemic there has been significant public awareness regarding the higher risk of older people with comorbidities. This may have induced efforts to protect and isolate this population, with the findings of our study suggesting a positive effect of such measures, given the younger and healthier population in the COVID-19 dataset. The discrepancy may however be related to different reporting methods. For instance, the maximum number of significant reported simultaneous morbidities in the COVID-19 infected population was 5 disorders (there were a total of 5 people with simultaneous morbidities ranging from 6 to 8), which is half of the maximum number of morbidities found in the INS population (10 disorders). Since the total number of conditions considered in both datasets is not so different (COVID-19: 10 diseases; INS: 13 diseases), a possible explanation for the higher number of co-occurring conditions in the INS population can be the combination of self-diagnoses with the presence of more subjective disorders, such as lower and upper back pain, allergies, depression, and urinary incontinence.
Our study also has several important limitations. First of all, the cross-sectional nature of the COVID-19 dataset makes it impossible to account for incomplete outcomes, since several patients could ultimately be hospitalized or die after the end of observation. Reported data on outcomes may therefore be underestimated, so careful interpretation is advised until more data is available. More importantly, despite the fact that no standard set of conditions is established to define multimorbidity, chronic conditions were given on broad groups and there is no specific information on individual conditions. For example, diabetes is given as one group and no distinction is made between type 1 and type 2 diabetes. Therefore, measured morbidities may herald heterogeneous groups of diseases with different degrees of severity, which may influence outcomes. Future datasets should ideally include more accurate information on chronic conditions.
Another important concern is related to the risk of under-reporting, which becomes obvious by analyzing reported cardiac diseases. Given that cardiovascular diseases, particularly hypertension, are very prevalent in the Portuguese population [8], the observed prevalence of 8.14% in our study highly suggests that under-reporting may have occurred. In addition, the prevalence of reported cardiac diseases has significantly increased from the April version of the DGS dataset, which showed a much lower cardiac disease prevalence in the general population of 0.28%. Surprisingly, cardiovascular diseases are absent from the available list of previous conditions in SlNAVE’s reporting page, which could have contributed to a lower notification of this comorbidity (see Figure 6b). One explanation to the considerable increase in cardiac disorder prevalence is the fact that the raw textual input from doctors was included in the June version of the DGS/SINAVE dataset and used in our analysis through the consideration of keywords, instead of relying on the categorical variable for the presence or absence of this disease, as in the case of the April version.
Percentage of observed and expected prevalence of co-occurring pairs of chronic health conditions (data from the Portuguese Fifth National Health Interview Survey) [6]. The shaded bar depicts the prevalence of each chronic health condition. In the matrix, the first value for each pair is the observed frequency, while the second (italic) is the expected one after multiplying the respective prevalence of each disorder. Chi-square tests were used to determine whether observed frequencies were significantly different from expected frequencies. All p-values were inferior to 5%.) LBP low back pain, HTA hypertension, UBP upper back pain, UI urinary incontinence, COPD chronic obstructive pulmonary disease, CKD chronic kidney disease, CHD coronary heart disease, MI previous myocardial infarction, and LC liver cirrhosis
Screenshots of SINAVE’s interface regarding the report of known comorbidities: (a) Reporting the presence of comorbidities; (b) Reporting the known comorbidities
Although we acknowledge that the DGS/SINAVE dataset was not primarily generated for research, but rather for public health proceedings and government information, we believe that a better user interface design and a more rational set of chronic conditions could effortlessly improve the quality of the recorded data. One important lesson that we can learn from this pandemic is the important contribution that quickly gathered relevant clinical data through effective health information systems, such as SINAVE, can have in this setting.
We indeed suspect SINAVE to be very susceptible to under-reporting, specially regarding multimorbidity. The user interface is somewhat confusing and non-practical, allowing doctors to skip filling some important fields in the entry form. Some suggestions to improve future studies regarding multimorbidity, using SINAVE’s data, include:
Redesign the reporting form, making data entry more effective and faster. The current user interface could be simplified, while encouraging the input of relevant comorbidity information
Integrate SINAVE data with the patient’s health record or with data from the new Trace COVID-19 system, to provide richer data relevant to COVID-19. This could be a way of implementing the redesigned user interface suggested above
Make it mandatory to input if comorbidities are absent; if present, the filling of the entry form inputs related to comorbidities and the specific chronic conditions should be mandatory. As seen in Figure 6, the list of comorbidities only becomes visible if a previous parameter is filled
Emphasize to healthcare professionals the importance of entering all known chronic conditions in the system
Add cardiovascular diseases to the list of comorbidities.
To conclude, findings in our study show that multimorbidity is significantly associated with poor outcomes in COVID-19 infection. Further data is needed to inform about the strength of this association and about the significance of observed differences in multimorbidity prevalence between infected patients and the general population of Portugal. We believe that data collection problems may have occurred and influenced outcome measurement. We also provide recommendations for improving the data collection user interface, that could ultimately improve quality of health information about the COVID-19 infected population, while increasing confidence in SINAVE data.
Data Availability
This retrospective observational study used data provided by DGS after the required institutional and ethical approvals. The sample population consists of all the Portuguese population with SARS-CoV2 confirmed infection as notified by clinicians by June 30, 2020. A broad range of clinical and demographic variables are present in this dataset. In this study we specifically used age, gender, hospital admission, admission in intensive care unit, mortality and patient's underlying conditions.
List of Tables
Percentage of COVID-19 infected total Portuguese population affected by each comorbidity 11
Odds Ratio for the outcome (Death) for the Age variable and analyzed comorbidities. The real value of Age was used instead of an age group. Being a continuous variable, an Odds Ratio of 1.097 implies that, as the Age variable increases by 1, the probability of the patient having the outcome increases by 9 .7%. B, S.E, and Wald are the unstandardized regression weight, how much the unstandardized regression weight can vary by, and test statistic for the individual predictor variable, respectively. 12
Odds Ratio for the outcome (Hospitalization) for the Age variable and analyzed comorbidities. The real value of Age was used instead of an age group. Being a continuous variable, an Odds Ratio of 1.056 implies that, as the Age variable increases by 1, the probability of the patient having the outcome increases by 5.6%. B, S.E, and Wald are the unstandardized regression weight, how much the unstandardized regression weight can vary by, and test statistic for the individual predictor variable, respectively. 13
Odds Ratio for the outcome (ICU stay) for the Age variable and analyzed comorbidities. The real value of Age was used instead of an age group. Being a continuous variable, an Odds Ratio of 1.052 implies that, as the Age variable increases by 1, the probability of the patient having the outcome increases by 5.2. B, S.E, and Wald are the unstandardized regression weight, how much the unstandardized regression weight can vary by, and test statistic for the individual predictor variable, respectively. 14
Odds Ratio for the outcome (Death + hospitalization + ICU stay) for the Age variable and analyzed comorbidities. The real value of Age was used instead of an age group. Being a continuous variable, an Odds Ratio of 1.061 implies that, as the Age variable increases by 1, the probability of the patient having the outcome increases by 6.1%. B, S.E, and Wald are the unstandardized regression weight, how much the unstandardized regression weight can vary by, and test statistic for the individual predictor variable, respectively. 15
Odds Ratio for the association between the age, categories of comorbidity and outcome (Death) in patients with COVID-19. The continuous value of Age was used instead of an age group. Being a continuous variable, an Odds Ratio of 1.093 implies that, as the Age variable increases by 1, the probability of the patient having the outcome increases by 9.3%. B, S.E, and Wald are the unstandardized regression weight, how much the unstandardized regression weight can vary by, and test statistic for the individual predictor variable, respectively. 16
Odds Ratio for the association between the age, categories of comorbidity and outcome (hospitalization) in patients with COVID-19. The continuous value of Age was used instead of an age group. Being a continuous variable, an Odds Ratio of 1.045 implies that, as the Age variable increases by 1, the probability of the patient having the outcome increases by 4.5%. B, S.E, and Wald are the unstandardized regression weight, how much the unstandardized regression weight can vary by, and test statistic for the individual predictor variable, respectively. 17
Odds Ratio for the association between the age, categories of comorbidity and outcome (ICU stay) in patients with COVID-19. The continuous value of Age was used instead of an age group. Being a continuous variable, an Odds Ratio of 1.050 implies that, as the Age variable increases by 1, the probability of the patient having the outcome increases by 5.0%. B, S.E, and Wald are the unstandardized regression weight, how much the unstandardized regression weight can vary by, and test statistic for the individual predictor variable, respectively. 18
Odds Ratio for the association between the age, categories of comorbidity and outcome (Death + hospitalization + ICU stay) in patients with COVID-19. The continuous value of Age was used instead of an age group. Being a continuous variable, an Odds Ratio of 1.050 implies that, as the Age variable increases by 1, the probability of the patient having the outcome increases by 5.0%. B, S.E, and Wald are the unstandardized regression weight, how much the unstandardized regression weight can vary by, and test statistic for the individual predictor variable, respectively. 19
Odds Ratio for the association between the age, number of comorbidities and outcome (Death + Hospitalization + ICU stay) in patients with COVID-19. Both Age and No of comorbidities are continuous variables, i.e, an Odds Ratio of 2.223 implies that, as the No of comorbidities increases by 1, the probability of the patient having the outcome increases by 122.3%. B, S.E, and Wald are the unstandardized regression weight, how much the unstandardized regression weight can vary by, and test statistic for the individual predictor variable, respectively. 20
- Abbreviations
- SINAVE
- Sistema Nacional de Vigilância Epidemiológica
- COVID-19
- Coronavirus Disease 2019
- SARS-CoV-2
- Severe Acute Respiratory Syndrome Coronavirus 2, Multimorbidity, Chronic Conditions
Acknowledgements
This research was partially supported through the European Union’s Horizon 2020 research and innovation programme under grant agreement No 874850 (MOOD), as well as through the Fundação para a Ciência e Tecnologia (FCT) under the project grant with reference LISBOA-01-0247-FEDER-045948 (IntelligentCare), and through the INESC-ID multi-annual funding from the PIDDAC programme (UIDB/50021/2020).
Footnotes
Figures label revised; and Section 2 updated