Abstract
Background Since January 2020, the COVID-19 pandemic has spread around the world, causing nearly a million deaths and severe economic crises. Italy was one of the most affected countries.
Objective The aim of this study is to look for significant correlations between COVID-19 cases and demographic, geographical, and environmental statistics of each Italian region from February 26 to August 12, 2020. Finally, we further investigated the link between SARS-CoV-2 spread and particulate matter 2.5 and 10 concentrations before the lockdown in Lombardy.
Methods All demographic data were taken from the AdminStat Italia website, and the geographic data from the Il Meteo website. Data were collected weekly. Data on PM2.5 and PM10 average daily concentrations were collected from previously published articles. We used Pearson’s coefficient to correlate quantities that followed a normal distribution, and Spearman’s coefficient for quantities that did not. To assess normality, we used the kurtosis (k) and skewness (s) coefficients according to the following scheme: we considered data compatible with a normal distribution only when $t_k = k \cdot (24/n)^{-1/2} \le 1.5$ and $t_S = s \cdot (6/n)^{-1/2} \le 1.5$; in this case, the Pearson correlation index was deemed more reliable. When $t_k \in\, ]1.5, 3]$, $t_S \le 3$ or $t_k \le 3$, $t_S \in\, ]1.5, 3]$, we considered it appropriate to evaluate both correlations. Finally, when $t_k, t_S > 3$, we judged the Spearman correlation index more appropriate. When the linear correlations were significant, we interpolated the data linearly. We report in round brackets the week in which the correlation approached the threshold of statistical significance, e.g. Abruzzo (4). The chosen p-value threshold was α = .05.
Results We found significant strong correlations between COVID-19 cases and population number in 60.0% of regions, such as Calabria (5), Campania (1), Lazio (1), Liguria (2), Lombardy (4), Piedmont (2), Sardinia (3), Sicily (1), and Veneto (4) (R best = .935, 95% CI: .830 – 1.000, p best = .046, 95% CI: .006 – .040). The average of the angular coefficients resulting from the linear interpolations of the pairs (COVID-19 cases, population number) is b = .0037 (95% CI: .0009 – .0065). We found a significant strong correlation between the angular coefficients b of the various regions and their latitude. This finding indicates the dependence of COVID-19 on geographical and/or climatic factors (R = .926, p = .001, r = .886, p = .003). In particular, we found a significant correlation with the historical averages (last 30 years) of the minimum temperatures of the Italian regions (R = −.849, p = .008, r = −.940, p = .005 for March; R = −.923, p = .001, r = −.872, p = .005 for February). We found a significant strong correlation between the number of COVID-19 cases until August 12 and the average daily concentrations of PM2.5 in Lombardy until February 29, 2020 (r = .76, p = .004). No significant correlation with PM10 was found in the same periods. Until February 26, 2020, we found a correlation with both PM2.5 (r = .63, p = .029) and PM10 (r = .72, p = .009). In the second week of March, the correlation with PM10 disappeared, while that with PM2.5 has persisted to date. We found that 40 µg/m3 for PM2.5 and 50 µg/m3 for PM10 are plausible thresholds beyond which particulate pollution clearly favors the spread of SARS-CoV-2.
Conclusion Since the spread of SARS-CoV-2 is correlated with historical minimum temperatures and with PM10 and PM2.5 concentrations, health authorities are urged to monitor pollution levels and to invest in precautions ahead of autumn. Furthermore, we suggest awareness campaigns that promote air recirculation in enclosed spaces and the avoidance of exposure to cold.
Introduction
The COVID-19 pandemic has been declared by the World Health Organization (WHO) chief as the most severe pandemic in recent human history [1]. To date, over 200 countries have been involved, for a total of over 29 million cases and 900,000 deaths [2]. Between February and April 2020, Italy was the most affected nation in terms of both new cases and new deaths [3]. The emergency was so severe that, despite a drastic drop in infections during the summer months, Italy is still among the 20 nations most afflicted by the novel coronavirus [4]. On January 23, two Chinese tourists tested positive for COVID-19 near Rome [5]. However, the patients appeared to have been promptly isolated, averting extensive contagion. Towards the end of February, the situation began to deteriorate and escape the control of the institutions. Starting from February 21, to counter the spread of the virus, the Italian government declared various local lockdowns around the outbreak in Lodi, in the region of Lombardy. On March 10, the lockdown went into effect nationwide [6]. For these reasons, we believe Italy to be one of the main sources of information useful for fully understanding the behavior of SARS-CoV-2.
This is the first study that provides a complete and detailed history of the correlations between the SARS-CoV-2 spread and the demographic, geographic, and environmental statistics in Italy. From the analysis of our results, it was possible to highlight anomalous and/or local properties and behaviors of the virus, as well as test the statistical significance of the hypotheses and scenarios proposed by other papers.
Methods
We collected data on Italian demographic statistics and pollution from the AdminStat Italia website and a previously published article [7, 8]. We looked for significant Pearson (R) and Spearman (r) correlations between COVID-19 cases per province in each region from February 26 until August 12, 2020, and the following quantities: birth rate, median age, population density, death rate, old age index, population number, % of unmarried, family members, growth rate, % of divorcees, % of foreigners, foreigners growth rate, and % of widowers. Data were collected weekly. For each correlation found, we calculated the angular coefficient of the interpolating line and correlated it with geographical characteristics such as the regions’ latitude and minimum temperatures in the months of February and March (historical data for the last 30 years). We collected geographic data from the Il Meteo website [9]. The survey was carried out on the regions with more than 3 provinces: Abruzzo, Calabria, Campania, Emilia-Romagna, Friuli Venezia Giulia, Lazio, Liguria, Lombardy, Marche, Piedmont, Sardinia, Sicily, Tuscany, and Veneto. In the results section we report only significant results. We report in round brackets the week in which the correlation approached the threshold of statistical significance, e.g. Abruzzo (4). Finally, we extended the results of a previous study on correlations between the daily average concentrations of particulate matter 10 and 2.5 (PM10, PM2.5) until February 29 and COVID-19 cases for every province in the Lombardy region from February 26 until August 12, 2020 [7].
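The correlation and interpolation steps described above can be sketched in a few lines of standard-library Python. This is only an illustrative sketch, not the authors' workflow (the paper used other software): the function names are hypothetical and the numbers are placeholder values, not data from the study.

```python
from math import sqrt


def pearson(x, y):
    """Pearson coefficient R of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman coefficient r: the Pearson coefficient of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def linear_fit(x, y):
    """Least-squares intercept a and angular coefficient b of y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((p - mx) * (q - my) for p, q in zip(x, y)) / sum(
        (p - mx) ** 2 for p in x
    )
    return my - b * mx, b


# Hypothetical placeholder values for four provinces (NOT data from the study).
population = [100_000, 250_000, 400_000, 900_000]
cases = [300, 800, 1_500, 3_200]

R = pearson(population, cases)
r = spearman(population, cases)
a, b = linear_fit(population, cases)
```

In this sketch, `b` plays the role of the angular coefficient that is later correlated with latitude and minimum temperatures; p-values are deliberately left out because they require a t-distribution that the standard library does not provide.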
Statistical analysis
Each result was reported together with its standard deviation (SD) and p-value (p); we chose a statistical significance threshold α = .05. For each sample, kurtosis (k) and skewness (s) were calculated using Microsoft Excel 2020 software; we used the formulas $\sigma_k = (24/n)^{1/2}$ and $\sigma_S = (6/n)^{1/2}$, with n the sample size, to obtain their respective standard deviations [10]. We considered data compatible with a normal distribution only when $t_k = k \cdot (24/n)^{-1/2} \le 1.5$ and $t_S = s \cdot (6/n)^{-1/2} \le 1.5$; in this case, the Pearson correlation index was deemed more reliable. When $t_k \in\, ]1.5, 3]$, $t_S \le 3$ or $t_k \le 3$, $t_S \in\, ]1.5, 3]$, we considered it appropriate to evaluate both correlations. Finally, when $t_k, t_S > 3$, we judged the Spearman correlation index more appropriate. When we found linear correlations, we used the Igor Pro 6.37 software to interpolate the data through the equation y = a + bx. In the results section, we report the mean values R best, r best of the correlations found, with their 95% confidence intervals. When the coefficients exceeded .700 with p ≫ .05, they were reported with an explicit note on the absence of statistical significance. To verify the importance of each correlation found, we constructed a correlation matrix; we defined as pure those correlations that were independent of other quantities correlated with COVID-19.
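The selection scheme above can be written as a small decision function. The following is a minimal sketch under stated assumptions, not the authors' procedure: it uses moment-based skewness and excess kurtosis (Excel's estimators apply small-sample corrections), it takes absolute values of both statistics, and it assigns the case the text leaves unspecified (only one of the two statistics above 3) to Spearman. All function names are hypothetical.

```python
from math import sqrt


def skewness(data):
    """Moment-based skewness (assumption: no small-sample correction)."""
    n = len(data)
    m = sum(data) / n
    m2 = sum((v - m) ** 2 for v in data) / n
    m3 = sum((v - m) ** 3 for v in data) / n
    return m3 / m2 ** 1.5


def excess_kurtosis(data):
    """Moment-based excess kurtosis (zero for a normal distribution)."""
    n = len(data)
    m = sum(data) / n
    m2 = sum((v - m) ** 2 for v in data) / n
    m4 = sum((v - m) ** 4 for v in data) / n
    return m4 / m2 ** 2 - 3.0


def choose_correlation(data):
    """Return 'pearson', 'both', or 'spearman' following the scheme above."""
    n = len(data)
    # t_k = k * (24/n)^(-1/2) and t_S = s * (6/n)^(-1/2), in absolute value
    # (the absolute value is an assumption of this sketch).
    t_k = abs(excess_kurtosis(data)) / sqrt(24 / n)
    t_s = abs(skewness(data)) / sqrt(6 / n)
    if t_k <= 1.5 and t_s <= 1.5:
        return "pearson"
    if t_k <= 3 and t_s <= 3:
        return "both"
    # Assumption: any statistic above 3 selects Spearman, including the
    # mixed case that the text does not cover explicitly.
    return "spearman"


# Hypothetical placeholder samples (NOT data from the study).
symmetric_sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
skewed_sample = [1, 1, 1, 1, 1, 1, 1, 1, 1, 100]
choice_symmetric = choose_correlation(symmetric_sample)
choice_skewed = choose_correlation(skewed_sample)
```

Returning a label rather than a coefficient keeps the normality check separate from the correlation itself, so the same rule can be applied to each quantity before any coefficient is computed.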
Results
COVID-19 cases – population number correlation
We found significant strong correlations between COVID-19 cases and population number in 60.0% of regions, such as Calabria (5), Campania (1), Lazio (1), Liguria (2), Lombardy (4), Piedmont (2), Sardinia (3), Sicily (1), and Veneto (4) (R best = .935, 95% CI: .830 – 1.000, p best = .046, 95% CI: .006 – .040). Suspicious correlations were found in Emilia-Romagna, Marche, and Puglia (Table 1). Since correlations appeared in many regions from the first weeks, it is likely that the virus had spread as early as January; by contrast, Veneto and Lombardy, the most affected region, seem to have experienced a gradual contagion. The average of the angular coefficients resulting from the linear interpolations of the pairs (COVID-19 cases, population number) is b = .0037 (95% CI: .0009 – .0065).
Table 1. Pearson and Spearman correlations between COVID-19 cases and population density and between COVID-19 cases and population number.
COVID-19 spread speed – latitude and minimum temperatures correlation
We found a significant strong correlation between the angular coefficients b of the various regions and their latitude (Table 2). This finding indicates the dependence of COVID-19 on geographical and/or climatic factors (R = .895, p < .0001, r = .874, p < .0001); in particular, we found a significant correlation with the historical averages (last 30 years) of the minimum temperatures of the Italian regions (R = −.576, p = .039, r = −.629, p = .021 for March; R = −.685, p = .010, r = −.615, p = .025 for February). By narrowing the analysis to the regions that showed a clear correlation, we obtained much stronger correlations (R = −.849, p = .008, r = −.940, p = .005 for March; R = −.923, p = .001, r = −.872, p = .005 for February). The same holds for the correlation with latitude (R = .926, p = .001, r = .886, p = .003).
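For orientation, the p-value attached to a Pearson coefficient is conventionally derived from a Student t-statistic. The paper does not state which test was used, so the relation below is an assumption about the procedure rather than a description of it; here n denotes the number of regions entering the correlation.

```latex
t = R \, \sqrt{\frac{n - 2}{1 - R^{2}}}, \qquad \text{with } n - 2 \text{ degrees of freedom}
```

Under this relation, the same coefficient R yields a smaller p-value as n grows, which is worth keeping in mind when comparing the full set of regions with the narrowed subset.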
Table 2. Geographical and environmental data of the regions in which a correlation was found (including suspicious ones).
COVID-19 cases – PM10 and PM2.5 daily averages correlation (only Lombardy)
We identified a strong significant correlation between the number of COVID-19 cases until August 12 and the average daily concentrations of PM2.5 in Lombardy until February 29, 2020 (r = .76, p = .004). No significant correlation with PM10 was found in the same periods. Therefore, in the long run, the correlation with PM2.5 was statistically stronger than that with PM10. However, in the early stages of the outbreak (until February 26, 2020), we found a correlation with both PM2.5 (r = .63, p = .029) and PM10 (r = .72, p = .009). In the second week of March, the correlation with PM10 disappeared, while that with PM2.5 has persisted to date. We identified a drastic surge in COVID-19 cases near 40 µg/m3 for PM2.5 and 50 µg/m3 for PM10 (Figure 1); therefore, these may be the thresholds beyond which particulate pollution clearly favors the spread of SARS-CoV-2. All the correlations found are statistically valid, as they are not related to the other quantities analyzed.
Figure 1. COVID-19 cases – PM10 and PM2.5 scatterplot evolution from February 26 to March 18, 2020.
COVID-19 cases - population density correlation
We found significant strong correlations between COVID-19 cases and population density in 33.3% of regions, such as Abruzzo (4), Campania (1), Lazio (1), Sicily (2), and Veneto (1) (R best = .880, 95% CI: .757 – 1.000, p best = .023, 95% CI: .006 – .040). Suspicious correlations were found in Piedmont and Liguria (Table 1). The only pure correlation was that in Abruzzo, since in all other cases population density was itself correlated with population number (R best = .939, 95% CI: .880 – .997, p best = .003, 95% CI: .001 – .006); however, we noted discrepancies in the onset of these correlations in Sicily and in Veneto.
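The notion of a pure correlation used above can be illustrated with a simple correlation-matrix check. This is a sketch under explicit assumptions, not the authors' procedure: it uses a cutoff on |R| (0.7, a hypothetical choice) as a stand-in for the significance test, the function names are hypothetical, and the values are placeholders rather than data from the study.

```python
from math import sqrt


def pearson(x, y):
    """Pearson coefficient R of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def pure_covariates(cases, covariates, threshold=0.7):
    """Names of covariates whose correlation with cases is 'pure'.

    A covariate counts as linked to cases when |R| >= threshold (assumption:
    a cutoff on |R| replaces the p-value test). A linked covariate is pure
    when it is not itself linked to any other covariate linked to cases.
    """
    linked = [
        name
        for name, values in covariates.items()
        if abs(pearson(cases, values)) >= threshold
    ]
    pure = []
    for name in linked:
        others = [other for other in linked if other != name]
        if all(
            abs(pearson(covariates[name], covariates[other])) < threshold
            for other in others
        ):
            pure.append(name)
    return pure


# Hypothetical placeholder values for five provinces (NOT data from the study).
cases = [1, 2, 3, 4, 5]
entangled = {
    "population_density": [2, 4, 6, 8, 10.5],
    "population_number": [1, 2, 3, 4, 5.2],
    "birth_rate": [3, 1, 4, 1, 5],
}
isolated = {
    "population_density": [2, 4, 6, 8, 10.5],
    "birth_rate": [3, 1, 4, 1, 5],
}
pure_when_entangled = pure_covariates(cases, entangled)
pure_when_isolated = pure_covariates(cases, isolated)
```

In the first placeholder set, density and population number are both linked to cases and to each other, so neither is pure; in the second, density is the only linked quantity and is therefore reported as pure, mirroring the situation described for Abruzzo.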
Plausible scenarios
The results found are compatible with the following scenarios:
The almost immediate correlations between COVID-19 cases and the number of inhabitants per province in Campania, Lazio, Sicily, Liguria, and Piedmont strongly indicate that SARS-CoV-2 had been in circulation for a long time before the first confirmed case. From the Lombardy trend, the virus appears to have taken 4 weeks to correlate with population size; therefore, it is plausible that COVID-19 had been spreading in Italy since January 2020 (or earlier).
Low temperatures can weaken the immune defenses, favoring infection by the novel coronavirus.
Low temperatures can push people to create gatherings indoors and without air circulation, favoring the spread of the novel coronavirus.
Low temperatures could promote the survival of the virus.
PM2.5 can weaken the immune defenses, favoring infection by the novel coronavirus.
PM2.5 can serve as an excellent carrier for the novel coronavirus.
High concentrations of PM10 may have contributed to the spread of the virus by acting as a carrier. This is supported not only by the initial correlation with PM10 but also by the extremely high PM10 concentration found in the first outbreak in Lodi (67 µg/m3).
It is possible that SARS-CoV-2 underwent evolutionary mutations in northern Italy (particularly in Lombardy). This agrees with the possibility that COVID-19 had already spread nationwide, mistaken for flu or common colds, until the aforementioned mutation occurred.
Discussion
The first two confirmed cases of COVID-19 in Italy were two Chinese tourists, who landed on January 19, 2020 [5]. They could travel freely around the town, making a brief stop in Parma and staying in a Hotel in Rome [11]. In any case, we believe that it is statistically unlikely and morally incorrect to associate the arrival of the novel coronavirus in Italy with them. In particular, the results found in this paper suggest that SARS-CoV-2 has been circulating in Italy since early January, probably mistaken for flu or common cold. In fact, having ascertained that:
those suffering from COVID-19 were largely asymptomatic [12]
symptoms associated with COVID-19 are milder in children compared with adults [13]
the incubation period ranges from 2 to 14 days with an average of 5-6 days [14]
80% of people with COVID-19 have mild symptoms [15]
the fatality rate is proportional to age and extremely high for patients over 65 [18]
the most likely hypothesis is that the virus had been circulating since before their arrival. All these factors drastically increase the time needed to identify and isolate a person infected with COVID-19, since: a) asymptomatic individuals unknowingly infected the people with whom they were in contact; b) children (besides the problem of asymptomatic infection) showed milder symptoms than adults, leading parents and relatives to attribute them to ordinary colds or flu; c) symptoms could appear up to 14 days after contagion, allowing infected subjects to infect other people unknowingly; d) in the overwhelming majority of cases, symptomatic patients showed mild symptoms that did not cause concern among work colleagues and relatives; e) the death risk was heavily concentrated in the older age groups, which are naturally more exposed to fatal events, and this prevented an easy assessment of the extent of the epidemic, i.e., until high numbers were reached, the collective psychological impact was low.
The marked correlation between the speed at which the virus spread among the population and the minimum temperatures of each region suggests that late autumn and winter can strongly favor the pandemic. In fact, because of low outdoor temperatures, people gather more frequently in spaces without air recirculation, which creates favorable conditions for the proliferation of viruses [19]. Like rhinoviruses, adenoviruses, and influenza viruses, the novel coronavirus may also survive better in colder and drier climates [20-23]. Furthermore, sudden changes in temperature and cold can lower people’s immune defenses [22].
The strong and prolonged correlations we found with PM2.5 in the Lombardy region, in northern Italy, indicate that this type of pollution played an important role in the epidemic. This may be linked to the fact that fine particles substantially reduce immune defenses and increase the severity of symptoms through damage to the respiratory system [24, 25]; moreover, PM2.5 (and PM10) could act as a virus carrier [26]. In fact, Figure 1 clearly shows that, above approximately 40 µg/m3 of PM2.5 and 50 µg/m3 of PM10, there is a drastic increase in the number of COVID-19 cases; therefore, these could be the thresholds beyond which particulate matter significantly favors the spread of SARS-CoV-2. This hypothesis, besides being supported by empirical data and other studies, is consistent with the first outbreak in Lodi, where the PM10 average daily concentration was the highest of the month (67 µg/m3) [7, 27]. However, our results show that PM2.5 had a greater influence than PM10 on the spread of the virus in Lombardy (Figure 1). Finally, considering that, in Lombardy, the correlation between COVID-19 cases and the number of inhabitants per province became significant after 4 weeks, that symptoms were more severe than in other regions, and that the basic reproduction number (R0) seems to have been the highest, we suggest that an evolutionary genetic mutation may have occurred in Lombardy [7]. In fact, although the genome does not appear to have changed substantially, Zhang et al. showed that even small mutations can cause significant changes in SARS-CoV-2 behavior [28].
Limitations
Statistical correlations can provide valid support for hypotheses and theories, as well as fundamental indicators of phenomena to be explored; however, they cannot demonstrate the causal nature of a phenomenon.
Conclusion
In this paper we found significant, strong, and lasting correlations between the spread of SARS-CoV-2 and the number of inhabitants of each region, between the spread speed of COVID-19 and the historical minimum temperatures in the months of February and March, and between the number of COVID-19 cases and the average daily concentrations of PM2.5. Correlations with the average daily concentrations of PM10 were found up to the first week of March, indicating that this type of pollution also played a role in the spread of the virus, linked to the exceeding of specific daily peaks. Therefore, we suggest that health authorities pay particular attention to the arrival of the winter months, not only by investing in adequate precautions but also by launching awareness campaigns on indoor air recirculation and on avoiding exposure to the cold. In addition, it will be necessary to monitor PM10 and PM2.5 levels carefully, even imposing traffic restrictions if necessary.
Data Availability
All the data necessary for carrying out this study are presented in the paper or in the articles reported in the references.