Abstract
Objectives The emergence of new SARS-CoV-2 variants is a major challenge in the management of the Covid-19 pandemic. A crucial issue is to quantify the number of variants that may pose a risk to public health in the future.
Methods We fitted the data on the most relevant SARS-CoV-2 variants recorded by the World Health Organization (WHO). The function exploited for the fit is related to the total number of infected subjects in the world since the start of the epidemic.
Results We found that the number of relevant SARS-CoV-2 variants predicted by the fit up to November 2021 was about 34. Moreover, the number of new relevant variants per ten million cases turned out to be 1.25 in November 2021, slightly lower than the value of 1.75 in March 2020.
Conclusions Our simple mathematical model can evaluate the number of relevant SARS-CoV-2 variants as the cumulative number of cases increases worldwide and may represent a useful tool in planning strategies to counter the pandemic effectively.
Introduction
Most mutations in the genome of the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) are neutral or only mildly deleterious. However, a small proportion of mutations can increase infectivity and promote virus-host interactions that are critical to the establishment of persistent and more severe infection [1, 2]. For example, mutations in the spike protein, which mediates attachment of the virus to host cell-surface receptors [3], can have significant effects on virus behaviour. In order to effectively control the pandemic, it is imperative to investigate the emergence and spread of variants with an impact on disease transmission and human health [4].
SARS-CoV-2 sequences are shared daily on public databases such as the Global Initiative on Sharing All Influenza Data (GISAID) [5] or the European Centre for Disease Prevention and Control (ECDC) [6], which significantly contribute to surveillance of the pandemic.
The World Health Organization has established that SARS-CoV-2 variants representing a possible risk to public health can be divided into three distinct categories [1]: variants under monitoring (VUMs), variants of interest (VOIs) and variants of concern (VOCs).
Variants Under Monitoring (VUMs) are associated with genetic mutations which alter virus characteristics, although evidence of phenotypic or epidemiological impact is still unclear.
Variants of Interest (VOIs) are associated with: i) genetic mutations which affect transmissibility, disease course, diagnostic or therapeutic escape; ii) relevant community transmission with an emerging risk to global public health.
Variants of Concern (VOCs) are associated with one or more of the following characteristics: i) increase in transmissibility; ii) increase in virulence or change in disease severity; iii) decrease in effectiveness of social measures, diagnostics, vaccines and therapeutics.
Given the continuous evolution of the SARS-CoV-2 virus, variants may be reclassified over time.
In the present study, we fitted the WHO data [1, 2] by exploiting a function which exclusively depends on the number of infected cases worldwide. Our fit allows for a fairly good estimate of the number of relevant variants that can be expected to appear for a given number of infected subjects throughout the world.
Furthermore, our approach can predict the number of new relevant variants per ten million cases in any epidemiological situation.
The number of new relevant variants per ten million cases decreases very slowly as the cumulative number of Covid-19 cases increases. Therefore, it becomes crucial to carefully monitor and reduce virus circulation in order to avoid the emergence of new variants which may not be suitably covered by the vaccines and drugs currently available [7].
Despite the huge efforts put forth by healthcare services in many countries, vaccination campaigns have not achieved a population coverage that is high enough to prevent the spread of SARS-CoV-2. The WHO estimate [8] is that in Europe and Central Asia alone the current fourth wave of the Covid-19 pandemic is likely to cause more than half a million deaths.
Although it is obvious that the number of relevant variants increases with the number of cases throughout the world, it is extremely difficult to determine the precise relationship between these two variables. Our model is a simple attempt to make a fairly reliable estimate of the risk of new variants that can impact public health as the virus continues to spread.
Methods
By means of Wolfram Mathematica [9] we fitted the data on SARS-CoV-2 variants in order to evaluate the cumulative number v of relevant SARS-CoV-2 variants versus the cumulative number N of infected subjects worldwide.
The function v fitting the WHO data must satisfy the following conditions:
i) The function v varies from zero to infinity. If there is no infection, the number of variants is zero; conversely, if the virus replicates infinitely many times, the cumulative number of variants is also infinite.
ii) The function v increases monotonically, therefore the first derivative v′(N) is positive. The cumulative number of variants increases with the number of infections, i.e. as the number of virus replications increases.
iii) The first derivative of v decreases monotonically, therefore the second derivative v″(N) is negative. As the cumulative number of variants increases with the total cases in the world, the emergence of new virus mutations becomes slightly less frequent.
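The fit of WHO data was obtained by means of the following function:
$$v(N) = k \, \frac{N}{\log N},$$
where k is the constant of the numerical fit and “log N” represents the natural logarithm of N. This function satisfies all the previous conditions, as shown below:
$$\lim_{N \to 0^+} v(N) = 0 \quad \text{and} \quad \lim_{N \to \infty} v(N) = \infty,$$
$$v'(N) = k \, \frac{\log N - 1}{(\log N)^2} > 0$$
for N > e, where e ≅ 2.72 is Euler’s number, and
$$v''(N) = -k \, \frac{\log N - 2}{N \, (\log N)^3} < 0$$
for N > e² with e² ≅ 7.39.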
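The monotonicity and concavity conditions can also be checked numerically. The following is a minimal sketch, not the authors' Mathematica code: it assumes the closed-form derivatives of v = k · N / log N, uses the value of k reported in the Results section, and compares them with central finite differences at a few hypothetical sample values of N chosen only for illustration.

```python
import math

K = 2.55e-6  # fitted constant k as reported in the Results section


def v(n_cases, k=K):
    """Cumulative number of relevant variants, v = k * N / log N."""
    return k * n_cases / math.log(n_cases)


def v_prime(n_cases, k=K):
    """First derivative of v with respect to N."""
    log_n = math.log(n_cases)
    return k * (log_n - 1.0) / log_n**2


def v_second(n_cases, k=K):
    """Second derivative of v with respect to N."""
    log_n = math.log(n_cases)
    return k * (2.0 - log_n) / (n_cases * log_n**3)


def central_difference(f, x, h):
    """Symmetric finite-difference approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


# Hypothetical sample points (illustration only, not WHO data).
SAMPLE_CASES = [1e5, 1e6, 1e7, 1e8, 1e9]

for n_cases in SAMPLE_CASES:
    h = n_cases * 1e-4
    # The analytic derivatives should agree with the numerical ones.
    assert math.isclose(central_difference(v, n_cases, h),
                        v_prime(n_cases), rel_tol=1e-6)
    assert math.isclose(central_difference(v_prime, n_cases, h),
                        v_second(n_cases), rel_tol=1e-5)
    # Condition on the first derivative (N > e) and on the second (N > e^2).
    assert v_prime(n_cases) > 0
    assert v_second(n_cases) < 0
```

All sample points lie well above e², which is the only regime relevant to epidemic case counts.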
The number n of new relevant variants per ten million cases (ΔN = 10⁷) turns out to be:
$$n(N) = v'(N) \, \Delta N = 10^7 \, k \, \frac{\log N - 1}{(\log N)^2}.$$
The relative variation |n′(N)| of new relevant variants per ten million cases decreases as the number N of cases increases: for N > e², $|n'(N)| = 10^7 \, k \, \frac{\log N - 2}{N \, (\log N)^3} \to 0$ as $N \to \infty$.
The choice of the function v = k · N / log N exploited in the fit can be justified through a heuristic argument discussed in Appendix A, where we also considered another function satisfying the three conditions listed in this section.
Results
We fitted the data recorded by WHO [1, 2] up to November 2021 by means of a specific code written with Wolfram Mathematica 12.1.3 [9].
Table 1 lists the characteristics of SARS-CoV-2 variants reported by WHO [1, 2]: date and country of the earliest detection, PANGO (Phylogenetic Assignment of Named Global Outbreak) and WHO classification, current relevance (variants of concern, variants of interest or under monitoring), total number of cases in the world at the end of the month of detection and cumulative number of variants. PANGO is a rule-based nomenclature system for naming and tracking SARS-CoV-2 genetic lineages [10]. The numerical fit of the WHO data was obtained by means of the function v(N) = k · N/log N, where the constant of the numerical fit is k = 2.55 · 10⁻⁶. The 95% confidence interval (CI) of k is given by 95% CI = (2.08 – 3.03) · 10⁻⁶. The adjusted R-squared, measuring the goodness of the fit, turned out to be R² = 0.92. Further technical details on the fit of WHO data are presented in Appendix B.
Table 1. Characteristics of SARS-CoV-2 variants recorded by WHO [1, 2]: date and country of the earliest detection, PANGO and WHO nomenclature, current relevance (concern, interest or under monitoring) and cumulative number of cases in the world at the end of the month of detection. The last column summarises the cumulative number of observed relevant variants.
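Because the model v = k · N / log N is linear in its single parameter k, an ordinary least-squares estimate of k has a closed form. The sketch below illustrates this under stated assumptions: it is not the authors' Mathematica code, the function names are hypothetical, and the data points are synthetic values generated from the model itself (they are not the WHO data of Table 1).

```python
import math


def basis(n_cases):
    """Regressor g(N) = N / log N, so that the model reads v = k * g(N)."""
    return n_cases / math.log(n_cases)


def fit_k(cases, variants):
    """Closed-form least-squares estimate of k for v = k * N / log N.

    Minimising sum_i (v_i - k * g_i)^2 over k gives
    k = sum_i(g_i * v_i) / sum_i(g_i^2).
    """
    g = [basis(n) for n in cases]
    numerator = sum(g_i * v_i for g_i, v_i in zip(g, variants))
    denominator = sum(g_i * g_i for g_i in g)
    return numerator / denominator


# SYNTHETIC, noise-free data generated from the model (illustration only).
TRUE_K = 2.0e-6
synthetic_cases = [1e6, 5e6, 2e7, 6e7, 1.2e8, 2.5e8]
synthetic_variants = [TRUE_K * basis(n) for n in synthetic_cases]

k_hat = fit_k(synthetic_cases, synthetic_variants)
print(f"recovered k = {k_hat:.3e}")
```

With real, noisy data the same estimator applies; confidence intervals and the adjusted R-squared would need to be computed separately.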
Figure 1 represents the cumulative number v of relevant SARS-CoV-2 variants versus the cumulative number N of cases in the world. The dots from 1 to 12 correspond to the data reported by WHO [1, 2] from March 2020 to November 2021; the solid line represents the function v = k · N/log N used in the fit with Wolfram Mathematica.
Figure 1. Cumulative number of relevant SARS-CoV-2 variants versus the cumulative number of cases in the world. The dots from 1 to 12 indicate the data reported by WHO [1, 2] from March 2020 to November 2021; the solid line represents the function v = k · N/log N used in the numerical fit with Wolfram Mathematica.
The analytical formula v exploited in the fit makes it possible to predict both the cumulative number of relevant variants and the number of new relevant variants for a given number of cases in the world.
The total number of cases in the world up to the 28th of November 2021 was 260,493,573 [2]. The corresponding cumulative number of relevant SARS-CoV-2 variants predicted by the fit was 34.3, i.e. about seven variants more than in the last WHO update [1] in November 2021, when 27 relevant variants were recorded.
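The figure of 34.3 can be reproduced by evaluating the fitted function at the reported case count. The snippet below is a minimal sketch that uses the rounded value k = 2.55 · 10⁻⁶ quoted in the text; the function and constant names are hypothetical.

```python
import math

K = 2.55e-6                      # rounded fit constant quoted in the text
CASES_28_NOV_2021 = 260_493_573  # cumulative cases reported in the text [2]
WHO_RECORDED_VARIANTS = 27       # relevant variants in the last WHO update [1]


def predicted_variants(n_cases, k=K):
    """Cumulative number of relevant variants, v = k * N / log N."""
    return k * n_cases / math.log(n_cases)


predicted = predicted_variants(CASES_28_NOV_2021)
gap = predicted - WHO_RECORDED_VARIANTS
print(f"predicted variants: {predicted:.1f}, excess over WHO count: {gap:.1f}")
```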
As discussed in the Methods section, the number n of new relevant variants per ten million (10⁷) cases is $n(N) = 10^7 \, k \, (\log N - 1)/(\log N)^2$, which becomes
$$n(N) = 25.5 \, \frac{\log N - 1}{(\log N)^2}$$
by substituting the numerical value of k. From March 2020 to November 2021 the number of new variants per ten million cases decreased by only 28.6%, from 1.75 to 1.25.
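As a quick arithmetic check, the November 2021 value and the percentage decrease can be recomputed directly. This is a sketch under the assumption that n is the derivative of v = k · N / log N multiplied by the step of ten million cases; the names are hypothetical, and the March 2020 value of 1.75 is taken from the text rather than recomputed, since the corresponding case count is listed only in Table 1.

```python
import math

K = 2.55e-6      # rounded fit constant quoted in the text
DELTA_N = 1e7    # step of ten million cases


def new_variants_per_step(n_cases, k=K, delta_n=DELTA_N):
    """n(N) = delta_n * k * (log N - 1) / (log N)^2."""
    log_n = math.log(n_cases)
    return delta_n * k * (log_n - 1.0) / log_n**2


n_nov_2021 = new_variants_per_step(260_493_573)

# Relative decrease between the two values quoted in the text.
N_MARCH_2020 = 1.75
N_NOV_2021 = 1.25
relative_drop_percent = 100.0 * (N_MARCH_2020 - N_NOV_2021) / N_MARCH_2020

print(f"n (Nov 2021) = {n_nov_2021:.2f}, decrease = {relative_drop_percent:.1f}%")
```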
Figure 2 reports the number n of new relevant SARS-CoV-2 variants per ten million cases versus the cumulative number N of cases in the world. The analytical behaviour represented by the plot is given by the formula $n(N) = 25.5 \, (\log N - 1)/(\log N)^2$.
Figure 2. Number n of new relevant SARS-CoV-2 variants per ten million cases versus the cumulative number of cases in the world. From March 2020 to November 2021 n decreased from 1.75 to 1.25.
The relative variation |n′(N)| of the number n of new relevant variants per ten million cases is given by $|n'(N)| = 10^7 \, k \, (\log N - 2)/(N \, (\log N)^3)$, with |n′(N)| ≪ 1 for N ≫ 1. This analytical result implies that the number n of new relevant variants per ten million cases decreases very slowly as the virus continues to circulate. For instance, n will remain above 1.10 until the cumulative number of cases in the world reaches about 4.2 billion.
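The case count at which n falls to a given level can be obtained in closed form, because setting n equal to a target value gives a quadratic equation in log N. The sketch below assumes the rounded constant 10⁷ · k = 25.5 and uses hypothetical function names; with this rounded value the threshold for n = 1.10 comes out slightly above four billion cases, close to the figure of about 4.2 billion quoted above.

```python
import math

C = 1e7 * 2.55e-6  # 10^7 * k, i.e. about 25.5 with the rounded value of k


def new_variants_per_step(n_cases, c=C):
    """n(N) = c * (log N - 1) / (log N)^2."""
    log_n = math.log(n_cases)
    return c * (log_n - 1.0) / log_n**2


def cases_where_n_equals(target, c=C):
    """Solve n(N) = target on the decreasing branch (log N > 2).

    With L = log N, the condition c * (L - 1) / L^2 = target becomes
    target * L^2 - c * L + c = 0; the larger root lies on the branch
    where n decreases with N.
    """
    discriminant = c * c - 4.0 * target * c
    if discriminant < 0:
        raise ValueError("target exceeds the maximum of n(N), which is c / 4")
    log_n = (c + math.sqrt(discriminant)) / (2.0 * target)
    return math.exp(log_n)


threshold_cases = cases_where_n_equals(1.10)
print(f"n drops to 1.10 at about {threshold_cases / 1e9:.1f} billion cases")
```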
Figure 3 represents the predicted increase of the cumulative number of relevant variants per each step of ten million infections, from 260 to 400 million cumulative cases in the world. In this range of cases, the number of relevant variants is predicted to increase from 34.3 to 51.6.
Figure 3. Prediction of the cumulative number of SARS-CoV-2 variants from 260 to 400 million cases in the world. The dotted line represents the function v = k · N/log N, while each step shown in the plot corresponds to the predicted number of new relevant variants per ten million cases.
As shown in Figure 3, the total number of infected cases was about 260 million at the end of November 2021.
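The stepwise projection from 260 to 400 million cases can be sketched as follows. This is an illustration under explicit assumptions, not the authors' procedure: it starts from the value of 34.3 variants quoted in the text, adds n(N) evaluated at the start of each step of ten million cases, and uses the rounded constant k = 2.55 · 10⁻⁶. Under these assumptions the final value lands close to the 51.6 reported above, and slightly above the closed-form value v(4 · 10⁸).

```python
import math

K = 2.55e-6         # rounded fit constant quoted in the text
STEP = 10_000_000   # ten million cases per step


def v(n_cases, k=K):
    """Cumulative number of relevant variants, v = k * N / log N."""
    return k * n_cases / math.log(n_cases)


def new_variants_per_step(n_cases, k=K, step=STEP):
    """n(N) = step * k * (log N - 1) / (log N)^2."""
    log_n = math.log(n_cases)
    return step * k * (log_n - 1.0) / log_n**2


def stepwise_projection(start_cases, end_cases, start_variants):
    """Accumulate n(N) step by step (assumed left-endpoint rule)."""
    cases, total = start_cases, start_variants
    path = [(cases, total)]
    while cases < end_cases:
        total += new_variants_per_step(cases)
        cases += STEP
        path.append((cases, total))
    return path


path = stepwise_projection(260_000_000, 400_000_000, 34.3)
final_cases, final_variants = path[-1]
print(f"stepwise: {final_variants:.1f}, closed form: {v(final_cases):.1f}")
```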
Our model only focuses on the relationship between the number of virus replications and the emergence of relevant variants. All other factors involved in the diffusion of new variants were not taken into account. For this reason, we supposed that the parameter k in the fit v = k · N/log N is constant, although it actually varies with the other factors affecting the emergence of relevant variants.
Discussion
Since the start of the Covid-19 pandemic there has been an impressive global effort in investigating every aspect of the coronavirus epidemic [11], including immunogenetic [12, 13] and epidemiological [14] issues.
In this study, we built a simple mathematical model to calculate the number of relevant SARS-CoV-2 variants from the number of infected cases in the world. By fitting the WHO data listed in Table 1, we obtained an analytical formula which makes it possible to predict both the cumulative number of variants and the number of new relevant variants in a given epidemiological situation. For example, up to the 28th of November 2021, the cumulative number of cases worldwide was 260,493,573 [2], corresponding to 34.3 predicted relevant variants, i.e. about seven more than the 27 recorded in the last WHO update in November 2021 [1]. Analogously, we found that the number of new relevant variants per ten million cases was 1.25 in November 2021, a decrease of only 28.6% in comparison to March 2020, when the number was 1.75.
Our method depends critically on the efficiency of the WHO in tracking the most relevant SARS-CoV-2 variants. A different approach would be to consider the total number of variants detected by genomic sequencing of SARS-CoV-2 and recorded in public databases such as GISAID [5]. This choice would be independent of the WHO targeting of the most relevant variants but would be less interesting from a clinical viewpoint, since only the variants affecting virus transmission or disease severity are important to the control and management of the pandemic.
As shown in Figure 2, the number of new relevant variants per ten million cases decreases very slowly as the cumulative number of cases increases. Therefore, the persistence of virus circulation will always cause the emergence of new relevant SARS-CoV-2 variants.
Our model does not take into account the fact that the number of virus replications is different in each infected subject. However, the average number of replications can be assumed to be constant over a large number of infected cases, such as those recorded worldwide.
The diffusion of new relevant variants depends on a large variety of factors, including the ability to monitor all virus mutations, the effectiveness of containment measures, vaccination campaigns and the evolution patterns of the virus [15]. All these factors were ignored in our model, which only focuses on the number of virus replications. The parameter k appearing in the fit v = k · N/log N is not actually constant, as supposed in our model, but varies with all the other factors which can affect the emergence of relevant variants. Nonetheless, our model provides a fairly good estimate of what we can expect in the future.
SARS-CoV-2 vaccination has led to a decrease in hospitalisations and disease severity. Nevertheless, the current number of infections is still too high to prevent the appearance of new variants potentially dangerous to public health.
The number of variants increases with the number of cases in the world: this result underlines the urgency of making every effort to reduce the impact of virus replication in all geographical areas. The risk that new relevant variants may emerge anywhere in the world indicates that the winning strategy is not to leave any country behind in the battle against the virus.
The ability to predict the number of new relevant SARS-CoV-2 variants will become increasingly important in the future to ensure optimal planning of vaccination campaigns by healthcare services, united in the awareness that new variants can change the characteristics of the virus and greatly influence the global management of the pandemic.
Data Availability
All data referred to in the manuscript are available online at https://www.who.int/en/activities/tracking-SARS-CoV-2-variants
Authors’ contributions
The authors contributed equally to the article.
Conflicts of Interest
The authors declare that no competing interests exist.
Funding
The authors received no specific funding for this work.
Ethical approval
Not applicable.
Acknowledgments
The authors are grateful to Anna Maria Koopmans for translations, professional writing assistance and preparation of the manuscript.
Footnotes
HTML version of the abstract updated.