A data first approach to modelling Covid-19

Jayanti Prasad
doi: https://doi.org/10.1101/2020.05.22.20110171
1Khagol-20, 38/1, Panchavti, Pashan, Pune (India) - 411008
  • For correspondence: prasad.jayanti{at}gmail.com

Abstract

The primary data for the Covid-19 pandemic are time series of the number of confirmed, recovered and dead cases. These data are updated every day and are available for most countries from multiple sources such as [Gar20b, iD20]. In this work we present a two-step procedure for fitting models to Covid-19 data. In the first step, time dependent transmission coefficients are constructed directly from the data and, in the second step, summary measures of those coefficients (minimum, maximum, mean, median etc.) are used to set priors for fitting models to the data. We call this approach a “data driven approach” or “data first approach”. This scheme is complementary to the Bayesian approach and can be used with or without it for parameter estimation. We use the procedure to fit a set of SIR and SIRD models, with time dependent contact rate, to Covid-19 data for a set of the most affected countries. We find that SIR and SIRD models with constant transmission coefficients cannot fit Covid-19 data for most countries (mainly because social distancing, lockdown etc. make the coefficients time dependent). We find that any contact rate that decays with time helps to fit SIR and SIRD models for most of the countries. We also present constraints on the transmission coefficients and the basic reproduction number $\mathcal{R}_0$, as well as the effective reproduction number $\mathcal{R}(t)$. The main contributions of our work are: (1) presenting a two-step procedure for model fitting to Covid-19 data, (2) constraining the transmission coefficients as well as $\mathcal{R}_0$ and $\mathcal{R}(t)$ for a set of countries, and (3) releasing a Python package, PyCov19 [Pra20b], that can be used to fit a class of compartmental models, with time varying coefficients, to Covid-19 data.

1 Introduction

At present the world is going through an unprecedented crisis caused by the Covid-19 pandemic. The disease is caused by a novel coronavirus, named SARS-CoV-2, which passed to humans from bats in the city of Wuhan, China, some time in December 2019 [Org20a, Org20b, ea20h, ea20t, ea20v, ea20d, ea20r, ea20l]. By the end of May 2020 the virus had reached almost all parts of the world, resulting in more than six million people infected and more than a quarter million deaths [Wor20]. Efforts to contain the virus medically by developing a vaccine are proceeding on a war footing; however, success is still expected to be a few years away [ea20f]. Until a fraction of the population develops (herd) immunity or a vaccine is ready, the only means to contain the pandemic are social measures (social distancing, contact tracing etc.) and enhanced hygiene practices [ea06, ea20s, ea20p].

Some of the most important problems related to Covid-19 research are (1) estimating the controlling parameters of the pandemic, (2) making short-term predictions using mathematical-statistical modeling, which can inform mitigation policies, (3) simulating the growth of the epidemic by taking into account as many contributing effects as possible and (4) quantifying the impact of mitigation measures such as lockdown [ea20j].

Modeling the Covid-19 pandemic with the compartmental models of Kermack and McKendrick (for an introduction see [JR08, Li18, BC18]) has been one of the most active problems in recent times [ea20p, ea20a, ea20c, ea20e, FP20, ea20m, Oli20]. There have also been alternative approaches, such as [IM20], in which statistical considerations are taken into account for predictions. One study [FP20] argues that the data for the confirmed, recovered and dead can all be fitted easily by a power law model with similar coefficients. The main attractive feature of these data driven approaches is that the complexity of the model being considered is determined by the data and not by theoretical expectations.

In the present work we follow a middle approach and fit two compartmental models, SIR and SIRD, with some modification, to the Covid-19 data. One of the main reasons for considering these models is that Covid-19 data is available only for the Susceptible, Infected, Recovered and Dead compartments (for the notations used here and elsewhere in the present work see Table (1)). It may be true that a large fraction of the population that is Exposed (defined later) plays an important role in the dynamics of the pandemic; however, it is hard to get reliable numbers for it. Apart from that, a large number of undocumented cases [ea20l] may also have a significant influence on the spread of the pandemic.

Table 1: Notations used in the work

A brief summary of the work presented here is as follows.

In §2 we give a brief introduction to the compartmental models and introduce the notations and variables used in the work. In particular, we discuss the SIR model in §2.1 and the SEIR and SIRD models in §2.2 and §2.3 respectively. One of the major parts of the work presented here is the study of the time dependence of the contact rate β, and we introduce a set of parametric models of β(t) in §2.4. We discuss the time series data used in the study in §3, taking as an example Italy, which is one of the most affected countries. The main results of our work are given in §4 and §5. In §4 we discuss the reconstruction (regression) procedure for the set of transmission coefficients as well as for the effective reproduction number $\mathcal{R}(t)$ (defined later). Parameter estimation is discussed in §5. The main conclusions of our work, with a summary and some important points, are discussed in §6.

2 Mathematical Modeling

Mathematical modeling of epidemics, which started about a century ago with the seminal work of Kermack and McKendrick (see [NC08, JR08, Li18] for an introduction), is still the most commonly used framework. The main idea of Kermack and McKendrick’s compartmental models is that every individual in a society belongs to one of M compartments and the number of individuals in each compartment keeps changing with time. The minimum value M can have is two, for the Susceptible-Infected-Susceptible (SIS) model, in which recovery does not guarantee that one will not get the infection again [JR08].

During an epidemic phase an individual can go through many stages from being perfectly healthy to the recovered one after an infection, with or without any immunity (short or long term) or may die. If we represent every stage with a compartment and keep the track of the number of individuals in each compartment then we can easily model the dynamics of the epidemic. This approach is very similar to the approach taken in astronomy where we count the number of stars in different stages of their life to understand the stellar evolution.

In principle we can have any number of logical compartments but in practice we should consider only those compartments for which we have the counts data, in particular for model fitting. Taking into account the fact that we have data only for the number of confirmed, recovered and dead population, the only compartmental model that meets the requirement is the SIRD model. If we consider the recovered and dead together we get the SIR model as is discussed in the next section.

One important compartment that is also commonly considered is the ‘exposed’ one, which represents the population that has received the infection but cannot pass it to others until a certain period, called the incubation period, has elapsed. If we also consider the exposed population then we get the SEIR model, which is also discussed below. The three compartmental models SIR, SIRD and SEIR are shown in panels (a), (b) and (c) of Figure (1) respectively (for more detail see [JR08, Het00, ea97, Oli20]).

Figure 1: In the compartmental models the total population N is divided into a set of compartments as shown in the figure. The number of compartments and their connectivity depend on the details of the model being considered. If we consider these compartments as nodes of a graph then there are transmission coefficients for every connecting edge that determine how effective that edge is in changing the population of the connected compartments. In (a) and (c), representing the SIR and SEIR models, the compartments are connected in a linear way; however, in (b), representing the SIRD model, there is also a branching. Since the total population must remain constant, the rates of change along all the connecting edges must add to zero.

If we identify the compartments with the nodes of a graph then the transmissions between different compartments, represented by a set of coefficients, can be considered the edges of the graph. Some of the nodes may have multiple edges and some of the edges could also be bi-directional. The main challenge in modeling a pandemic like Covid-19 is not a scarcity of mathematical models but a scarcity of reliable data for the compartments being considered.

2.1 SIR Model

The most basic compartmental model is the SIR model which is shown in (a) of the Figure (1) and is described by the following set of equations:

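$$
\frac{dS}{dt} = -\frac{\beta S I}{N}, \qquad
\frac{dI}{dt} = \frac{\beta S I}{N} - \gamma I, \qquad
\frac{dR}{dt} = \gamma I \tag{1}
$$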

Here β and γ are the transmission coefficients, also called the contact rate and the recovery (removal) rate respectively; 1/β represents the typical time between contacts and 1/γ the mean infectious period (see [Het00, ea03a]). See Table (1) for notations.

In general there is some time lag between acquiring an infection and becoming infectious. However, in the SIR model it is ignored and an assumption is made that individuals become infectious immediately upon getting an infection. This is a very strong assumption and the main reasons for making this is that we do not know reliably how many people are actually ‘exposed’, or have the virus but are still not infectious (cannot pass it to others). One of the ways to address this problem could be contact tracing and assuming that anyone who has come into contact with an infected person is an exposed one. However, this assumption is as strong as the assumption made in the SIR model.

If we do not consider births, deaths and the movement of people then the following condition must be satisfied:

$$
S(t) + I(t) + R(t) = N \tag{2}
$$

where S, I and R are the populations of the S, I and R compartments respectively.

In equation (1) the transmission coefficient β is one of the most important parameters of the epidemic dynamics and can be written as the product of the contact rate (the average number of contacts per person per unit time) and the transmission probability (the probability of disease transmission on contact between a susceptible and an infectious person). As mentioned above, the transmission coefficient γ can be identified with the recovery rate, which is the inverse of the infectious period (during which an infected person can pass the virus to healthy people).

In general, equations (1) are solved with the following initial conditions:

$$
S(0) = S_0 > 0, \qquad I(0) = I_0 > 0, \qquad R(0) = 0 \tag{3}
$$

The second equation of (1) can be written as:

$$
\frac{dI}{dt} = \gamma I \left( \frac{\beta}{\gamma} \frac{S}{N} - 1 \right) \tag{4}
$$

so the infection rate is positive only for S/N > γ/β.
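The behaviour described above can be checked numerically. The following is a minimal sketch, assuming the standard SIR form dS/dt = −βSI/N, dI/dt = βSI/N − γI, dR/dt = γI and a classical Runge-Kutta integrator; the function names and the parameter values (β = 0.3 per day, γ = 0.1 per day, N = 10^6, I(0) = 10) are illustrative assumptions and are not taken from this work or from PyCov19.

```python
def sir_rhs(state, beta, gamma, n_total):
    """Right-hand side of the SIR equations for state = (S, I, R)."""
    s, i, _ = state
    new_infections = beta * s * i / n_total
    recoveries = gamma * i
    return (-new_infections, new_infections - recoveries, recoveries)


def rk4_step(state, dt, beta, gamma, n_total):
    """Advance (S, I, R) by one classical fourth-order Runge-Kutta step."""
    def shifted(k, factor):
        return tuple(x + factor * dt * dx for x, dx in zip(state, k))

    k1 = sir_rhs(state, beta, gamma, n_total)
    k2 = sir_rhs(shifted(k1, 0.5), beta, gamma, n_total)
    k3 = sir_rhs(shifted(k2, 0.5), beta, gamma, n_total)
    k4 = sir_rhs(shifted(k3, 1.0), beta, gamma, n_total)
    return tuple(
        x + dt * (a + 2.0 * b + 2.0 * c + d) / 6.0
        for x, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def simulate_sir(beta, gamma, n_total, i0, days, steps_per_day=10):
    """Integrate from S(0) = N - I0, I(0) = I0, R(0) = 0; return one state per day."""
    dt = 1.0 / steps_per_day
    state = (n_total - i0, i0, 0.0)
    daily = [state]
    for _ in range(days):
        for _ in range(steps_per_day):
            state = rk4_step(state, dt, beta, gamma, n_total)
        daily.append(state)
    return daily


# Illustrative values only (assumptions, not fitted to any data).
trajectory = simulate_sir(beta=0.3, gamma=0.1, n_total=1.0e6, i0=10.0, days=200)
peak_day = max(range(len(trajectory)), key=lambda day: trajectory[day][1])
print("peak day:", peak_day, "infected at peak:", round(trajectory[peak_day][1]))
```

In such a run the number of infected people stops growing around the day on which S/N falls below γ/β, which is the threshold condition stated above.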

Here we define one of the most important parameters of an epidemic, the ratio β/γ. It is called the basic reproduction number $\mathcal{R}_0$ when considered a constant, and the effective reproduction number $\mathcal{R}(t)$ when considered a function of time. The most common definition [JR08] of the basic reproduction number is the average number of secondary cases arising from an average primary case in an entirely susceptible population. Note that in the text we may also use just “reproduction number”, with the meaning depending on the context. Some studies, such as [ea20a], call both $\mathcal{R}_0$ and $\mathcal{R}(t)$ the basic reproduction number; however, we follow the convention used in [ea03b, ea03a, Cob20].

The basic reproduction number $\mathcal{R}_0$ is the main measure that quantifies the transmissibility of the virus: $\mathcal{R}_0 > 1$ sets off a chain of transmissions leading to exponential growth of the pandemic. We can keep $\mathcal{R}_0 < 1$ by minimizing the contact rates (social distancing etc.), lowering the infectiousness of infected people (by treating them or putting them in quarantine etc.) and reducing the susceptibility of healthy people by vaccination etc. (for details see [ea05]).

The SIR model is one of the most basic models and can be easily generalized by one or more of the following ways:

  1. Adding more compartments: Depending on the type of pandemic and the other details we are interested in, we can add more compartments to the SIR model. These compartments can fit in between the existing ones (for example as shown in Figure (1) (c) for the SEIR case) or can branch out from the existing ones (as shown in Figure (1) (b) for the SIRD case). With every new compartment added we must include the transmission coefficients for the connecting edges and also need an initial population for the new compartment [BC18, BS20, ea20b, ea20u].

  2. Heterogeneous population: In the basic SIR model we consider a homogeneous population that shares transmission coefficients such as β and γ. This may not be true in practice; for example, people from different groups (based on age, medical conditions, gender etc.) may have different contact rates β and/or recovery rates γ [ea20p, RR20].

  3. Variable transmission coefficients: Theoretically transmission coefficients such as β and γ are considered constant, however, in practice they can vary with time due to multiple factors as the pandemic spreads [Cob20, P.20, GPD20]. For example, social distancing and other precautions such as hand wash etc., may help to lower β. In a similar way a better understanding of the disease and the urgency with which medical and testing facilities are brought online may improve the recovery rate γ.

Figure 2: The left panel shows the standard SIR model with three types of population (compartments), Susceptible, Infected and Recovered, and the right panel shows the SIRD model with four types of population, with one extra compartment for the dead population. The dynamics of the epidemic depend on the transmission coefficients β, γ and δ and on the starting values of the different types of population. The values of the parameters and the initial populations in the different compartments used in the figure are for illustration only.

2.2 SEIR model

If we relax the assumption that people who get the infection become infectious instantly, and instead consider a latent period before the onset of infectiousness, then a fraction of the population (a compartment) has been exposed to the virus but becomes infectious only after a latent period 1/σ. The resulting model is called the Susceptible-Exposed-Infected-Recovered (SEIR) model and is represented by the following set of equations [JR08]:

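$$
\frac{dS}{dt} = -\frac{\beta S I}{N}, \qquad
\frac{dE}{dt} = \frac{\beta S I}{N} - \sigma E, \qquad
\frac{dI}{dt} = \sigma E - \gamma I, \qquad
\frac{dR}{dt} = \gamma I \tag{5}
$$

Note that if we combine the second and third equations above we get:

$$
\frac{d(E + I)}{dt} = \frac{\beta S I}{N} - \gamma I = \gamma I \left( \frac{\beta}{\gamma} \frac{S}{N} - 1 \right) \tag{6}
$$

From the above equation we can see that the population in the E and I compartments together can grow with time only when the fraction of the susceptible population is greater than the inverse of the reproduction number:

$$
\frac{S}{N} > \frac{1}{\mathcal{R}_0} = \frac{\gamma}{\beta} \tag{7}
$$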

There are many forms of the SEIR equations in common use (see [BC18, ea20p, ea20u, ea20b, P.20, ea20n]); however, equation (5) is the simplest one and does not include natural deaths. A common practice with the SEIR model is to consider the incubation period 1/σ a constant and estimate it from other observations. The SEIR model is more complex than the SIR model: we cannot find the number of exposed people exactly at time t = 0 for evolving the equations, and so the approach used to define the reproduction number for the SIR model no longer works. Thanks to next-generation matrix methods [BC18] it is still possible to write the reproduction number in closed form for this case as well.

2.3 SIRD model

One of the serious drawbacks of the SIR model is that people who recover and people who die are treated in the same way: there are no separate compartments for the dead and the recovered. This drawback can be addressed by separating the compartments for the dead and recovered populations, as is done in the SIRD model described by the following set of equations (for a detailed discussion see [ea20a, Vil20]).

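$$
\frac{dS}{dt} = -\frac{\beta S I}{N}, \qquad
\frac{dI}{dt} = \frac{\beta S I}{N} - \gamma I - \delta I, \qquad
\frac{dR}{dt} = \gamma I, \qquad
\frac{dD}{dt} = \delta I \tag{8}
$$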

Here a new transmission coefficient δ has been introduced, which we can identify with the death rate. One of the advantages of the SIRD model is that it has three transmission coefficients β, γ and δ, and data are available for three time series I(t), R(t) and D(t), so it is possible to compute the time dependence of all three coefficients as well as of the reproduction number $\mathcal{R}(t)$.
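The branching of the I compartment into R and D can be illustrated with a short simulation. This is a minimal sketch, assuming the SIRD form dS/dt = −βSI/N, dI/dt = βSI/N − γI − δI, dR/dt = γI, dD/dt = δI and a simple forward-Euler integrator; the function name and the parameter values are illustrative assumptions only.

```python
def simulate_sird(beta, gamma, delta, n_total, i0, days, steps_per_day=100):
    """Forward-Euler integration of SIRD; returns one (S, I, R, D) tuple per day."""
    dt = 1.0 / steps_per_day
    s, i, r, d = n_total - i0, i0, 0.0, 0.0
    daily = [(s, i, r, d)]
    for _ in range(days):
        for _ in range(steps_per_day):
            infections = beta * s * i / n_total * dt  # flow S -> I
            recoveries = gamma * i * dt               # flow I -> R
            deaths = delta * i * dt                   # flow I -> D
            s -= infections
            i += infections - recoveries - deaths
            r += recoveries
            d += deaths
        daily.append((s, i, r, d))
    return daily


# Illustrative values only (assumptions, not fitted to any data).
daily = simulate_sird(beta=0.25, gamma=0.08, delta=0.02,
                      n_total=1.0e6, i0=10.0, days=300)
s_end, i_end, r_end, d_end = daily[-1]
print("final recovered:", round(r_end), "final dead:", round(d_end))
```

With constant γ and δ the two outflows from I stay in the fixed proportion D/R = δ/γ, which is one reason separate R(t) and D(t) series carry information about both coefficients.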

The aim of any mitigation measure may be one or more of the following:

  1. Lower the contact or infection rate β.

  2. Lower the mortality rate δ.

  3. Increase the recovery rate γ.

The SIRD model provides us with a framework to estimate or fit all these parameters. In one of the coming sections we will discuss how we can reconstruct the transmission coefficients β, δ and γ, as well as $\mathcal{R}(t)$, from the data by a direct reconstruction approach.

The basic reproduction number for the SIRD model can be written in the following way [ea20a]:

$$
\mathcal{R}_0 = \frac{\beta}{\gamma + \delta} \tag{9}
$$

or,

$$
\frac{1}{\mathcal{R}_0} = \frac{1}{\mathcal{R}_\gamma} + \frac{1}{\mathcal{R}_\delta} \tag{10}
$$

where $\mathcal{R}_\gamma = \beta/\gamma$ and $\mathcal{R}_\delta = \beta/\delta$. If apart from death and recovery there is some other channel that can lower the population in the I compartment, for example if infected people move out of the region with transmission coefficient η, then we can write:

$$
\frac{1}{\mathcal{R}_0} = \frac{1}{\mathcal{R}_\gamma} + \frac{1}{\mathcal{R}_\delta} + \frac{1}{\mathcal{R}_\eta} \tag{11}
$$

with $\mathcal{R}_\eta = \beta/\eta$. A more realistic model will have multiple compartments (nodes), either connected in series or with some branching out from others, with data to constrain the transmission coefficients (edges). Apart from this, realistic models may also need to consider different transmission coefficients for different subgroups (based on age etc.). Incorporating all these considerations will lead to very complex models with very little connection to the actual data we have.

2.4 Time dependent β models

As a pandemic spreads it triggers various containment measures [ea20s, DG20, ea20k, SNC20, RR20, ea20c] such as lockdown, social distancing and improved hygiene practices, and these make transmission coefficients such as β time dependent [GPD20, ea20m, FP20, ea20e, ea20i]. Apart from this, the drop in the susceptible population also decreases β (see [ea03a, Cob20]).

Lockdown has been one of the most common mitigation measures followed all over the world and, in its extreme form, we can assume that once it starts the contact rate between susceptible and infectious people drops to zero. In general, the lockdown starts on a fixed day tl and has a duration (time scale) we call τ (we will be using both τ and corresponding decay rate µ = 1/τ in the discussion). We can incorporate these two parameters into the modeling of β(t) in many different ways and a set of three common choices is given below:

  1. Polynomial Suppression [ea20m]: Embedded Image This model is discussed here just for an example and we do not expect the variation of β(t) as slow as linear one. This expression shows that β(t) starts with an initial value β0 and after time tl it starts decreasing linearly with a constant rate of µ = 1/τ and finally becomes β0(1 − µ) at t = ∞.

  2. tanh Suppression: Embedded Image

    This form of suppression of β(t) starts with a constant value β0 at some t = tl and keeps decaying for period represented by τ and finally settles to a final value β0(1 −α) as is shown in Figure (3). This can be written in the following way also: Embedded Image

    From equation (13) we can also write: Embedded Image

    Equations (14) and equation (15) are important to find the priors for α and µ once we know the priors for β and this will be discussed again in §4 and will be used in parameter estimation in §5.

  3. Exponential Suppression [FP20, ea20o, ea20n]: Embedded Image

    This model is similar to the tanh model: β(t) starts from an initial value β0 and, after decreasing for a period, approaches a constant value β0α at t = ∞, as is shown in Figure (4). Note that the transmission coefficient β may also decay with time without any intervention, as is discussed in [ea97] for plants.

    In this case also we can write: Embedded Image and Embedded Image

    These equations also will be used to find the priors for parameter estimation.

    In one study [ea20o] it has been argued that the recovery time 1/γ may also vary with time, due to improvements in the medical understanding of the epidemic and in facilities, and that this can also be modeled as an exponential function. There are other physically well motivated exponentially decaying forms, such as the one given in [Vil20], in which β starts from an initial value β0, decays at rate 1/τ and finally becomes β1:

    $$
    \beta(t) = \beta_0 e^{-t/\tau} + \beta_1 \left( 1 - e^{-t/\tau} \right) \tag{19}
    $$

    The author argues that β1 depends on the policy decisions leading to behavioral changes. This model is different from the model we are considering only in the respect that it considers the “lockdown” from the beginning i.e., t = 0.
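The limiting behaviour described for the suppression models can be sketched in code. The functional forms below are assumptions chosen only to reproduce the properties stated in the text (constant β0 up to tl, a final value β0(1 − α) for the tanh model, a final value β0α for the exponential model, and relaxation from β0 to β1 on time scale τ for the last form); they are not guaranteed to match the exact expressions of this work or of PyCov19, and the function names are hypothetical.

```python
import math


def beta_tanh(t, beta0, alpha, mu, t_l):
    """Assumed tanh suppression: beta0 until t_l, then decays to beta0 * (1 - alpha)."""
    if t <= t_l:
        return beta0
    return beta0 * (1.0 - alpha * math.tanh(mu * (t - t_l)))


def beta_exp(t, beta0, alpha, mu, t_l):
    """Assumed exponential suppression: beta0 until t_l, then decays to beta0 * alpha."""
    if t <= t_l:
        return beta0
    return beta0 * (alpha + (1.0 - alpha) * math.exp(-mu * (t - t_l)))


def beta_relax(t, beta0, beta1, tau):
    """Assumed relaxation from beta0 at t = 0 towards beta1 on time scale tau."""
    weight = math.exp(-t / tau)
    return beta0 * weight + beta1 * (1.0 - weight)


# Illustrative values only: lockdown on day 20, decay time scale of 10 days.
for day in (0, 20, 30, 60, 200):
    print(day,
          round(beta_tanh(day, 0.3, 0.6, 0.1, 20.0), 4),
          round(beta_exp(day, 0.3, 0.4, 0.1, 20.0), 4),
          round(beta_relax(day, 0.3, 0.12, 10.0), 4))
```

Setting µ = 0 in either of the first two functions returns β0 for all times, so the constant β model is recovered as a special case.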

Figure 3: This figure shows the effect of α and µ on β(t) for the tanh model.

Figure 4: This figure shows the effect of α and µ on β(t) for the exp model.

The time dependent β models discussed above and shown in Figure (5) share a common property: before a certain time tl, which we can identify with the day on which lockdown starts, β has a constant value β0, and after that it decreases at a rate that depends on the parameter µ = 1/τ. The suppression of β is controlled by the parameter µ, and for µ = 0 all the models reduce to constant β models. From Figure (5) we can conclude that different models can lead to the same amount of “flattening” of the curve with a different choice of parameters, so there is no preferred model for the suppression.

Figure 5: The top panel shows the β suppression models discussed in the text. The solid and dashed lines in the bottom panel show the number of infected people I(t) and the number of recovered people R(t) for the SIR model corresponding to the different β models. From the bottom panel we can see that all the models have similar effects; however, for the tanh case the suppression of the peak of I(t) is the largest. This figure is for illustration only; the values of the parameters have been chosen to highlight the effects.

The SIR model with constant transmission coefficients is applicable only when the pandemic is left to grow without any intervention. In the real world, once a pandemic starts, interventions of different kinds (social, medical etc.) are introduced to reduce the rate at which the epidemic spreads. These interventions can be taken into account by considering a time dependent (decaying) rate β. As shown in Figure (5), an exponentially decaying β helps to contain the disease by lowering the height of the peak.
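The effect of a decaying β on the peak can be reproduced with a few lines of code. This is a minimal sketch, assuming the standard SIR equations, a forward-Euler integrator and an exponential suppression of the form β0[α + (1 − α)exp(−µ(t − tl))] after day tl; the suppression form, the function names and all parameter values are illustrative assumptions.

```python
import math


def beta_exp(t, beta0, alpha, mu, t_l):
    """Assumed exponential suppression: beta0 until t_l, then decays to beta0 * alpha."""
    if t <= t_l:
        return beta0
    return beta0 * (alpha + (1.0 - alpha) * math.exp(-mu * (t - t_l)))


def peak_infected(beta_of_t, gamma, n_total, i0, days, dt=0.05):
    """Forward-Euler SIR run with time dependent beta; returns the maximum of I(t)."""
    s, i = n_total - i0, i0
    peak = i
    for step in range(int(round(days / dt))):
        t = step * dt
        infections = beta_of_t(t) * s * i / n_total * dt
        recoveries = gamma * i * dt
        s -= infections
        i += infections - recoveries
        peak = max(peak, i)
    return peak


# Illustrative values only (assumptions, not fitted to any data).
n_total, gamma = 1.0e6, 0.1
constant = peak_infected(lambda t: 0.3, gamma, n_total, 10.0, days=400)
suppressed = peak_infected(
    lambda t: beta_exp(t, beta0=0.3, alpha=0.5, mu=0.1, t_l=30.0),
    gamma, n_total, 10.0, days=400,
)
print("peak fraction, constant beta:", round(constant / n_total, 3))
print("peak fraction, decaying beta:", round(suppressed / n_total, 3))
```

Passing any other β(t) as the first argument allows the same comparison for a different suppression model.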

3 Data

The primary data for Covid-19 consist of three time series, for the counts of confirmed C(t), recovered R(t) and dead D(t) persons, for every country. By definition all three time series are non-decreasing functions of time, as shown in Figure (6) for Italy. The data for Covid-19 are provided by Johns Hopkins University [ea20g, Gar20a, Gar20b] and are updated on a daily basis; interactive tools are also provided for data exploration. The Worldometer website [Wor20] and Our World in Data [iD20] also provide up-to-date data for most countries with some extra information, such as the numbers of active cases, critical cases and tests conducted per million population of the country.

Figure 6: The total number of confirmed, recovered and dead cases for Italy.

The time series shown in Figure (6) grow exponentially in the beginning and then settle to a slower power law growth. The daily new cases, shown in Figure (7), look like broad peaked functions with increasing fluctuations around the current value. Time series for the set of countries used in the analysis are shown in Figure (8).

Figure 7: The number of new confirmed, recovered and dead cases for Italy.

Figure 8: Time series for a set of countries we used for modeling.

There are many factors, known and unknown, that determine the behavior of these time series, such as the nature of the disease/virus, the health profile of the population, the availability of medical infrastructure, age distribution, social mixing, personal hygiene and maybe the geographical location and genetic makeup of the population. Compartmental models, such as SIR, SIRD and SEIR, help to simplify the problem by replacing a large number of controlling parameters with a very small set of well motivated parameters: the transmission coefficients.

Compartmental models predict how the population in different compartments changes with time. In the SIR model the populations in the S and R compartments can only decrease and increase respectively, whereas the population in the I compartment can increase as well as decrease. At the onset of the epidemic almost everyone is in the S compartment, with a very small fraction in I and no one in the R compartment. At the end of the pandemic everyone is in the R compartment with no one in the S or I compartment (see Figure (1)). The same happens for the SIRD model, where the decrease in the I compartment is due to recoveries (R) and deaths (D).

The time series I(t) for the population in compartment I can be obtained by subtracting R(t) and D(t) from C(t):

$$
I(t) = C(t) - R(t) - D(t) \tag{20}
$$
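The subtraction above is straightforward to apply to cumulative counts. The following is a minimal sketch with hypothetical helper names; the numbers are made up for illustration and are not real data for any country.

```python
def active_infected(confirmed, recovered, dead):
    """Population of the I compartment: confirmed minus recovered minus dead."""
    if not (len(confirmed) == len(recovered) == len(dead)):
        raise ValueError("all three series must have the same length")
    return [c - r - d for c, r, d in zip(confirmed, recovered, dead)]


def daily_new(cumulative):
    """Day-to-day increments of a cumulative series (one element shorter)."""
    return [later - earlier for earlier, later in zip(cumulative, cumulative[1:])]


# Made-up cumulative counts for five consecutive days (not real data).
confirmed = [100, 150, 230, 340, 480]
recovered = [5, 12, 25, 45, 80]
dead = [1, 3, 6, 10, 16]

infected = active_infected(confirmed, recovered, dead)
print("I(t):", infected)
print("new confirmed per day:", daily_new(confirmed))
```

Unlike the three cumulative series, I(t) obtained this way is not required to be non-decreasing.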

For a set of countries the time series of I(t) are shown in Figure (8). The time series Embedded Image and Embedded Image have very weak dependency on each other - the only constrains they have to satisfy is that the sum total of the population in different compartment must add to the total population. The number of people I(t) in the I compartment at time t does depend on all the three transmission rates β, γ, and δ, therefore it is a good measure which we can fit to a compartmental model, such as SIR or SIRD and can get constraints.

Table 2: A summary for the countries used for modelling

4 Reconstruction

In this and the next section we present the main results of the study by demonstrating a reconstruction procedure for the time dependent transmission coefficients β(t), γ(t), δ(t) and the effective reproduction number $\mathcal{R}(t)$. We use Italy as an example for this procedure. Note that this approach is common and can be used to understand the variation of the transmission coefficients with time as a result of interventions. The main advantage of this approach is that there are no parameters to adjust, so the results are easy to reproduce.

The approach we use here is similar to as used in [ea20e, GPD20]. In this approach the evolution equations are written in a discretized form as shown in equation (21).

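$$
S_{t+1} - S_t = -\frac{\beta S_t I_t}{N}, \qquad
I_{t+1} - I_t = \frac{\beta S_t I_t}{N} - \gamma I_t, \qquad
R_{t+1} - R_t = \gamma I_t \tag{21}
$$

From the third equation we can write:

$$
\gamma(t) = \frac{R_{t+1} - R_t}{I_t} \tag{22}
$$

and using this and the second equation from (21), with $S_t/N \approx 1$, we get,

$$
\beta(t) = \frac{(I_{t+1} - I_t) + (R_{t+1} - R_t)}{I_t} \tag{23}
$$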

Note that by definition Rt+1 ≥ Rt, so γ(t) ≥ 0. However, we may have It+1 ≤ It, so β(t) may become negative once the population in the compartment I starts decreasing.

Here an important assumption is being made, namely that the fraction of the susceptible population S/N is close to unity, which may be true at the beginning of the epidemic. Once we have expressions for the time dependent β and γ we can also write an expression for the time dependent reproduction number in the following way:

$$
\mathcal{R}(t) = \frac{\beta(t)}{\gamma(t)} = 1 + \frac{I_{t+1} - I_t}{R_{t+1} - R_t} \tag{24}
$$

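Following a similar procedure we can write the SIRD equations in the following discretized form:

$$
S_{t+1} - S_t = -\frac{\beta S_t I_t}{N}, \qquad
I_{t+1} - I_t = \frac{\beta S_t I_t}{N} - (\gamma + \delta) I_t, \qquad
R_{t+1} - R_t = \gamma I_t, \qquad
D_{t+1} - D_t = \delta I_t \tag{25}
$$

From these equations, again with $S_t/N \approx 1$, we can write:

$$
\gamma(t) = \frac{\Delta R_t}{I_t}, \qquad
\delta(t) = \frac{\Delta D_t}{I_t}, \qquad
\beta(t) = \frac{\Delta I_t + \Delta R_t + \Delta D_t}{I_t} \tag{26}
$$

and can write the expression for the reproduction number:

$$
\mathcal{R}(t) = \frac{\beta(t)}{\gamma(t) + \delta(t)} = 1 + \frac{\Delta I_t}{\Delta R_t + \Delta D_t} \tag{27}
$$

where ∆Xt = Xt+1 − Xt with X = I, R and D. This equation is identical to equation (24) if we do not count dead and recovered separately, i.e., replace ∆Rt + ∆Dt with ∆Rt. One of the interpretations of the reproduction number is that it is a ratio of two rates, so if we are interested in finding two separate measures for γ and δ we can also write:

$$
\mathcal{R}_\gamma(t) = \frac{\beta(t)}{\gamma(t)} = 1 + \frac{\Delta I_t + \Delta D_t}{\Delta R_t}, \qquad
\mathcal{R}_\delta(t) = \frac{\beta(t)}{\delta(t)} = 1 + \frac{\Delta I_t + \Delta R_t}{\Delta D_t} \tag{28}
$$

and so,

$$
\frac{1}{\mathcal{R}(t)} = \frac{1}{\mathcal{R}_\gamma(t)} + \frac{1}{\mathcal{R}_\delta(t)} \tag{29}
$$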

The procedure discussed above can be used to track the variation of the transmission coefficients β, γ, δ and the effective reproduction number $\mathcal{R}(t)$ with time. To follow this procedure we need to discard the first few data points, which have very high noise. As explained above, occasionally we may also obtain negative values of $\mathcal{R}(t)$.
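The reconstruction can be sketched directly from the day-to-day differences of the series. The code below is a minimal illustration, assuming a one-day step, S/N ≈ 1, and the relations γ(t) = ∆Rt/It, δ(t) = ∆Dt/It, β(t) = (∆It + ∆Rt + ∆Dt)/It and a reproduction number of 1 + ∆It/(∆Rt + ∆Dt); the function name, the `skip_first` parameter and the synthetic series are assumptions made for illustration, not the PyCov19 interface.

```python
def reconstruct_sird(infected, recovered, dead, skip_first=0):
    """Per-day estimates of beta, gamma, delta and the effective reproduction number.

    Uses Delta X_t = X_{t+1} - X_t with a one-day step and assumes S/N ~ 1.
    Entries are None where a denominator is zero. Passing a series of zeros
    for `dead` gives the SIR reconstruction.
    """
    rows = []
    for t in range(skip_first, len(infected) - 1):
        d_i = infected[t + 1] - infected[t]
        d_r = recovered[t + 1] - recovered[t]
        d_d = dead[t + 1] - dead[t]
        if infected[t] > 0:
            gamma = d_r / infected[t]
            delta = d_d / infected[t]
            beta = (d_i + d_r + d_d) / infected[t]
        else:
            beta = gamma = delta = None
        removed = d_r + d_d
        r_eff = 1.0 + d_i / removed if removed > 0 else None
        rows.append({"t": t, "beta": beta, "gamma": gamma,
                     "delta": delta, "r_eff": r_eff})
    return rows


# Synthetic series generated with known coefficients (not real data).
true_beta, true_gamma, true_delta = 0.3, 0.08, 0.02
infected, recovered, dead = [100.0], [0.0], [0.0]
for _ in range(20):
    i_now = infected[-1]
    infected.append(i_now * (1.0 + true_beta - true_gamma - true_delta))
    recovered.append(recovered[-1] + true_gamma * i_now)
    dead.append(dead[-1] + true_delta * i_now)

estimates = reconstruct_sird(infected, recovered, dead, skip_first=3)
print(estimates[0])
```

On real data the first rows are dominated by noise, which is why the sketch allows them to be skipped, and a falling I(t) can produce negative estimates of β(t) and of the reproduction number.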

In Figure (9) we show the reconstruction of β(t), γ(t) and δ(t) for Italy with the SIRD model. From this figure it is clear that all three transmission coefficients vary with time and that the variation of β(t) is the largest. The same trend is observed for other countries. In the next section we will fit a parameterized form of β(t), which we obtain from the model fitting, to the data shown in the figure.

Figure 9: Reconstruction of the transmission coefficients β(t), γ(t) and δ(t) with the SIRD model for Italy.

Figure 10: Histogram of the effective reproduction number $\mathcal{R}(t)$ with different models.

Once we have reconstructed β(t), γ(t) and δ(t) we can easily get $\mathcal{R}(t)$ for the SIR and SIRD models.

4.1 Priors

One of the important uses of the reconstruction procedure discussed here is to find the priors (minimum, maximum and best-fit values) for the parameters to be fitted. Once we have the estimates for β(t), γ(t) and δ(t) from the above procedure we can easily find the values Xmin, Xmax and X0 (with X = β, γ, δ). Here X0 is the approximate starting point for the parameter, which is needed by many optimization procedures that find the solution iteratively. Since in the present work we use a parametric form of β(t), we need priors for the parameters of β(t), i.e., β1, α, µ and τ, which we can find from the reconstructed β(t) (see §2.4 for detail).
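Extracting Xmin, Xmax and X0 from a reconstructed series takes only a few lines. This is a minimal sketch with a hypothetical helper name; using the median as X0 and discarding missing or non-positive values are assumptions made here for illustration, and the numbers are made up.

```python
import statistics


def prior_range(series):
    """Return (x_min, x_max, x0) from a reconstructed coefficient series.

    None entries and non-positive values are dropped (an assumption of this
    sketch); x0 is the median, one possible choice of starting point.
    """
    clean = [x for x in series if x is not None and x > 0]
    if not clean:
        raise ValueError("no positive values to build a prior from")
    return min(clean), max(clean), statistics.median(clean)


# Made-up reconstructed beta(t) values (not real data).
beta_series = [None, 0.42, 0.35, 0.31, -0.02, 0.18, 0.12, 0.09]
beta_min, beta_max, beta_start = prior_range(beta_series)
print("beta prior:", beta_min, beta_max, "starting point:", beta_start)
```

The same helper can be applied to reconstructed γ(t) and δ(t) series to bound those parameters before fitting.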

5 Model fitting and parameter estimation

We consider a set of six compartmental models, three belonging to the SIR and three to the SIRD class. The models are different from each other in terms of the choice for the epidemiological class (SIR or SIRD) or the model for the contact rate β(t) (see §2.4 for detail). A summary of the models is given in Table (3).

Table 3: A summary of the models being considered for the analysis

Note that in the model (5) and (6), β(t) starts decaying from the very beginning (in place of starting from a particular day representing the date of the lockdown) with a constant rate µ.

In any fitting procedure the choice of the loss function depends on what we wish to fit. In common least squares fitting we use the sum of the squares of the offsets as the loss function. However, that choice is problematic for the data we have. The time series we wish to fit have small values at the beginning and very large values at the later stage, so the fitting is biased towards the points which have large values. One solution could be to fit the log of the time series, but then the fitting becomes biased towards small values, at the beginning (or at the later stage when the peak falls).

We decided to use the ordinary least squares loss function, which fits the data points close to the peak (having higher values) more accurately than other data points. We found this useful for the following two reasons:

  1. The peak in the time series is an important feature, in particular its location and height, therefore any loss function biased towards it is justified.

  2. For short-term predictions only the data points close to the dates of prediction are important, so using a loss function that fits later points (having higher values) more accurately than the noisy data points at the beginning is favorable.
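The bias discussed above can be illustrated numerically. The sketch below uses a made-up series spanning several orders of magnitude and a hypothetical model that is off by the same 10% at every point, then compares how much each point contributes to a linear and to a logarithmic least squares loss.

```python
import numpy as np

# Made-up series spanning several orders of magnitude, with a model that
# is off by the same 10% at every point.
data = np.array([10.0, 100.0, 1_000.0, 10_000.0])
model = 1.1 * data

linear_terms = (model - data) ** 2               # ordinary least squares
log_terms = (np.log(model) - np.log(data)) ** 2  # least squares on the log

# Fraction of the total loss contributed by each data point.
linear_share = linear_terms / linear_terms.sum()
log_share = log_terms / log_terms.sum()
```

In the linear case the largest point carries almost the entire loss, whereas in the log case every point contributes equally, so noise in the small early values influences the fit as much as the peak does.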

The loss function which we used for fitting the data to the SIR and SIRD models is given below.

[Equation (30): loss function, a root mean square deviation between the model and the data]

The variables used in the above equations are defined in table (4):

Table 4:

A summary of the variables

Once we have estimated the fitting parameters of our model we can find a smooth representation of β(t) using the best-fit values of the parameters β0, α, µ and tl, as shown in Figure (11) for Italy with the SIR model and the exp model for β(t). The first few data points are very noisy and so are not shown in the figure. A similar exercise is done for the effective reproduction number in Figure (12). One shortcoming in the case of the effective reproduction number is that we have assumed γ to be constant, although the data does show variation of γ and δ with time (see Figure (9)). We have also shown the beta model fitting for India and the US in Figures (13) and (14).

Figure 11:

In this figure we show the reconstructed β(t) and a smoother version obtained by model fitting for the SIR case with the exp model for β(t). A similar exercise can be done for other countries also. Since the data points are very noisy, it does not make much sense to use a more complex model for β(t) than what has been used here.

Figure 12:

Reconstructed effective reproduction number for the same set of parameters as used in Figure (11).

Figure 13:

The same as in Figure (11) for India.

Figure 14:

The same as in Figure (11) for US.

For SIR and SIRD models we fit multiple time series together, so we must weight the sum of the squares of the offsets for the different time series since they have very different values: the value of I(t) is generally a few orders of magnitude higher than R(t) and D(t). For this purpose we use weights defined in terms of the average of each time series X(t).

We use the solve_ivp and minimize functions from SciPy [Sci20] to integrate the differential equations and to minimize the cost function, respectively. The loss function given by equation (30) represents the Root Mean Square Deviation (RMSD) and we use its final value as a measure of the goodness of the fit.
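The following is a minimal sketch of such a fit, not the PyCov19 implementation. It assumes an SIR model with β(t) = β0 exp(−µt), in the spirit of the models whose contact rate decays from the very beginning, weights equal to the inverse of the mean of each series, and the Nelder-Mead method with bounds (which needs a reasonably recent SciPy). The synthetic "observations", parameter values, bounds and helper names are all hypothetical.

```python
import numpy as np
from scipy.integrate import solve_ivp
from scipy.optimize import minimize

def sir_rhs(t, y, beta0, mu, gamma, n):
    """SIR equations with contact rate beta(t) = beta0 * exp(-mu * t)."""
    s, i, r = y
    new_inf = beta0 * np.exp(-mu * t) * s * i / n
    return [-new_inf, new_inf - gamma * i, gamma * i]

def simulate(params, t, n, i0):
    """Integrate the model and return I(t) and R(t) on the grid t."""
    beta0, mu, gamma = params
    sol = solve_ivp(sir_rhs, (t[0], t[-1]), [n - i0, i0, 0.0],
                    t_eval=t, args=(beta0, mu, gamma, n),
                    rtol=1e-8, atol=1e-8)
    return sol.y[1], sol.y[2]

def loss(params, t, n, i0, i_obs, r_obs):
    """Weighted RMSD over I(t) and R(t); the 1/mean weights are an assumption."""
    i_mod, r_mod = simulate(params, t, n, i0)
    res_i = (i_mod - i_obs) / i_obs.mean()
    res_r = (r_mod - r_obs) / r_obs.mean()
    return np.sqrt(np.mean(np.concatenate([res_i, res_r]) ** 2))

# Synthetic "observations" generated from known parameters (not real data).
t = np.arange(0.0, 60.0)
n, i0 = 1.0e6, 10.0
true_params = (0.5, 0.03, 0.1)  # beta0, mu, gamma
i_obs, r_obs = simulate(true_params, t, n, i0)

# Bounds play the role of (X_min, X_max) and x0 the role of X_0.
x0 = [0.4, 0.02, 0.08]
bounds = [(0.1, 1.0), (0.0, 0.1), (0.02, 0.3)]
fit = minimize(loss, x0=x0, args=(t, n, i0, i_obs, r_obs),
               bounds=bounds, method="Nelder-Mead")
beta0_fit, mu_fit, gamma_fit = fit.x
rmsd = fit.fun  # final loss value, used as the goodness-of-fit measure
```

A derivative-free method is chosen here because finite-difference gradients through an adaptive ODE solver can be noisy; how well the parameters are recovered on real data depends on the noise level and on how tightly the priors constrain the search.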

A list of fitting parameters for the different models is given in Table (3). For the SIR class of models, models (1) and (2), we have five fitting parameters, namely γ, β0, α, µ and tl, and for the SIRD models, models (3) and (4), we have six fitting parameters, namely γ, β0, α, µ, tl and δ. Note that for models (1) to (4) four of the parameters are associated with β(t), and for models (5) and (6) the variation of β(t) is controlled by just two parameters: β0, the initial value of β, and its decay rate µ = 1/τ.

The best-fit values of the fitting parameters with their 90% CI (standard deviation) as well as the median values are given in Table (5). The table also gives the estimate for the effective reproduction number, which is a derived quantity here. Note that for computing it we have extrapolated to the last date for which the data is being used here.

Table 5:

A summary of the fitting parameters for all the models.

A histogram of the effective reproduction number for the different models being considered is shown in Figure (10), and detailed values for different countries, which include the average values as well as the 90% CI (StdDev), are given in Table (6).

Table 6:

Effective reproduction number on May 31, 2020, for countries with different models.

6 Summary and conclusions

Covid-19 is a global crisis, and understanding its impact on different systems of modern human life (medical, social, economic etc.) and the responses presented is an important exercise to carry out. We understand that despite being a global phenomenon, the impact of Covid-19 in terms of the loss of life and the resources being exhausted depends on the local conditions as well as on the mitigation measures taken locally. However, we believe that the global picture of the crisis does help to plan and take policy decisions at the local scale also.

A full understanding of any pandemic, in particular one like Covid-19 which does not have any other examples in history (in terms of scale and impact), may become available only when it is over, and the facts and figures presented here may have a very short life. However, we still believe that any quick, timely insight may help a lot in planning for the worst. Knowing very well that all mathematical models are wrong but some are useful, we believe that the mathematical models presented in this work may help to develop some insight about the crisis. A brief summary of the work presented here is as follows.

In §1 we have given a very brief introduction of the problem being addressed and reviewed some of the key works about Covid-19 which motivated the present work. A brief introduction of the mathematical framework used in the work is given in §2; in particular, we have reviewed a set of compartmental models (SIR, SEIR and SIRD). We have also discussed a set of parametric models for one of the transmission coefficients, β(t), in §2.4. We have discussed the data being used in the work in §3.

The main results of the present work are discussed in §4 and §5. In §4 we have reviewed a reconstruction procedure for the transmission coefficients and the basic reproduction number. This procedure does not depend on the choice of any parameter and can be easily generalized to other similar models also. In §5 we have presented the best-fit values of the parameters with their 90% CI in the form of a set of tables. We have presented the values of the parameters in the following two forms:

  1. Model based

  2. Country based

All the fitting parameters for the models being considered are summarized in Table (5), and full lists of parameters for different countries with different models are given at [Pra20a]. Here we only give the values of the effective reproduction number for the countries (see Table (2)) on the last date for which we have the data, in Table (6).

The work presented here assumes that the spreading of a pandemic like Covid-19 happens homogeneously in space and time; however, we know that this is far from true. Experience [ea03b] shows that “super-spread” events (SSEs), rare events where one particular infectious person interacts with a very large number of susceptible people over a short period of time, have the maximum impact. In these situations average measures like the reproduction number are not very informative. In the present work we use data for a set of countries to constrain the parameters of the SIR and SIRD models; a similar exercise with the SIRD model for India is done in [ea20q].

In the present work we have considered only the time variation of β; however, from the data we can see that other transmission coefficients such as γ and δ also change with time, although not as much, mainly because they depend more on the nature of the disease and less on mitigation and other social measures. A significant change in the values of γ and δ can take place only due to medical interventions.

Figure 15:

Fitting models with data for Italy. The panels from the top left to the bottom right are for Models (1) to (6), respectively.

Data Availability

All the data as well as the code used in the work are available from the GitHub page of the author.

https://github.com/jayanti-prasad/PyCov19

Acknowledgment

The author would like to thank Dr. Gaurav Goswami for comments and feedback. At present the author works as an independent researcher and data scientist and the work presented here is not supported by any public or private agency. The author will be thankful to any agency, individual or individuals who come forward to sponsor/support this and other similar works on Covid-19.

References

  1. Julie Blackwood and Lauren M. Childs. An introduction to compartmental modeling for the budding infectious disease modeler. Letters in Biomathematics, 5(1):195–221, 2018. doi:10.1080/23737867.2018.1509026.
  2. Kathakali Biswas and Parongama Sen. Space-time dependence of corona virus (covid-19) outbreak. ArXiv e-prints, 2020, 2003.03149.
  3. Sarah Cobey. Modeling infectious disease dynamics. Science, 2020. doi:10.1126/science.abb5659.
  4. Emanuele Daddi and Mauro Giavalisco. Early forecasts of the evolution of the covid-19 outbreaks and quantitative assessment of the effectiveness of countering measures. ArXiv e-prints, 2020, 2004.08365.
  5. C. A. Gilligan et al. Analysis and fitting of an sir model with host response to infection load for a plant disease. Philos Trans R Soc Lond B Biol Sci., 352(1351):353–364, 1997. doi:10.1098/rstb.1997.0026.
  6. Lipsitch et al. Transmission dynamics and control of severe acute respiratory syndrome. Science, 300(5627):1966–1970, 2003. doi:10.1126/science.1086616.
  7. Riley et al. Transmission dynamics of the etiological agent of sars in hong kong: Impact of public health interventions. Science, 300:1961–1966, 2003. doi:10.1126/science.1086478.
  8. Neil M. Ferguson et al. Strategies for containing an emerging influenza pandemic in southeast asia. Nature, 437:209–214, 2005. doi:10.1038/nature04017.
  9. Neil M. Ferguson et al. Strategies for mitigating an influenza pandemic. Nature, 442:448–452, 2006. doi:10.1038/nature04795.
  10. Anastassopoulou et al. Data-based analysis, modelling and forecasting of the covid-19 outbreak. PLOS ONE, 2020. doi:10.1371/journal.pone.0230405.
  11. Arcede et al. Accounting for symptomatic and asymptomatic in a seir-type model of covid-19. ArXiv e-prints, 2020, 2004.01805.
  12. Castilho et al. Assessing the efficiency of different control strategies for the coronavirus (covid-19) epidemic. ArXiv e-prints, 2020, 2004.03539.
  13. Chen et al. Epidemiological and clinical characteristics of 99 cases of 2019 novel coronavirus pneumonia in wuhan, china: a descriptive study. The Lancet, 395:507–513, 2020. doi:10.1016/S0140-6736(20)30211-7.
  14. Chen et al. A time-dependent sir model for covid-19 with undetectable infected persons. ArXiv e-prints, 2020, 2003.00122.
  15. Corey et al. A strategic approach to covid-19 vaccine r&d. Science, 2020. doi:10.1126/science.abc5312.
  16. Dong et al. An interactive web-based dashboard to track covid-19 in real time. The Lancet Infectious Diseases, 20:533–534, 2020. doi:10.1016/S1473-3099(20)30120-1.
  17. Gorbalenya et al. The species severe acute respiratory syndrome-related coronavirus: classifying 2019-ncov and naming it sars-cov-2. Nature Microbiology, 5:536–544, 2020. doi:10.1038/s41564-020-0695-z.
  18. Hotz et al. Monitoring the spread of covid-19 by estimating reproduction numbers over time. ArXiv e-prints, 2020, 2004.08557.
  19. Jonas Dehning et al. Inferring change points in the spread of covid-19 reveals the effectiveness of interventions. Science, 2020. doi:10.1126/science.abb9789.
  20. Kucharski et al. Early dynamics of transmission and control of covid-19: a mathematical modelling study. The Lancet Infectious Diseases, 20(5):553–558, 2020. doi:10.1016/S1473-3099(20)30144-4.
  21. Li et al. Substantial undocumented infection facilitates the rapid dissemination of novel coronavirus (sars-cov-2). Science, 368(6490):489–493, 2020. doi:10.1126/science.abb3221.
  22. Loli et al. Preliminary analysis of covid-19 spread in italy with an adaptive seird model. ArXiv e-prints, 2020, 2003.09909.
  23. Lopez et al. A modified seir model to predict the covid-19 outbreak in spain and italy: simulating control scenarios and multi-scale epidemics. MedRxiv e-prints, 2020, 2003.09909. doi:10.1101/2020.03.27.20045005.
  24. Pone et al. A data driven analysis and forecast of an seiard epidemic model for covid-19 in mexico. ArXiv e-prints, 2020, 2004.08288.
  25. Prem et al. The effect of control strategies to reduce social mixing on outcomes of the covid-19 epidemic in wuhan, china: a modelling study. The Lancet Public Health, 5:e261–e270, 2020. doi:10.1016/S2468-2667(20)30073-6.
  26. Saptarshi Chatterjee et al. Studying the progress of covid-19 outbreak in india using sird model. medRxiv, 2020. doi:10.1101/2020.05.
  27. Sun et al. Covid-19: Epidemiology, evolution, and cross-disciplinary perspectives. Trends in Molecular Medicine, 26(5):483–495, 2020. doi:10.1016/j.molmed.2020.02.008.
  28. Walker et al. The global impact of covid-19 and strategies for mitigation and suppression. Imperial College COVID-19 Response Team, 2020. doi:10.25561/77735.
  29. Wu et al. A new coronavirus associated with human respiratory disease in china. Nature, 579:265–269, 2020. doi:10.1038/s41586-020-2008-3.
  30. Yang et al. Modified seir and ai prediction of the epidemics trend of covid-19 in china under public health interventions. Journal of Thoracic Disease, 12(3):165–174, 2020. doi:10.21037/jtd.2020.02.64.
  31. Zhou et al. A pneumonia outbreak associated with a new coronavirus of probable bat origin. Nature, 579:270–273, 2020. doi:10.1038/s41586-020-2012-7.
  32. Duccio Fanelli and Francesco Piazza. Analysis and forecast of covid-19 spreading in china, italy and france. Chaos, Solitons and Fractals, 134:109761, 2020, 2003.06031. doi:10.1016/j.chaos.2020.109761.
  33. Lauren Gardner. Mapping 2019-ncov. https://systems.jhu.edu/research/public-health/ncov/, 2020.
  34. Lauren Gardner. Novel coronavirus (covid-19) cases. https://github.com/CSSEGISandData/COVID-19, 2020.
  35. Gaurav Goswami, Jayanti Prasad, and Mansi Dhuria. Extracting the effective contact rate of covid-19 pandemic. ArXiv e-prints, 2020, 2004.07750.
  36. Herbert W. Hethcote. The mathematics of infectious diseases. SIAM Review, 42(4):599–653, 2000.
  37. Our World in Data. Data on covid-19 (coronavirus) confirmed cases, deaths, and tests. https://github.com/owid/covid-19-data, 2020.
  38. Robert Marsland III and Pankaj Mehta. Data-driven modeling reveals a universal dynamic underlying the covid-19 pandemic under social distancing. ArXiv e-prints, 2020, 2004.10666.
  39. Matt J. Keeling and Pejman Rohani. Modeling Infectious Diseases in Humans and Animals. Princeton University Press, 2008.
  40. Michael Y Li. An Introduction to Mathematical Modeling of Infectious Diseases. Springer International Publishing, 2018.
  41. Grassly NC and Fraser C. Mathematical models of infectious disease transmission. Nat Rev Microbiol., 6:477–87, 2008. doi:10.1038/nrmicro1845.
  42. Gonçalo Oliveira. Refined compartmental models, asymptomatic carriers and covid-19. ArXiv e-prints, 2020, 2004.14780. doi:10.1101/2020.04.14.20065128.
  43. World Health Organization. Novel coronavirus (2019-ncov) situation report - 1. https://www.who.int/emergencies/diseases/novel-coronavirus-2019/situation-reports, 2020.
  44. World Health Organization. Report of the who-china joint mission on coronavirus disease 2019 (covid-19). https://www.who.int/publications-detail/report-of-the-who-china-joint-mission-on-coronavirus-disease-2019-(covid-19), 2020.
  45. Tales P. A time-dependent seir model to analyse the evolution of the sars-covid-2 epidemic outbreak in portugal. Bull World Health Organ, 2020. doi:10.2471/BLT.20.258392.
  46. Jayanti Prasad. A data first approach to covid-19. https://github.com/jayanti-prasad/Covid-19/tree/master/project2/, 2020.
  47. Jayanti Prasad. A python package for fitting covid-19 data. https://github.com/jayanti-prasad/PyCov19, 2020.
  48. Singh R. and Adhikari R. Age-structured impact of social distancing on the covid-19 epidemic in india. ArXiv e-prints, 2020, 2003.12055.
  49. SciPy. Scientific computing tools for python. https://www.scipy.org/, 2020.
  50. Tridip Sardar, Sk Shahid Nadim, and Joydev Chattopadhyay. Assessment of 21 days lockdown effect in some states and overall india: A predictive mathematical study on covid-19 outbreak. ArXiv e-prints, 2020, 2004.03487.
  51. Villaverde. Estimating and simulating a sird model of covid-19 for many countries, states, and cities. https://cepr.org/active/publications/discussion_papers/dp.php?dpno=14711, 2020.
  52. Worldometers. Covid-19 coronavirus pandemic. https://www.worldometers.info/coronavirus/, 2020.
Posted June 02, 2020.
Subject Area

  • Epidemiology