Abstract
The time-series Susceptible-Infectious-Recovered (TSIR) model has been a standard tool for studying the non-linear dynamics of acute, immunizing infectious diseases. The standard assumption of the TSIR model, that vaccination is equivalent to a reduction in the recruitment of susceptible individuals, or the birth rate, can lead to a bias in the estimate of the reporting fraction and of the total incidence. We show that this bias increases with the level of vaccination due to a double-counting of individuals who are infected prior to the age of vaccination. We present a simple correction for this bias by discounting the observed number of cases by the product of the number that occur prior to the average age of vaccination and the vaccination coverage during the initial susceptible reconstruction step of the TSIR model fitting. We generate a time series of measles cases using an age-structured SIR transmission model with vaccination after birth (at 9 months of age) and illustrate the bias with the standard TSIR fitting method. We then illustrate that our proposed correction eliminates the bias in the estimated reporting fraction and total incidence. We note further that this bias does not impact the estimates of the seasonality of transmission.
1 Introduction
The incidence of acute, immunizing infectious diseases displays complex non-linear dynamics that have been remarkably well described by simple dynamical models [1, 2, 3]. The time-series Susceptible-Infectious-Recovered (TSIR) model [2] has been an essential tool in bridging non-linear epidemic models and empirical data on disease incidence, particularly of immunizing infectious diseases during the pre-vaccination era [2, 4, 5, 6]. The TSIR model is a discrete time version of the continuous compartmental Susceptible-Infectious-Recovered (SIR) type epidemic models based on mass-action transmission [2].
The TSIR model makes an implicit assumption that the sum of (unimmunized) births and cases will be approximately equal over time. That is, each individual born will become infected (immunized) at some point over their lifetime and fast-scale deviations in the balance of cases and births reflect the non-linear dynamics of transmission and immunity, and seasonal variation [2]. For childhood infectious diseases, such as measles, mumps, and rubella, this is likely to be a fairly weak assumption and is supported by high adolescent seroprevalence in the pre-vaccine era [7]. A powerful property of the TSIR model, that derives from this assumption, is that it does not require incidence to be perfectly observed. The under-reporting fraction is encoded in the ratio of cumulative cases to cumulative unimmunized births [2, 6, 4].
Routine infant vaccination has been incorporated into the TSIR model by modulating the birth rate of unimmunized (susceptible) individuals. Births at each time step are discounted by the fraction immunized through vaccination after accounting for effectiveness [8, 9, 10, 11]. However, where vaccination takes place after birth, such as at 9 months of age for measles as recommended by the World Health Organization for countries where measles is common [12], a scenario arises in which an individual destined to be vaccinated may become infected in the interval following loss of maternal immunity and before vaccination age. Such an individual would be doubly counted in the TSIR model as both an immunized birth and a case, leading to the violation of the assumption that the sum of unimmunized births and cases equate each other over time.
In this study we show that this double counting in the TSIR model can lead to a bias in the estimation of reporting fractions and of disease burden in the presence of vaccination. We show that the magnitude of this bias increases with the coverage of routine vaccination and is stronger with higher birth rates. We also show that the bias does not impact the estimation of seasonality in transmission. We propose a method to correct this bias and evaluate its performance.
2 Methods
2.1 Dynamical model of measles
We generate daily time series of measles incidence using an age-structured stochastic compartmental model of measles transmission and vaccination, with a population stratified into 40 discrete age classes (24 monthly classes up to 2 years old; 1 class from 2 to 5 years old; 14 5-year classes from 5 to 75 years old; and 1 class of 75 years and older). Individuals in the model are either unvaccinated or vaccinated. Within each vaccination status, they may be temporarily immune with maternal antibodies (M), susceptible (S), infectious (I), or recovered (R).
Individuals are born either susceptible or temporarily immune with maternal antibodies. The fraction, q(t), of the birth cohort at time t that is born with maternal antibodies is determined by the proportion of individuals between 10 to 55 years old that is immune, either through prior infection or vaccination [13]. We assume a constant death rate for all age classes and set it equal to the birth rate for a constant population size. We used exponential rates of aging every 15 days. We assume maternal immunity to last an average of 4 months and lifelong immunity following infection or vaccination.
We use an age-dependent rate of routine vaccination, shaped as a truncated normal distribution with mean of 9 months and standard deviation of 0.5 months, constrained to the interval of 0 and 24 months. The height of the normal distribution is scaled by a constant factor such that the cumulative probability of vaccination by 24 months of age is equal to the routine vaccination coverage for a given simulation. We do not model a second dose of vaccination. Individuals who are maternally immune, susceptible, or recovered may become vaccinated. However, the vaccine is only effective for susceptible individuals. It has no effect on those who are maternally immune or who have recovered. We assume 100% vaccine effectiveness.
For simplicity, we present the model with the following set of ordinary differential equations:
The force of infection for age class i is
where β is the transmission coefficient; ci,j is the contact rate from age group j to age group i; β0 is the relative amplitude of seasonal forcing; and ϕ is the phase shift. For the stochastic model and the full transition matrix, see the Supplementary Information.
The transmission coefficient is calculated with the next-generation matrix using a basic reproduction number, R0, of 15 [14, 15] and an infectious period of 14 days [2, 4]. Case importations occur with a probability 0.003 at all time steps to maintain endemicity. We assume either homogeneous contact mixing or assortative mixing following the pattern for Kenya [16].
We simulated the system in daily time steps using the tau-leaping algorithm [17], starting with 5 infected individuals in a fully susceptible population of 1 million with an age distribution consistent with Kenya [18]. The number of events is simulated as a Poisson draw with mean ki, where ki is the rate of event i per time step. If the Poisson draw is higher than what is feasible, the number of events is set to the maximum allowable number.
We simulated 200 years of time series at a given vaccination coverage following a discarded initial transient period of 500 years (300 years without vaccination and 200 years with vaccination at the given coverage) to reach equilibrium. We considered vaccination coverage ranging 0–95% and multiple birth rates corresponding to that in Ethiopia and Chad (30 and 40 births per 1000 persons per year respectively [19]), amplitude of seasonal forcing (0.3 and 0.4), and phase shift (Day 0 and 180 of the year). For each setting, we divided the time series into 20 individual, non-overlapping 10-year time series and assumed that cases were observed at reporting fractions of 0.01, 0.05, 0.10, and 0.20 simulated as binomial draws from the true incidence at each time step. For each 10-year time series of observed cases, we fit the TSIR model (described in the next section) with and without correction for the double counted cases. We present the estimates of the reporting fraction, the reconstructed 10-year burden of measles, the estimated seasonal amplitude, and the estimated phase shift.
2.2 Fitting the TSIR model
The discrete TSIR model is described by the following equations [2, 4]:
where B(t − d) represents births; d is the average duration of immunity derived from maternal antibodies; and t represents biweekly time steps (consistent with the infectious period for measles [2, 4]). The parameter p(t) is a time-varying transmission parameter with a 1-year (26 biweeks) period represented by
The coefficients α1 and α2 are cooperativity parameters, where α1 = α2 = 1 gives the standard mass-action transmission [20]. The parameters β0 and ϕ are the amplitude and phase shift respectively.
The estimated disease burden can be obtained by fitting I(t) of the TSIR model by first estimating the reporting fraction, r. We start with the simulated case data and estimate I(t) through its relation with the number of reported cases C(t) by
The reporting fraction, r, can be estimated using the linear regression relationship between cumulative cases and cumulative births, assuming births are known through available demographic data.
Unimmunized births, B(t), are calculated as a known constant birth rate multiplied by 1 − V, where V is the product of the fraction vaccinated and effectiveness, as used in other works [8, 11, 9]. Note that while we assume perfect effectiveness, these results are robust to any effectiveness as long as V is assumed known.
2.2.1 Proposed correction to the double-counted cases
An individual may become infected between the time of loss of maternal immunity and vaccination and present as a case. The TSIR model would double count this individual as both an immunized birth and a case. We propose the following to correct these double-counted cases. When calculating the regression of cumulative cases against cumulative unimmunized births to estimate the reporting fraction, we subtract those cases that would be double counted. From each time step we subtract the cases that occur below the mean age of vaccination (9 months in our model), multiplied by the vaccination coverage (i.e., when coverage is low, fewer of these young cases would result in a double count). The adjusted time series of cases, C*(t), at age a is then
and the correction to Equation (4) becomes:
2.3 Fitting the amplitude and phase shift
We estimated the seasonality parameters (amplitude and phase shift) by fitting the the expanded transmission equation (2)–(3)
We begin by reconstructing S(t − 1) in Equation (1) by letting
where Z(t) represents the residuals after regressing cumulative births and cumulative cases; and
is inferred by maximizing the profile likelihood in a generalized linear model for Equation (2) without s easonality. Next we use a non-linear least squares to fit E quation (6) f or fi ve pa rameters: α1, α2, β, β0 an d ϕ. Th e first three parameters allow non-linear dynamics, and the latter two are our estimated seasonality parameters of interest: amplitude of seasonal forcing and phase shift.
Minimum bounds are set to (α1, α2, β, β0, ϕ) = (ϵ, ϵ, ϵ, 0, 0); the maximum bound for all parameters is ∞; and the initial starting point is (α1, α2, β, β0, ϕ) = (1, 1, ϵ, 0.5, 0).
In addition we estimate the reporting fractions under the scenario of increasing vaccination coverage (instead of a fixed coverage at equilibrium) with and without the proposed correction for double counted cases. We simulate 20 independent time series with increasing vaccination coverage from 60% to 90% over a 20-year period at an increasing coverage of 1.5% per year for birth rates ranging 10 to 80 births per thousand per year. This time series begins at 60% coverage, following a discarded initial transient period of 500 years (300 years without vaccination and 200 years with 60% vaccination coverage). For this we use only a single amplitude of 0.3 and phase shift of 0 days.
3 Results
For time series at equilibrium without vaccination, the estimated reporting fractions are unbiased for both birth rates and all combinations of true reporting fraction, amplitude, and phase shift, as expected (Figure 1A, G). At high vaccination coverage, however, there is a positive bias using the standard TSIR model fit for all combinations of reporting fractions, amplitude and phase shift (Figure 1 C–F, H–L). The bias is larger under a higher birth rate as a larger fraction of cases occur in young children below the age of vaccination (Figure 1). The biggest absolute error of the estimated reporting fractions using the standard TSIR model fit is observed at 9 0% vaccination coverage for a true reporting fraction of 0.20, estimated 0.278 and 0.246 for birth rates of 40 and 30 per 1000 respectively (Figure 1E, K, shown in red). This bias is also observed under an assortative (Kenya-like) contact mixing structure, under which violates the implicit assumption of the TSIR model (Figure S1). For example, at 60% vaccination coverage for a true reporting fraction of 0.10, the estimated reporting fractions are 0.154 and 0.117 under assortative and homogeneous contact mixing respectively (Figure S1).
Estimated reporting fractions using the standard TSIR model for 40 (A–F) and 30 births (G–L) per 1000 persons per year and varying vaccination coverage (VC). Colors indicate different reporting fractions; vertical lines indicate simulated truth and symbols indicate mean and range (horizontal lines) of estimates from 20 simulations for different amplitudes (fill) and phase shifts (shape).
The estimated annual burden of measles is plotted against the true annual burden (Figure 2), forming an elongated shape along the line of unity to indicate years of low and high measles burden as in biennial dynamics. As expected, the total annual burden decreases as vaccination coverage increases. As a consequence of the positive bias in the reporting fractions, the estimated annual burden is negatively biased (Figure 2). This bias is observed for both birth rates and all chosen reporting fractions, amplitudes, and phase shifts. A regression of the estimated annual burden of measles against the true annual burden of measles shows underestimation at higher vaccination coverage, with the slope of the linear regression at a minimum of 0.77 at 90% and 95% coverage (Figure 2 E, F). At lower vaccination coverage, the burden is slightly overestimated from the truth, as observed by a slope of regression slightly above 1. This is due to a slightly underestimated reporting fraction at low vaccination coverage.
The estimated annual burden of measles using the standard TSIR estimator of the reporting fraction for 40 (A–F) and 30 (G–L) births per 1000 persons per year and vaccination coverage ranging 0– 95%. Diagonal lines are the line of unity (y = x; solid) and the regression of the estimated and true measles burden (dashed). Colors differentiate reporting fractions (0.01, 0.05, 0.10, and 0.20). Each reporting fraction shows 80 estimates (20 annual estimates per combination of amplitude and phase shift).
Correcting for the double-counting of cases occurring before the age of vaccination removes the positive bias in the estimate of reporting fraction (Figure 3). The biggest absolute difference of the estimated reporting fraction to true reporting fraction is now −0.03 with the correction (at 95% vaccination coverage at the higher birth rate; Figure 3F), compared to a maximum difference of 0.078 without correction. These reporting fraction estimates align more closely to the truth for both birth rates.
Estimated reporting fractions using the corrected TSIR estimator of the reporting fraction for 40 (A–F) and 30 (G–L) births per 1000 persons per year for vaccination coverage (VC) ranging 0–95%. Colors indicate different reporting fractions; vertical lines indicate simulated truth and symbols indicate mean and range (horizontal lines) of estimates from 20 simulations for different amplitudes (fill) and phase shifts (shape).
Using these estimates of reporting fraction with the proposed correction for the double-counting of cases also removes the negative bias in the estimated annual burden of measles (Figure 4). The regression of the estimated and true annual burden using the corrected TSIR estimator now shows an overestimation for higher vaccination coverage (Figure 4). For a birth rate of 40 per thousand per year, at 90% coverage with the corrected TSIR estimator, the estimated annual burden is now higher than the true annual burden by 5% on average (Figure 4E). This compares with the estimated burden using the standard TSIR estimator (without the correction), where at 90%, the estimated annual burden is smaller than the true annual burden by 23% on average (Figure 2E).
The estimated annual measles burden using the corrected TSIR estimator of the reporting fraction for 40 (A–F) and 30 (G–L) births per thousand persons per year and vaccination coverage ranging 0–95%. Diagonal lines are the line of unity (y = x; solid) and the regression of the estimated and true measles burden (dashed). Colors differentiate reporting fractions (0.01, 0.05, 0.10, and 0.20). Each reporting fraction shows 80 estimates (20 annual estimates per combination of amplitude and phase shift).
The estimated seasonality parameters are plotted against the true seasonality parameters using the standard TSIR estimator (without the proposed correction) in Figure 5 for the higher (A–L) and lower (M–X) birth rates. Despite the bias in the estimated reporting fraction and incidence, these estimates are unbiased for all 4 reporting fractions (Figure 5). Using the corrected TSIR estimator also does not affect the estimate of the amplitude and phase shift (Figure 6). However, the estimated amplitude is poor at 0.01 reporting fraction (blue) due to many zeros in the reconstructed incidence. The frequency of zeros make up as much as 60% of the reconstructed time series at 0.01 reporting fraction, compared with less than 10% of the same at 0.20 reporting fraction (Figure S2). A true reporting fraction of 0.05 and higher gives reasonably accurate estimated amplitude and range.
Estimated amplitude and phase shift using the standard TSIR estimator of the reporting fraction for 40 (A–L) and 30 (M–X) births per 1000 persons per year and vaccination coverage ranging 0–95%. Each panel shows the mean (marker) and range (vertical lines) over 20 ten-year intervals. Colors differentiate the 4 chosen reporting fractions.
Estimated amplitude and phase shift using the corrected TSIR estimator of the reporting fraction for 40 (A–L) and 30 (M–X) births per 1000 persons per year and vaccination coverage ranging 0–95%. Each panel shows the mean (marker) and range (vertical lines) over 20 ten-year intervals. Colors differentiate the 4 chosen reporting fractions.
Realistically, few settings in the world have long-term equilibrium dynamics at a constant level of vaccination. The bias in the estimated reporting fraction that occurs at equilibrium is also present in transient time series dynamics and increases with birth rate (Figure 7). In the transient time series, vaccination coverage is increasing from 60% to 90% over 20 years (Figure 7). For a true reporting fraction of 0.20, the estimated reporting fraction without using the proposed bias correction is 0.211 at 10 births per 1000 and grows to an estimated 0.305 at 80 births per 1000 (Figure 7). Using the proposed bias correction overcomes this bias in the estimate of the reporting fraction for all birth rates and reporting fractions (Figure 7).
Estimated reporting fractions for simulations with increasing vaccination coverage from 60% to 90% over 20 years, at an increase of 1.5% coverage per year for reporting fractions (0.01, 0.05, 0.1 and 0.2; vertical lines) and different birth rates.
4 Discussion
We have illustrated that a standard simplifying assumption of the TSIR model, that the effect of vaccination can be represented as a reduction in the rate of susceptible recruitment, leads to a positive bias in the estimate of the reporting fraction and a negative bias in the estimate of the true incidence. When vaccination is not given immediately after birth, as is common for vaccine-preventable diseases such as measles, mumps, and rubella, then the standard assumption to the TSIR model to accommodate vaccination may lead to double-counting of young cases as immunized births. The magnitude of this bias will increase when birth rates are high (i.e., there are numerically many pre-vaccine age children to become infected) and when vaccination rates are high (i.e., a larger fraction of pre-vaccine age children who become infected would likely go on to be vaccinated). We proposed a method to correct for double counting and show that the correction led to a more accurate estimation of the reporting fraction and the incidence of infection. We also show that the estimation of seasonality parameters was not affected by the bias in the estimate of the reporting fraction. We note that the TSIR model was originally developed for application in a vaccine-free setting [2]. Thus, the bias we have identified is not a flaw of the model itself. Rather, it arises from the naive extension to settings with vaccination. Further, many applications of the TSIR model to fit observed time series have relied on time series that are not disaggregated by pre- and post-vaccine age individuals. The bias correction that we have proposed here requires the availability of age-specific case data, at least grouped as pre- and post-vaccine age. Where such data are not available, this analysis serves as a caution on interpreting the estimates of reporting fraction and incidence of infection in settings where the effect of the bias may be high.
The simplicity of the standard TSIR model and its ease of application for under-reported time series have made it an enduring tool for analysis of time series data even into the vaccine era [8, 11, 21, 10]. The bias identified here should be a cautionary note that simplifying assumptions, while often practically necessary, should be revisited as the quality and resolution of data increase.
Data Availability
All data produced in the present study are available upon reasonable request to the authors.
Supplementary Information
Model description and equations
We used an age-structured discrete-time stochastic transmission model that incorporates both epidemiological and demographic transitions, building on a framework introduced by Klepac and Caswell [22] and Klepac et al. [23]. We describe the model structure following the notation of Metcalf et al. [24] and Winter et al. [25]. The model population is structured into an unvaccinated group and a vaccinated group. Individuals within each vaccination status group are further divided in epidemiological classes (temporarily immune with maternal antibodies, M, Mv; susceptible, S, Sv; infectious, I, Iv; recovered R, Rv) where the unvaccinated group is without subscript, and the vaccinated group is denoted by subscript v. The population is stratified into 40 discrete age classes (24 monthly classes from ages 0 to 2 years; 1 class from age 2 to 5 years; 14 5-year classes from age 5 to 75 years; and 1 class of age 75 years and older). At every time step a large transition matrix defines transitions from every possible epidemiological stage and age class combination to every other possible epidemiological stage and age class combination.
The transition matrix is described in two steps. First ignoring demography (aging and survival), we define matrix Aa,t, which describes the epidemiological transitions within each age class a and discrete time-step t (set to 1 day), as
The eight rows and columns represent the stages M, S, I, R, Mv, Sv, Iv, Rv, respectively. In the transition matrix, da is the probability an individual in age class a loses maternal immunity; ϕa is the probability an individual in age class a becomes infected; γ is the probability of recovery; and νa is the probability an individual in age class a is vaccinated. A susceptible individual is successfully vaccinated with probability VE, reflecting the vaccine efficacy. Vaccination has no effect on individuals with maternal immunity or recovered individuals. The infection probability ϕ (also called the force of infection) is a function of n(t), a vector describing the population at time t
according to
where z is the total number of age classes (z = 40 here); βa,j,t is the transmission rate between individuals in age class a and j at time-step t; and Ij,t + Iv,j,t is the number of infected individuals in age class j at time-step t.
Transmission to individuals in age class a from individuals in age class j at time t is defined by βa,j,t = βca,j (1 + β0 cos (2πt/365)), where β is the transmission coefficient calculated using the next-generation matrix; ci,j is the mean contact rate between individuals in age class j to i; and β0 is the relative amplitude of seasonal fluctuations. The transmission coefficient was calculated with the next-generation matrix using a basic reproduction number R0 of 15 [14, 15] and an average infectious period of 14 days [2, 4].
Second, we define the full transition matrix, A(n(t)), that includes both epidemiological transitions from matrix Aa,t and demographic transitions (aging and survival). This matrix is used to project the entire population forward via aging, mortality, and infection dynamics according to:
where sa is the probability that an individual in age class a survives; ua is the rate of aging out of age class a; and A1,t, A2,t, etc., are as defined in matrix Aa,t. The dynamics of the total population are then projected forward in time, such that
where Bt is a vector representing the number of births at time t, defined as,
Supplementary figures
Estimated reporting fractions under homogeneous and assortative (Kenya-like) contact mixing consistent for different vaccination coverage (VC) and a high birth rate (40 births per thousand). Colors indicate different reporting fractions; vertical lines indicate simulated truth, and symbols indicate mean and range (horizontal lines) of estimates from 20 simulations.
Frequency of zeroes in the reconstructed time series of incidence for different estimated reporting fraction and estimated amplitude for 20 ten-year time series. True reporting fractions are 0.01 (blue), 0.05 (yellow), 0.10 (green), and 0.20 (red). Here the true amplitude is 0.3 with 0 phase shift.