Evaluating the use of social contact data to produce age-specific forecasts of SARS-CoV-2 incidence
===================================================================================================

* James D Munday
* Sam Abbott
* Sophie Meakin
* Sebastian Funk

## Abstract

Short-term forecasts can provide predictions of how an epidemic will change in the near future and form a central part of outbreak mitigation and control. Renewal-equation based models are increasingly popular. They infer key epidemiological parameters from historical epidemiological data and forecast future epidemic dynamics without requiring complex mechanistic assumptions. However, these models typically ignore interaction between age-groups, partly due to challenges in parameterising a time-varying interaction matrix. Social contact data, collected regularly by the CoMix survey during the COVID-19 epidemic in England, provide a means to inform interaction between age-groups in real time.

We developed an age-specific forecasting framework and applied it to two age-stratified time-series: incidence of SARS-CoV-2 infection, estimated from a national infection and antibody prevalence survey; and reported cases according to the UK national COVID-19 dashboard. Jointly fitting our model to social contact data from the CoMix study, we inferred a time-varying next generation matrix, which we used to project infections and cases in the four weeks following each of 29 forecast dates between October 2020 and November 2021. We evaluated the forecasts using proper scoring rules and compared performance with three other models with alternative data and specifications, alongside two naive baseline models.

Overall, incorporating age-interaction improved forecasts of infections, and the CoMix-data-informed model was the best performing model at time horizons between two and four weeks. However, this was not true when forecasting cases. We found that age-group interaction was most important for predicting cases in children and older adults. The contact-data-informed models performed best during the winter months of 2020-2021, but performed comparatively poorly in other periods. We highlight challenges regarding the incorporation of contact data in forecasting and offer proposals as to how to extend and adapt our approach, which may lead to more successful forecasts in future.

## Introduction

Effective epidemic response relies on accurate infection surveillance to provide status updates which support decision makers [1]. Surveillance data can be enhanced by estimating key epidemiological parameters in real time, such as the growth rate and the time-varying reproduction number ($R_t$), and by generating short-term forecasts of the incidence of infection, hospitalisation and mortality [2–4]. These provide estimates of the current and future epidemic trajectory to public health decision makers. As such, a host of approaches have been developed to make short-term epidemiological forecasts. A popular class of methods for infectious disease forecasting is renewal-equation based ‘semi-mechanistic’ models [2,4–6], which infer key epidemiological parameters, in particular changes in transmission intensity, from historical time-series data and use them to forecast future epidemic dynamics without requiring the more detailed assumptions and complex mathematical framework involved in ‘fully-mechanistic’ models (e.g. compartmental or agent-based models).

Age has been shown to be an important factor in both transmission risk [7,8] and severity of disease [9–11] caused by SARS-CoV-2. This is not unique to COVID-19. In the past, epidemiological analysis and modelling have shown variability and homophily in transmission by age to have important implications for the dynamics of infection [12–15]. Moreover, the age distribution of infection has important implications for the potential burden of disease as infection moves between age groups, which are more or less prone to severe illness and death [7,8,16].

Age-specific forecasts are desirable to better understand the risk to particularly vulnerable groups. However, because the prevalence of infection varies between age-groups, accurate estimates of future incidence may require the risk of transmission between age groups to be captured effectively. The high dimensionality of the problem means that an age-interaction matrix cannot be estimated from epidemiological data alone [17]. Instead, much infectious disease dynamics research in the past 30 years has made assumptions in line with the social contact hypothesis [17], which states that the rate of transmission of directly transmitted infectious agents is proportional to the population-level rate of social contact between population groups. This hypothesis is the basis for age-structured mixing assumptions in many mathematical models. Such models are generally parameterised from data gathered in social contact surveys [15,18], which typically ask participants to report their social contacts over a fixed period in the recent past, e.g. the last 24 hours. Participants are also asked about the characteristics of each contact, usually including age [15]. A key challenge to the use of historically collected contact data has been their limited temporal and geographical generalisability. This is especially true when non-pharmaceutical interventions (NPIs) are in effect, which can drastically change the contact behaviour of the general public. The variability in behaviour over time and by age during a pandemic makes parameterisation of age-specific real-time models particularly challenging, as up-to-date information on interaction is essential for time-varying parameterisation of the model.
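To make the link between survey responses and model inputs concrete, the following is a minimal Python sketch of how a contact matrix can be computed from survey records: the mean number of reported contacts per participant in age group $a$ (rows) with contacts in age group $b$ (columns). The input layout (`participant_groups`, `contacts`) is hypothetical, and published analyses of surveys such as POLYMOD or CoMix typically also apply survey weights and reciprocity corrections, which this sketch omits.

```python
import numpy as np


def contact_matrix(participant_groups, contacts, n_groups):
    """Mean contacts per participant in group a (rows) with contacts in group b (columns).

    participant_groups: age-group index of each participant (hypothetical layout).
    contacts: (participant_index, contact_age_group) pairs, one per reported contact.
    """
    groups = np.asarray(participant_groups, dtype=int)
    counts = np.zeros((n_groups, n_groups))
    for participant, contact_group in contacts:
        counts[groups[participant], contact_group] += 1
    # Number of participants in each age group gives the row denominators.
    n_participants = np.bincount(groups, minlength=n_groups).astype(float)
    matrix = np.zeros_like(counts)
    observed = n_participants > 0  # leave rows of empty groups at zero
    matrix[observed] = counts[observed] / n_participants[observed][:, None]
    return matrix


# Three participants (two in group 0, one in group 1) reporting six contacts in total.
C = contact_matrix([0, 0, 1], [(0, 0), (0, 1), (1, 0), (2, 0), (2, 1), (2, 1)], n_groups=2)
```

Under the social contact hypothesis, a matrix of this kind is then scaled by transmission-related factors to obtain age-specific transmission rates.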

During the COVID-19 pandemic, a number of studies surveyed social contacts at a frequency and scale not seen previously, as a means to monitor the behaviour of the general public relevant to transmission and to provide insight into the risk posed to vulnerable populations. One example is the CoMix study, which collected contact data weekly from March 2020 to March 2022 in 19 European countries [19–22]. The UK arm of the study, which surveyed more than 5000 participants, was the first to launch and the most complete in terms of data collected over this period. These regularly collected contact data provide a means to parameterise models with temporally and geographically relevant estimates of social interaction, and an opportunity to evaluate how incorporating such data into a real-time analysis framework affects forecast performance at different scales.

Existing studies of forecasting performance [5,6,23] have focused on age-agnostic numbers of cases, hospitalisations and deaths. Probabilistic forecasts can be robustly assessed using proper scoring rules [24]. Although these methods have been popular for some time in other fields, they have only recently been applied to epidemic forecasts [5,6,23]. To the authors' knowledge, one evaluation of age-stratified epidemic forecasts has previously been made [25]; however, the study by Held et al. used historical contact data to parameterise interaction between age-groups and evaluated forecasts at the population level by summing age-specific forecasts. To our knowledge, there has been no evaluation of the use of regularly collected age-stratified contact data, in comparison with other approaches, to make short-term forecasts at an age-group-specific level.

Here we present age-specific forecasts in the UK, with the aim of understanding whether incorporating the weekly collected social contact data improves the predictive ability compared to ignoring this interaction. We incorporated data from the CoMix study in a semi-mechanistic forecasting framework and applied this to case numbers, as the most commonly tracked metric for COVID-19 dynamics in the UK throughout the pandemic. We further applied it to infection incidence estimated from a weekly cross-sectional household survey of infection [26,27] in order to better understand the influence of reporting patterns on results. To quantify the relative benefits of incorporating interaction between age groups and specific contact data into forecasts we compared three models with interaction between age groups with an equivalent model with no such interaction and evaluated the models against two naive baseline models.

## Materials and Methods

### Study overview

To establish the relative benefit of incorporating interaction between age-groups in short-term epidemiological forecasts, we implemented four age-stratified semi-mechanistic models, each of which estimates a time-varying Next Generation Matrix (NGM). The NGM is inferred as the interaction matrix between age-groups under the assumption that incident infections in each age group are determined by the sum of past infections in all age-groups, weighted by the NGM and by the distribution of time between successive infections (the generation interval distribution). Three of the models included interaction between age groups. The first was informed by contact data from the CoMix study (collected regularly during the study period), and the second applied the same model to contact data from the POLYMOD survey (a single survey performed in 2008). In the third, interaction was estimated entirely from historical epidemiological data. We compared these models with a fourth model which allowed no interaction between age-groups.

We applied this to reported cases, as a commonly available quantity for forecasting epidemic dynamics[5]. This, however, incurs a secondary challenge due to potential variability in reporting of cases by age and over the course of an epidemic, which may serve to complicate our interpretation of the application of contact data to forecasts. Hence, to isolate the impact of incorporating contact data we chose to additionally apply the models to estimated infection incidence from a repeated cross-sectional household survey of infections.

We forecast weekly reported cases using data from the UKHSA COVID-19 dashboard. For convenience, we used the full case time-series, aggregated to weekly incidence and truncated at each forecast date, rather than the data available on each forecast date. Although this does not give a full picture of the real-time applicability and performance of the model, it avoids complications arising from delays in case reporting, which require additional treatment before a forecasting model can be applied, such as truncation of the most recent data or nowcasting [28]. Secondly, we applied the models to weekly infection incidence estimated [26] from national infection prevalence data, again using the full final data set truncated at each forecast date rather than the snapshots available at the time. To further isolate the role of contact data in the forecasts of infections, we used weekly age-stratified estimates of antibody prevalence to inform age-specific susceptibility.

### Data

We accessed daily age-stratified case data from the UK COVID-19 dashboard [29] on 11th May 2022. We aggregated these data to weekly incidence by summing over the previous 7 days, aligned such that each weekly count is reported on a proposed forecast date, in order to forecast weekly case counts in future weeks. The case reports were grouped into seven ten-year age groups between 0 and 69 years and a single group for those aged 70 and over (0-9, 10-19, …, 60-69, 70+).
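The weekly aggregation aligned to forecast dates can be sketched as follows. This is a minimal Python illustration for a single age group, assuming that each seven-day window ends on and includes the forecast date (days $t-6, \dots, t$); the exact alignment used in the analysis may differ, and the function name `weekly_counts` is hypothetical.

```python
from datetime import date, timedelta


def weekly_counts(daily, week_end_dates):
    """Sum daily counts over the seven days ending on each week-end (forecast) date.

    daily: dict mapping datetime.date -> count; missing days are treated as zero.
    Assumes the window includes the week-end date itself (days t-6, ..., t).
    """
    return {
        end: sum(daily.get(end - timedelta(days=k), 0) for k in range(7))
        for end in week_end_dates
    }


start = date(2021, 1, 1)
daily = {start + timedelta(days=k): k + 1 for k in range(14)}  # counts 1, 2, ..., 14
weekly = weekly_counts(daily, [date(2021, 1, 7), date(2021, 1, 14)])
```

Applying the same function to each age group's daily series gives the age-stratified weekly incidence.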

We accessed aggregates of SARS-CoV-2 infection prevalence and antibody prevalence collected as part of the COVID-19 Infection Survey (CIS) through the CIS website [27] on 18th March 2022. We used data covering the period between August 2020 and January 2022 to estimate weekly infection incidence and antibody prevalence for seven age-groups (2-10, 11-15, 16-24, 25-34, 35-49, 50-69 and 70+). In addition to the CIS data, we used vaccination data published by the National Health Service and accessed via the UK coronavirus dashboard [30] on the same date.

We generated SARS-CoV-2 infection incidence and antibody prevalence time-series for the period between August 2020 and January 2022 using an approach described elsewhere [26,31] (Figure 1). To obtain a weekly time-series of infections, we summed incident infections in each week on a sample-by-sample basis and calculated credible intervals from the resulting sums. To obtain a weekly time-series of antibody prevalence, we took the antibody prevalence on the last day of each week and calculated credible intervals from the full posterior sample.
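Summing draw by draw matters because the quantiles of a sum are not the sum of the quantiles. The minimal Python sketch below illustrates both weekly summaries for a single age group, assuming the posterior is available as an array of daily draws whose length is a multiple of seven. The 90% interval (5th and 95th percentiles) is an illustrative choice, not necessarily the level reported here.

```python
import numpy as np


def weekly_infection_summary(daily_draws, probs=(0.05, 0.5, 0.95)):
    """Weekly sums computed draw by draw, then summarised across draws.

    daily_draws: array (n_draws, n_days) of posterior daily incidence, n_days a multiple of 7.
    Returns an array (len(probs), n_weeks) of quantiles.
    """
    draws = np.asarray(daily_draws, dtype=float)
    n_draws, n_days = draws.shape
    if n_days % 7 != 0:
        raise ValueError("n_days must be a multiple of 7")
    weekly = draws.reshape(n_draws, n_days // 7, 7).sum(axis=2)  # sum within each draw
    return np.quantile(weekly, probs, axis=0)


def weekly_antibody_summary(daily_draws, probs=(0.05, 0.5, 0.95)):
    """Antibody prevalence on the last day of each week, summarised across draws."""
    draws = np.asarray(daily_draws, dtype=float)
    return np.quantile(draws[:, 6::7], probs, axis=0)


rng = np.random.default_rng(1)
summary = weekly_infection_summary(rng.gamma(5.0, 20.0, size=(1000, 28)))
```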

![Fig. 1.](http://medrxiv.org/http://medrxiv.stage.highwire.org/content/medrxiv/early/2022/12/03/2022.12.02.22282935/F1.medium.gif)


Fig. 1. 
A) Estimated infection incidence, B) estimated antibody prevalence and C) reported cases, by age-group.

We combined these data with social contact data collected as part of the CoMix social contact survey (CoMix)[19,32], a multinational, weekly, cross-sectional survey of social contacts. We used published weekly contact matrices from the UK arm of CoMix, generated under the framework described previously[20].

### Transmission Model

We extended the concept of the Next Generation Matrix to include transmission interval distributions (the generation interval for infections, and the interval between a positive test in infector and infectee for cases). Here, the number of incident cases or infections $I(t)$ at time $t$ was given by the sum of the products of the next generation matrix $N$ and the age-stratified vector of cases or infections on dates between $t - s_{max}$ and $t - 1$, weighted by the transmission interval distribution $w(s)$:

$$I_a(t) = \sum_{s=1}^{s_{max}} w(s) \sum_{b} N_{ab}(t)\, I_b(t - s) \qquad (1)$$

where $s_{max}$ is a fixed upper limit of the transmission interval distribution, set to 4 weeks, and $w(s)$ is assumed to follow a discretised log-normal distribution with time since the primary event (infection or positive test of the infector):

$$w(s) = F_{LNorm}(s;\, w_\mu, w_\sigma) - F_{LNorm}(s - 1;\, w_\mu, w_\sigma) \qquad (2)$$

where $F_{LNorm}$ is the cumulative distribution function of the log-normal distribution with parameters $w_\mu$ and $w_\sigma$. Under the social contact hypothesis [17], the next generation matrix is calculated by multiplying the contact matrix $C(t)$, quantifying the mean number of contacts between age groups, with vectors of age-specific susceptibility ($s$) and infectiousness ($i$), where the elements $s_a$ and $i_a$ give the susceptibility and infectiousness of age group $a$ [13]:

$$N_{ab}(t) = s_a(t)\, C_{ab}(t)\, i_b \qquad (3)$$
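Equations 1 to 3 can be combined into a single renewal step. The Python sketch below is a minimal, deterministic illustration that evaluates the expected incidence only; it is not the Stan model used in the analysis. Rows of $N$ index the infectee group and columns the infector group. Renormalising the truncated interval weights to sum to one is an assumption of this sketch, and the parameter values, expressed in weeks, are purely illustrative.

```python
import math

import numpy as np


def lognormal_cdf(x, mu, sigma):
    """CDF of a log-normal distribution with log-mean mu and log-SD sigma."""
    if x <= 0:
        return 0.0
    return 0.5 * (1.0 + math.erf((math.log(x) - mu) / (sigma * math.sqrt(2.0))))


def discretised_interval(mu, sigma, s_max):
    """w(s) = F(s) - F(s - 1) for s = 1, ..., s_max, renormalised (an assumption) to sum to one."""
    w = np.array([lognormal_cdf(s, mu, sigma) - lognormal_cdf(s - 1, mu, sigma)
                  for s in range(1, s_max + 1)])
    return w / w.sum()


def next_generation_matrix(contacts, susceptibility, infectiousness):
    """N_ab = s_a * C_ab * i_b (row a: infectee group, column b: infector group)."""
    return susceptibility[:, None] * contacts * infectiousness[None, :]


def renewal_step(history, ngm, w):
    """Expected incidence I(t) = sum_s w(s) N I(t - s).

    history: array (T, n_groups) of past incidence, oldest row first, with T >= len(w).
    """
    recent = history[-len(w):][::-1]  # row k holds I(t - 1 - k), paired with w(k + 1)
    weighted = (w[:, None] * recent).sum(axis=0)
    return ngm @ weighted


# Illustrative values only, with time measured in weeks.
w = discretised_interval(mu=0.0, sigma=0.8, s_max=4)
C = np.array([[2.0, 0.5], [0.5, 1.5]])
N = next_generation_matrix(C, np.array([0.4, 0.3]), np.array([1.0, 1.0]))
history = np.array([[100.0, 50.0]] * 4)
expected = renewal_step(history, N, w)
```

With constant past incidence, the interval weights sum to one, so the step reduces to $N$ applied to the incidence vector.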

We assumed that age-specific infectiousness, $i$, is inherent and unrelated to time-varying factors associated with the epidemic. We assumed that age-specific susceptibility comprised two components:

$$s_a(t) = s^{ab}_a(t)\, s^{inh}_a \qquad (4)$$

The first component ($s^{ab}$) is derived from age-specific immunity to infection, informed by age-specific antibody prevalence $A_a(t)$. We used a leaky definition of antibody effectiveness, in line with the definition used in the estimation of the infection and antibody time series:

$$s^{ab}_a(t) = 1 - \Phi\, A_a(t) \qquad (5)$$

where $\Phi$ is the effectiveness of antibodies in preventing infection in an exposed member of the population. The second component ($s^{inh}$) reflects age-correlated variation in inherent susceptibility to infection, unrelated to time-varying factors associated with the epidemic. Both $s^{inh}$ and $i$ were assumed to remain constant in time, such that all of the variation in the next generation matrix over time is governed by changes in contacts and in estimated antibody-derived immunity. Both $s^{inh}$ and $i$ were fitted as random effects in a hierarchical framework (Table 1).
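The two susceptibility components can be combined as in the minimal Python sketch below. It assumes that the antibody-derived and inherent components multiply, and that leaky protection reduces susceptibility by a factor $\Phi$ among antibody-positive individuals. Passing a time-by-age array of antibody prevalence gives a susceptibility trajectory, while $s^{inh}$ stays fixed.

```python
import numpy as np


def age_susceptibility(antibody_prevalence, phi, s_inh):
    """s_a(t) = (1 - phi * A_a(t)) * s_inh_a.

    antibody_prevalence: A_a(t), shape (n_groups,) or (T, n_groups), values in [0, 1].
    phi: antibody effectiveness against infection (leaky protection).
    s_inh: inherent relative susceptibility per age group, constant over time.
    The product of the two components is an assumption of this sketch.
    """
    A = np.asarray(antibody_prevalence, dtype=float)
    return (1.0 - phi * A) * np.asarray(s_inh, dtype=float)


s_now = age_susceptibility([0.5, 0.9], phi=0.8, s_inh=[1.0, 0.5])
s_over_time = age_susceptibility([[0.2, 0.5], [0.3, 0.6]], phi=0.8, s_inh=[1.0, 0.5])
```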


Table 1. Model parameters and priors

### Parameter estimation and forecasting

To allow parameter values to vary over the course of the study period, we parameterised the model with the estimated antibody prevalence and contact matrices and fitted it to the 8 weeks of weekly estimated infection or reported case data prior to each forecast date. We fitted the model using Hamiltonian Monte Carlo, implemented in the Stan probabilistic programming language [33], assessing the convergence of each model by monitoring divergent transitions and chain mixing.
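The rolling set of forecast dates and fitting windows can be sketched in Python as follows. The fortnightly spacing and the first and last forecast dates are taken from the Results section. Treating the week ending on the forecast date as the last of the eight fitted weeks is an assumption of this sketch.

```python
from datetime import date, timedelta


def forecast_dates(first, last, step_days=14):
    """Forecast dates at a fixed spacing (fortnightly here) from first to last inclusive."""
    dates = []
    current = first
    while current <= last:
        dates.append(current)
        current += timedelta(days=step_days)
    return dates


def fit_and_target_weeks(forecast_date, fit_weeks=8, horizon_weeks=4):
    """Week-end dates used for fitting and the week-end dates being forecast.

    Assumes the week ending on the forecast date is the last fitted week.
    """
    fit = [forecast_date - timedelta(weeks=k) for k in range(fit_weeks - 1, -1, -1)]
    targets = [forecast_date + timedelta(weeks=h) for h in range(1, horizon_weeks + 1)]
    return fit, targets


dates = forecast_dates(date(2020, 10, 30), date(2021, 11, 26))
fit, targets = fit_and_target_weeks(dates[0])
```

Because each forecast date is fitted independently, every window yields its own posterior, which is why parameters can differ between forecasts.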

We fitted to the mean infection time series $I_\mu(t)$ under the likelihood: ![Formula][6]

where ![Graphic][7] is the overall uncertainty in the modelled infections, combining the time-varying inherent uncertainty in the NGM model, $\sigma_i$, and the standard deviation of the infection estimates, $I_\sigma$: ![Formula][8]

$\sigma_i$ is constructed at each time point from the estimated coefficient of variation $CV_I$ and the infection incidence, such that

$$\sigma_i(t) = CV_I\, I(t)$$

which ensures the uncertainty scales with the magnitude of the infection incidence estimates. We fitted to the case time series $c(t)$ under the likelihood: ![Formula][10] where $\sigma_c$ is the modelled uncertainty in cases, constructed from the estimated coefficient of variation $CV_c$ at each time point such that

$$\sigma_c(t) = CV_c\, c(t)$$

which ensures the uncertainty scales with the magnitude of the reported case incidence. To incorporate the contact data in the CoMix-data model, we jointly fitted the contact matrices under the likelihood: ![Formula][12] ![Formula][13] where $C_{\alpha,ab}$ is the estimated standard deviation of the measured contact rate and $\sigma_{cm}$ is the uncertainty in the fitted contacts.
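The following Python sketch illustrates the coefficient-of-variation scaling described above: the model uncertainty is a coefficient of variation multiplied by incidence, so it grows with the size of the epidemic. Two elements are assumptions made for illustration rather than the exact specification used here: combining $\sigma_i$ with the estimate's standard deviation $I_\sigma$ in quadrature, and using a Normal likelihood.

```python
import math


def combined_sd(cv, incidence, estimate_sd=0.0):
    """sigma_i(t) = CV * I(t), combined in quadrature (assumption) with the SD of the estimate."""
    sigma_model = cv * incidence
    return math.sqrt(sigma_model ** 2 + estimate_sd ** 2)


def normal_log_likelihood(observed, modelled, sds):
    """Sum of Normal log-densities over time points (the Normal family is an assumption)."""
    total = 0.0
    for y, mean, sd in zip(observed, modelled, sds):
        total += -0.5 * math.log(2.0 * math.pi * sd ** 2) - (y - mean) ** 2 / (2.0 * sd ** 2)
    return total


# A CV of 0.1 on 300 modelled infections combined with an estimate SD of 40.
sd_example = combined_sd(0.1, 300.0, 40.0)
```

Scaling the standard deviation with incidence keeps small absolute errors in low-incidence weeks from being judged as strictly as large errors in peak weeks would be under a fixed variance.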

Each of the models estimated a NGM which varied over the 8 weeks of prior data only by changes in the estimated contact matrices and antibody inferred immunity, whilst the inherent susceptibility and infectiousness vectors were assumed constant for the whole modelled period. However, as each forecast date was modelled independently, all parameters were able to vary between forecasts.

We used uninformative priors (Table 1) for the contact rate between each pair of age groups ($C_{ab}$), the model uncertainty parameters ($CV_I$, ![Graphic][14]) and antibody protection ($\Phi$). Antibody prevalence priors were set to the distribution of the estimates provided by the model used to estimate incidence (*inc2prev*) [31]. Priors for the elements of the relative susceptibility and infectiousness vectors were set such that the Secondary Attack Rate (SAR) was roughly half of the household SAR estimates in the literature [34], which aimed to account for the reduced risk of transmission to known contacts outside the household. The priors for the log-mean ($w_\mu$) and log-standard-deviation ($w_\sigma$) of the transmission interval were chosen to give an interval with a mean and standard deviation of 5 days, reflecting the broad distribution of transmission intervals recorded in the literature [35–37]. These values were converted to the corresponding log-normal parameters in equation 2, and each prior was normally distributed with a standard deviation of 20% of its mean.
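Converting a natural-scale mean and standard deviation into log-normal parameters uses the standard moment relations $w_\sigma^2 = \ln(1 + \mathrm{sd}^2/\mathrm{mean}^2)$ and $w_\mu = \ln(\mathrm{mean}) - w_\sigma^2/2$. The Python sketch below applies them to the 5-day mean and standard deviation. The last two lines show one possible reading of "a standard deviation of 20% of the mean" and are an assumption, not the exact prior specification.

```python
import math


def lognormal_params(mean, sd):
    """Convert a mean and SD on the natural scale to log-normal (mu, sigma)."""
    sigma2 = math.log(1.0 + (sd / mean) ** 2)
    mu = math.log(mean) - sigma2 / 2.0
    return mu, math.sqrt(sigma2)


w_mu, w_sigma = lognormal_params(mean=5.0, sd=5.0)
# Assumed reading: Normal priors centred on each value with SD equal to 20% of that value.
prior_w_mu = (w_mu, 0.2 * w_mu)
prior_w_sigma = (w_sigma, 0.2 * w_sigma)
```

For a mean and standard deviation of 5 days this gives $w_\mu \approx 1.26$ and $w_\sigma \approx 0.83$.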

We used posterior distributions of the parameters (Table 1) to project infections and cases forward up to four weeks after each forecast date. Contact data directly relevant to the dates forecast would not be known on the forecast date, so we used the contact data corresponding to the week of the forecast date itself, assuming that these also reflected contacts in the following week. For the case forecasts, we offset the contact data by 7 days to account for the delay between infection and specimen date, and used the generation interval as a proxy for the test-to-test distribution [38], which is consistent with a 5 day incubation period and a 2 day reporting delay [39].
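Projection iterates the renewal equation forward, feeding each projected week back in as history for the next. The minimal Python sketch below holds the NGM fixed at its forecast-date value, mirroring the use of contact data from the forecast week. It returns expected incidence for a single parameter set. Applying it to each posterior draw of $N$ and $w$, and adding observation noise, would be needed to obtain a predictive distribution, and that step is omitted here.

```python
import numpy as np


def project(history, ngm, w, horizon):
    """Iterate I(t) = sum_s w(s) N I(t - s) forward with N held fixed.

    history: array (T, n_groups) of incidence up to the forecast date, oldest first, T >= len(w).
    w: discretised transmission interval weights for lags 1..len(w).
    Returns an array (horizon, n_groups) of projected expected incidence.
    """
    series = [np.asarray(row, dtype=float) for row in history]
    projections = []
    for _ in range(horizon):
        recent = np.array(series[-len(w):][::-1])  # row k holds I(t - 1 - k)
        next_incidence = ngm @ (w[:, None] * recent).sum(axis=0)
        series.append(next_incidence)  # projections feed back into later steps
        projections.append(next_incidence)
    return np.array(projections)


# A one-week interval and an NGM that doubles incidence in every group each week.
doubling = project(np.array([[10.0, 20.0]]), 2.0 * np.eye(2), np.array([1.0]), horizon=2)
```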

### Model evaluation

We evaluated the performance of the NGM models (CoMix-data, POLYMOD-data, No-contact-data and No-interaction) across 29 forecast dates between October 2020 and December 2021. We chose this period because there was major disruption to the CoMix survey during July 2020 and following changes to the survey in June 2020. We excluded dates after December 2021 because of the emergence of the Omicron variant, which has been shown to evade immunity to a greater extent than earlier variants [40]. This complicates our interpretation of antibody prevalence, as a mix of Omicron-specific and previously acquired antibodies persists in the population.

We evaluated the forecasts against the reported number of cases or the mean estimated number of infections in the week forecast, for case and infection forecasts respectively. We scored the forecasts using the Continuous Ranked Probability Score (CRPS) and a measure of bias (see appendix for definitions), each implemented using the *scoringutils* R package [41]. The CRPS measures the ‘distance’ between the predictive distribution and the observed value, so a lower score indicates more accurate predictions and therefore a better performing model. The bias measures the tendency of a model to over-predict (positive values) or under-predict (negative values) incidence in its projections, so a bias of zero is optimal.
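Both metrics can be estimated from predictive samples. The minimal NumPy sketch below follows the standard definitions: the sample-based CRPS, $\mathbb{E}|X - y| - \tfrac{1}{2}\mathbb{E}|X - X'|$, and a bias of $1 - 2P(X \le y)$ for continuous outcomes. It is not the *scoringutils* implementation, which, for example, uses a slightly different bias formula for integer-valued forecasts such as case counts. The pairwise term uses memory quadratic in the number of samples, which is fine for a few thousand draws.

```python
import numpy as np


def crps_from_samples(samples, observed):
    """Sample-based CRPS: E|X - y| - 0.5 * E|X - X'| (lower is better)."""
    x = np.asarray(samples, dtype=float)
    accuracy = np.mean(np.abs(x - observed))
    spread = 0.5 * np.mean(np.abs(x[:, None] - x[None, :]))
    return accuracy - spread


def predictive_bias(samples, observed):
    """Continuous-outcome bias 1 - 2 * P(X <= y): +1 if all samples exceed y, -1 if all fall below."""
    x = np.asarray(samples, dtype=float)
    return 1.0 - 2.0 * np.mean(x <= observed)
```

When all samples are identical, the spread term vanishes and the CRPS reduces to the absolute error of the point forecast.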

We also assessed the calibration of each model by evaluating the central interval coverage (coverage) of its forecasts, that is, the proportion of observed incidence values that fell within the central ranges of the model's posterior predictive distribution of future incidence.
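Central interval coverage can be computed from predictive samples as in the minimal Python sketch below. Interval bounds are taken as empirical quantiles of each forecast's samples, using NumPy's default linear interpolation. A well-calibrated model should give coverage close to the nominal level, for example about 0.9 for 90% intervals.

```python
import numpy as np


def central_interval_coverage(sample_sets, observations, level):
    """Proportion of observations inside the central `level` interval of their forecast samples."""
    lower_q = (1.0 - level) / 2.0
    upper_q = 1.0 - lower_q
    hits = []
    for samples, y in zip(sample_sets, observations):
        lo, hi = np.quantile(samples, [lower_q, upper_q])
        hits.append(lo <= y <= hi)
    return float(np.mean(hits))


# Four forecasts sharing the same samples (0, 1, ..., 100); the 90% interval is roughly [5, 95].
coverage_90 = central_interval_coverage([np.arange(101.0)] * 4, [50.0, 99.0, 3.0, 94.0], level=0.9)
```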

To provide a lower bound of performance for comparison, we also evaluated two baseline models, intended to represent naive assumptions that could be applied without a model. The first baseline assumed no change in incidence from the day the forecast was made. The second projected the change in incidence to each week within the four week forecast horizon by exponential extrapolation of the rate of change over the previous two weeks of data. Both baselines were modelled without uncertainty; consequently, their CRPS reduced to the mean absolute error. To provide a clear comparison of performance with and without interaction between age-groups, we report all CRPS scores relative to the score of the no-interaction model (rCRPS).
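The two baselines and the relative score can be sketched in Python as follows. The exponential baseline here assumes the weekly growth ratio is the ratio of the two most recent weekly counts, and the relative score is taken as a ratio of mean scores. Both are plausible readings rather than confirmed details of the analysis. Because the baselines are point forecasts, their CRPS is simply the absolute error.

```python
import numpy as np


def fixed_baseline(last_value, horizon):
    """No change: repeat the most recent weekly value over the horizon."""
    return np.full(horizon, float(last_value))


def exponential_baseline(previous, last, horizon):
    """Extrapolate the week-on-week ratio last / previous (assumed growth estimate)."""
    ratio = last / previous
    return last * ratio ** np.arange(1, horizon + 1)


def point_forecast_crps(forecast, observed):
    """For a forecast without uncertainty, the CRPS equals the absolute error."""
    return np.abs(np.asarray(forecast, dtype=float) - np.asarray(observed, dtype=float))


def relative_score(model_scores, reference_scores):
    """Mean score of a model divided by the mean score of the reference (no-interaction) model."""
    return np.mean(model_scores) / np.mean(reference_scores)


growth = exponential_baseline(previous=50.0, last=100.0, horizon=3)
errors = point_forecast_crps(fixed_baseline(100, 2), [90, 120])
```

An rCRPS below one indicates that a model outperformed the no-interaction model on average.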

As well as assessing the overall performance of each forecasting model, we evaluated the forecasts made by each model grouped in two ways. First, we aggregated the forecasts by age-group, showing the relative performance of the models when forecasting incidence in particular age categories. Second, to evaluate how performance changed over the course of the analysis period, we scored the forecasts separately for seven key phases of the pandemic (Table 2). For this, we used the phases defined by Gimma et al. [19] that overlapped with our analysis, with the addition of two phases that were not covered by the previous CoMix work. Because of the small number of weeks covered by ‘Christmas’ and ‘Lockdown 3 schools open’, we combined these with ‘Lockdown 3’ and ‘Lockdown 3 Easing’ respectively.


Table 2. Pandemic period names and dates

### Age-specific transmission parameters

Finally, to compare the implicit assumptions within the models, we assessed how the relative susceptibility and infectiousness parameters $s$ and $i$ varied over the pandemic. To provide an interpretable quantification of these parameters, we used the age-specific values estimated in the model to calculate the susceptibility and infectiousness of younger adults and older adults relative to that of children. Because the age-stratification differed between the data sets, the broader age-bands varied between case and infection forecast models: children were defined as up to 15 years for infections and up to 19 for cases, younger adults as 16-49 for infections and 20-49 for cases, and older adults as over 50 in both instances.

## Results

### Forecasts

We made forecasts with horizons of one, two, three and four weeks at fortnightly intervals (Figure 2, Figures S1-S6) between 30th October 2020 and 26th November 2021 (29 forecast dates). Visually, the forecasts deviate more from the true data at longer forecast horizons.

![Fig. 2](http://medrxiv.org/http://medrxiv.stage.highwire.org/content/medrxiv/early/2022/12/03/2022.12.02.22282935/F2.medium.gif)


Fig. 2 
A) and B) Infections and cases, respectively, forecast using the CoMix-data, no-contact-data, no-interaction and POLYMOD-data next generation models, for each age group (top to bottom) and forecast horizon (left to right). Coloured bars show projections from each model; black points show infection estimates and reported cases in plots A and B respectively. The estimates being forecast on each axis are shown as solid points; those not being forecast are shown as rings. C) and D) show the continuous ranked probability score relative to the score of the ‘no-interaction’ model for each forecast date, calculated from the infection and case forecasts respectively.

### Model Evaluation

To assess the relative performance of each of the models for different forecast horizons, we calculated evaluation metrics separately for each forecast horizon across all forecast dates (Figure 3 A, Table S1). For an alternative approach using multivariate evaluation across age groups and time horizons, see Held et al. (2017) [25].

![Fig. 3](http://medrxiv.org/http://medrxiv.stage.highwire.org/content/medrxiv/early/2022/12/03/2022.12.02.22282935/F3.medium.gif)


Fig. 3 
Continuous ranked probability score relative to the score of the no-interaction model. A. Overall performance of each model when applied to case (left) and infection (right) data, with rCRPS on the y-axis and bias on the x-axis; forecast horizon is indicated by marker size. B. and C. CRPS relative to the no-interaction model against forecast horizon, disaggregated by age, for infection and case data respectively. The colour of the points shows the corresponding model.

When forecasting case reports from the UK COVID-19 dashboard [29], the no-interaction model performed better than any of the models which allowed interaction. The next best model was the no-contact-data model, which performed similarly to the no-interaction model, particularly at longer time horizons, with rCRPS of 1.02, 0.99, 1.01 and 1.01 (relative to the no-interaction model) at horizons of one to four weeks respectively. The two models that incorporated contact data both performed poorly relative to the no-interaction model, with the POLYMOD-data model performing worst. However, the relative performance of both models improved at longer time horizons, with the CoMix-data model performing similarly to the other NGM models at four week horizons (rCRPS of 1.53 and 3.78 at a one week horizon, reducing to 1.01 and 1.53 at a four week horizon, for the CoMix-data and POLYMOD-data models respectively). Both the CoMix-data and POLYMOD-data forecasts had a substantial positive bias, showing that, on average, they over-predicted cases, with bias between 0.15 and 0.25. The other models tended to under-predict cases by a smaller margin (between 0 and 0.18). The exponential baseline performed worse than the no-interaction model at all forecast horizons (rCRPS of 1.35, 1.25, 1.11 and 1.09 at one to four weeks respectively). The fixed value baseline initially performed second worst for one week forecasts (rCRPS = 1.76) but improved as the horizon increased, eventually becoming the best performing forecast at four week horizons (rCRPS = 0.74).

When forecasting infection incidence estimated from UK prevalence survey data [27], the no-interaction model performed second only to the no-contact-data model (rCRPS = 0.89) at a one week horizon, followed closely by the CoMix-data model (rCRPS = 1.05). The POLYMOD-data model performed worst at one week horizons, with an rCRPS of 1.21. However, at two week horizons the no-interaction model became the worst performing model overall, which remained true for three and four week forecast horizons. At these horizons the CoMix-data model performed best of all the models, including the baseline models, with rCRPS of 0.68, 0.64 and 0.57 (relative to the no-interaction model) for two, three and four week horizons respectively. The second best performing NGM model at two and three week horizons was the no-contact-data model (rCRPS of 0.82 and 0.85 respectively). At four week horizons, the POLYMOD-data model was the second best performing NGM model, with an rCRPS of 0.76. Both baselines did worse than all but the POLYMOD-data model when forecasting at a one week horizon; however, the performance of the fixed value baseline improved relative to all of the NGM models at longer forecast horizons, producing the second best performing forecasts overall for horizons of three and four weeks (after the CoMix-data model), with rCRPS of 0.79 and 0.68 respectively.

We compared relative forecast performance by scoring predictions in each age-group separately (Figure 3 B and C, Table S2). When forecasting infection incidence, we found that the CoMix-data and no-contact-data models forecast infections better than the no-interaction model in middle-aged and older adults (35+ years old) for all forecast horizons. These models also performed best at forecast horizons of two weeks or more in young children (2-10 years old) and older adults. In contrast, the no-interaction model performed much more similarly to the interaction models for forecasts in younger adults (16-34 years old). The same was also true for older age groups (60+) in the case forecasts, but not for children or middle-aged adults. For infections, the performance of all the models improved relative to the exponential extrapolation baseline in all age categories as forecast horizon increased. The fixed value baseline improved relative to the no-interaction model in all age categories, but provided poorer forecasts than the CoMix-data model in all age categories and at all time horizons. For case forecasts, however, the fixed value baseline improved relative to all of the models as horizon increased, providing the best forecasts at four week horizons in age groups between 0 and 59 years.

We divided the analysis into seven periods (Figure 4, Table S3), within each of which national restrictions on social activity remained broadly consistent. For consistency, we used the same periods as those presented in Gimma et al. [19], which presents the key findings of the CoMix study. The relative improvement in performance of the CoMix-data model was most consistent when forecasting infections during the *Christmas and Lockdown 3* period, which was the only period in which the CoMix-data model performed best overall at all forecast horizons. When forecasting cases, the CoMix-data model also performed relatively well during this period at forecast horizons of two or more weeks, but performed comparably to the no-interaction model at a one week horizon. The CoMix-data model's infection forecasts outperformed all other NGM models in the two subsequent periods (*Lockdown 3 easing* and *Opening up*) for forecast horizons of two weeks or more, where only the fixed value baseline improved on its score. Similarly, for the *Lockdown 2 easing* period, the CoMix-data model performed better than the no-interaction model at all forecast horizons, but the no-contact-data model performed better for one and two week forecast horizons. The improved performance of the CoMix-data model was not wholly reflected in the case forecasts. In particular, the CoMix-data model performed more poorly than the no-interaction model at all time horizons during the *Lockdown 3 easing* period. However, during the *Lockdown 2* period the CoMix-data model performed better than the other NGM models for case forecasts, whereas for infection forecasts it performed comparably to the no-interaction model.

![Fig. 4](http://medrxiv.org/http://medrxiv.stage.highwire.org/content/medrxiv/early/2022/12/03/2022.12.02.22282935/F4.medium.gif)

Fig. 4. Continuous ranked probability score relative to the score of the no-interaction model against forecast horizon (one to four weeks). Panels from left to right show each pandemic period; panels from top to bottom show forecasts of cases and infections.

### Forecast calibration

Of the four NGM models, the CoMix-data model was the best calibrated for case forecasts at one- and two-week horizons, and at all horizons for infection forecasts. This is shown by closer agreement between the proportion of observed values falling within each central interval of the predictive distribution (50% and 90%) and the nominal level of that interval (Figure 5 A and B). Calibration typically worsened at longer forecast horizons, with more observed values falling outside the intervals than would be expected. None of the forecasts were particularly well calibrated when considered over all forecast dates. Even the best-performing forecasts, infection forecasts made by the CoMix model at a one-week horizon, had fewer than 75% of observed values within the 90% central interval and fewer than 40% within the 75% central interval. Separating the forecasts by period of the pandemic (Supplementary Figures S7 and S8) revealed that the CoMix model was best calibrated during the *Christmas and Lockdown 3* and *Lockdown 3* periods, for both case and infection forecasts. In particular, the CoMix model's infection forecasts were very well calibrated during the *Christmas and Lockdown 3* period, with more than 80% of observed values falling within the 90% central interval at all horizons. The other models were also relatively well calibrated during these periods. Calibration was much poorer in the other periods. In particular, the *Lockdown 2* and *Lockdown 2 easing* periods were very poorly calibrated across all models, with no observed values falling within the 90% central interval of the no-contact-data and no-interaction model forecasts during the *Lockdown 2* period. The baseline models are not shown because they produce point forecasts without prediction intervals, so their coverage is zero by definition.
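
The coverage check described above can be sketched as follows, assuming each forecast is a set of predictive samples and central intervals are taken from empirical quantiles. The helper names and toy data are hypothetical. A point forecast, like the baselines, yields a zero-width interval, which is why its coverage is zero unless the forecast matches the observation exactly.

```python
def quantile(samples: list[float], p: float) -> float:
    """Empirical quantile with linear interpolation between order statistics."""
    xs = sorted(samples)
    pos = p * (len(xs) - 1)
    lo = int(pos)  # floor, since pos is non-negative
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def central_interval(samples: list[float], level: float) -> tuple[float, float]:
    """Bounds of the central `level` interval (e.g. 0.9 -> 5th to 95th percentile)."""
    alpha = (1 - level) / 2
    return quantile(samples, alpha), quantile(samples, 1 - alpha)


def coverage(forecasts: list[list[float]], observations: list[float], level: float) -> float:
    """Proportion of observations falling inside their forecast's central interval."""
    hits = 0
    for samples, y in zip(forecasts, observations):
        lower, upper = central_interval(samples, level)
        if lower <= y <= upper:
            hits += 1
    return hits / len(observations)


# Hypothetical example: four forecasts sharing the same predictive samples 0..100
samples = [float(v) for v in range(101)]
forecasts = [samples] * 4
observations = [50.0, 10.0, 97.0, 80.0]
print(coverage(forecasts, observations, 0.9))  # a well-calibrated model gives ~0.9
print(coverage(forecasts, observations, 0.5))  # a well-calibrated model gives ~0.5
```

Coverage below the nominal level, as reported above for longer horizons, indicates predictive distributions that are too narrow or biased.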

![Fig. 5](http://medrxiv.org/http://medrxiv.stage.highwire.org/content/medrxiv/early/2022/12/03/2022.12.02.22282935/F5.medium.gif)

Fig. 5. Calibration of the forecasts made by each of the next-generation-matrix-based models. A) The proportion of observed mean infection incidence estimates (inc2prev) and B) the proportion of observed case numbers that fall within the 50% and 90% central intervals of the forecasts of the four models. C) and D) The percentage of observed mean infection incidence estimates (inc2prev) falling below each quantile of the forecasts at one- and four-week horizons, respectively. E) and F) The percentage of observed case counts falling below each quantile of the forecasts at one- and four-week horizons, respectively.

### Age-specific susceptibility and infectiousness

We extracted estimates of age-specific infectiousness and susceptibility from the model fits to assess the biological plausibility of these parameters (Figure 6). In the CoMix model, the susceptibility of younger adults (16-49 years old for infections and 20-49 for cases) and older adults (50+ years old) was initially higher than that of children (under 16 for infections and under 20 for cases). In the early part of 2021 this began to shift: susceptibility relative to children fell first in older adults and then in younger adults. By the end of the evaluation period, children had higher susceptibility than all adult groups. A similar pattern was present in all models that allowed interaction between age groups. Infectiousness remained broadly similar across age groups, apart from a small number of outlying values with no clear trend.
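
To make the role of these parameters concrete, a common way to build a next generation matrix from contact data is to scale each contact rate by the susceptibility of the contacted age group and the infectiousness of the infector's age group, K[i][j] = s_i × C[i][j] × q_j. The sketch below assumes this form and uses hypothetical two-group values; it is not the paper's fitted, time-varying model. It shows how reducing the susceptibility of one group lowers the dominant eigenvalue of the matrix, which plays the role of the reproduction number.

```python
Matrix = list[list[float]]


def next_generation_matrix(contacts: Matrix, susceptibility: list[float],
                           infectiousness: list[float]) -> Matrix:
    """K[i][j]: expected infections in group i caused by one infected person in group j."""
    n = len(contacts)
    return [[susceptibility[i] * contacts[i][j] * infectiousness[j] for j in range(n)]
            for i in range(n)]


def next_generation(ngm: Matrix, infections: list[float]) -> list[float]:
    """Project infections one generation ahead: I_next = K @ I."""
    return [sum(k_ij * i_j for k_ij, i_j in zip(row, infections)) for row in ngm]


def dominant_eigenvalue(ngm: Matrix, iterations: int = 200) -> float:
    """Power iteration for the dominant eigenvalue of a positive matrix."""
    v = [1.0] * len(ngm)
    r = 0.0
    for _ in range(iterations):
        w = next_generation(ngm, v)
        r = max(w)  # v is normalised so that max(v) == 1
        v = [x / r for x in w]
    return r


# Hypothetical values: group 0 = children, group 1 = adults
contacts = [[2.0, 1.0], [1.0, 3.0]]
k = next_generation_matrix(contacts, susceptibility=[1.0, 1.0], infectiousness=[0.5, 0.5])
k_lower = next_generation_matrix(contacts, susceptibility=[1.0, 0.5], infectiousness=[0.5, 0.5])
print(dominant_eigenvalue(k), dominant_eigenvalue(k_lower))
```

Halving adult susceptibility here (as vaccination might) reduces the dominant eigenvalue from about 1.81 to 1.25, illustrating why shifts in inferred susceptibility feed directly into projected incidence.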

![Fig. 6](http://medrxiv.org/http://medrxiv.stage.highwire.org/content/medrxiv/early/2022/12/03/2022.12.02.22282935/F6.medium.gif)

Fig. 6. Infectiousness (top panels) and susceptibility (bottom panels) of younger adults (16-50 for infections and 20-50 for cases) plotted against that of older adults (50+), each relative to children (under 16 for infections and under 20 for cases), coloured by forecast date. Each model is shown in a separate panel, from left to right.

## Discussion

Evaluating the forecast performance of four next generation matrix models, we found that allowing interaction between age groups and using regularly collected contact data did not consistently improve performance when forecasting cases reported through the UK COVID-19 dashboard. However, when forecasting infection incidence estimated from prevalence survey data, allowing interaction between age groups improved overall forecast performance at all time horizons, and informing that interaction with regularly collected contact data improved forecasts further at horizons of two weeks and greater. This improvement was not, however, consistent across all periods, nor when the forecasts for each age group were considered separately.

The NGM models with interaction showed the greatest benefit over the no-interaction model when forecasting infections during the *Christmas and Lockdown 3* period, when the CoMix-data model outperformed all other models. During this period the CoMix-data model was also the best calibrated of any model in any period. Notably, these forecasts were made while the most intense restrictions were in place for an extended period, following a very sharp rise in cases. The sharp rise in cases, combined with growing hospitalisations and deaths, may have led to consistent behaviour across the population: although restrictions changed on January 5th[42], the contact behaviour recorded by CoMix remained similar between the *Christmas* and *Lockdown 3* periods[19]. This consistency of behaviour over an extended period, well described by the CoMix data, may help explain why this model outperformed the others.

Generally, the NGM models that allowed interaction between age groups forecast infections better than the model with no interaction, particularly in the older and younger age groups. This may relate to age-specific incidence and transmission rates. Whereas infections in younger adults were largely driven by transmission within that age group, for long periods of the pandemic infections in older adults and children are likely to have been driven primarily by transmission from younger adults, particularly while schools were closed, which was the case for a large proportion of the study period[7,43]. Incidence projections in these groups therefore depend more on between-age-group interaction.

Overall, all models forecast infections better than the exponential extrapolation baseline. The relative performance of this baseline generally worsened with increasing forecast horizon, suggesting that simple exponential extrapolation tends to overestimate changes in infections, an error that compounds at longer horizons. For case forecasts, the exponential extrapolation baseline performed relatively better than it did for infections at short horizons; however, as with infections, all models improved relative to this baseline as the forecast horizon increased, mostly outperforming it at longer horizons. This suggests that this simple assumption about transmission dynamics breaks down rapidly.

In contrast, for both case and infection forecasts, the relative performance of the fixed value baseline improved with increasing forecast horizon, as the projections of all models deviated further from the observed values. This may reflect rapidly changing public behaviour under constantly changing interventions. It also adds to existing evidence that effective forecasts of infectious disease incidence can rarely be made at horizons of more than a few weeks[6].
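
The behaviour of the two baselines discussed above can be illustrated with a minimal sketch. Here the fixed value baseline carries the last observation forward and the exponential baseline extrapolates the most recent week-on-week growth rate; these are assumed, simplified definitions for illustration, and the function names are hypothetical. Because the exponential baseline compounds its growth rate, any error in that rate grows with horizon, whereas the fixed value baseline's error grows only as far as the epidemic itself moves.

```python
import math


def fixed_value_baseline(history: list[float], horizons: list[int]) -> list[float]:
    """Carry the most recent weekly value forward to every horizon."""
    return [history[-1] for _ in horizons]


def exponential_baseline(history: list[float], horizons: list[int]) -> list[float]:
    """Extrapolate the latest week-on-week growth rate h weeks ahead."""
    growth = math.log(history[-1] / history[-2])  # weekly log growth rate
    return [history[-1] * math.exp(growth * h) for h in horizons]


# Hypothetical weekly incidence that rises to 120 and then levels off
history = [100.0, 120.0]
truth = [120.0, 120.0, 120.0, 120.0]
horizons = [1, 2, 3, 4]
for name, forecast in [("fixed", fixed_value_baseline(history, horizons)),
                       ("exponential", exponential_baseline(history, horizons))]:
    errors = [abs(f - t) for f, t in zip(forecast, truth)]
    print(name, [round(e, 1) for e in errors])
```

In this toy scenario the exponential baseline's absolute error grows from 24 at one week to almost 129 at four weeks, while the fixed value baseline is exact; the opposite happens if growth continues, which is why neither baseline dominates across all periods.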

The distributions of relative infectiousness and susceptibility inferred by the models are consistent with other findings, with adults initially exhibiting higher susceptibility than children in general[7,8,16]. This changes throughout the pandemic in a sequence consistent with what would be expected from the acquisition of antibodies through vaccination and natural infection. The largest changes occur after vaccination is introduced, when the susceptibility of older adults falls relative to other ages first, followed by that of younger adults. This is consistent with the vaccine roll-out schedule in England during the early part of 2021[44]. The general trajectory of age-specific susceptibility also agrees well with the findings of Franco et al.[44,45], who used the Belgian arm of the CoMix study to estimate age-specific infectiousness and susceptibility independently of this study.

Our estimates of age-specific infectiousness and susceptibility need to be interpreted with caution for three main reasons. Firstly, the framework is optimised for prediction rather than inference, so it is not designed to reflect the underlying biological processes faithfully. Secondly, there is likely to be some age-related bias in the way contact data are collected, which may affect these estimates[19]. Importantly, the contacts of children are reported by their parents or guardians. In addition, children's contacts are disproportionately reported as groups, unlike adults' contacts, which were reported by the participants themselves and mostly recorded as individual contacts.

Finally, we make no distinction between contacts by location, duration or intimacy. In reality, contacts made in different contexts (e.g. home and school) are likely to carry different potential for transmission, which may also affect how our susceptibility and infectiousness estimates can be interpreted. One potential extension would be to include contacts by setting (Work, School, Home and Other), which would allow contacts in different contexts to be weighted differently.
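
The proposed extension could take the form of a weighted sum of setting-specific contact matrices, C = Σ_s w_s C_s, where w_s scales the transmission potential of contacts in setting s. The sketch below assumes this form; the weights and matrices are hypothetical and, in practice, the weights would need to be estimated alongside the other model parameters.

```python
Matrix = list[list[float]]


def weighted_contact_matrix(by_setting: dict[str, Matrix], weights: dict[str, float]) -> Matrix:
    """Combine setting-specific contact matrices as the sum over settings of w_s * C_s."""
    settings = list(by_setting)
    n = len(by_setting[settings[0]])
    combined = [[0.0] * n for _ in range(n)]
    for s in settings:
        w = weights[s]
        for i in range(n):
            for j in range(n):
                combined[i][j] += w * by_setting[s][i][j]
    return combined


# Hypothetical two-group (children, adults) matrices for two of the four settings
by_setting = {
    "Home": [[1.0, 2.0], [2.0, 1.0]],
    "School": [[4.0, 0.5], [0.5, 0.0]],
}
weights = {"Home": 1.0, "School": 0.5}  # assumed relative transmission potential
print(weighted_contact_matrix(by_setting, weights))
```

Setting all weights to 1 recovers the unweighted total contact matrix, so the current approach is a special case of this extension.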

There may also be other factors associated with the inferred changes in susceptibility and infectiousness that do not correspond to inherent transmissibility. For example, the degree of mitigating behaviour unrelated to contact rate (e.g. wearing masks, preferring outdoor meetings, physical distancing) may have changed differently over the epidemic for each age group. A reduction in the relative susceptibility of older adults may indicate that, later in the pandemic, these age groups were better able than younger age groups to reduce their risk of infection when making contact with others. Also, we assumed that immunity is determined by seropositivity as reported in the publicly available CIS data[27], for which we used only a single antibody-level threshold for positivity. There may be substantial variation in the distribution of antibody levels among seropositive individuals of different age groups, depending on vaccination history and infection-acquired antibodies, which may affect age-stratified susceptibility to infection. Finally, the age profile of susceptibility may vary by variant; however, due to the limitations discussed, we are unable to quantify this here.

Whereas the relative performance of the models was fairly consistent for infections, performance on case data was generally more erratic, with the ranking of models and baselines changing between horizons within the same aggregation of forecast dates and age groups. This may reflect the more variable nature of case reporting, in which multiple factors beyond transmission dynamics drive week-by-week variation in cases. Notably, case reports are subject to variation in reporting rate, which may also differ between age groups, and this was exacerbated by changes in the UK Government's testing policy over the course of the pandemic. This was not the case for the infection time series, which was estimated from weekly prevalence estimates. Moreover, the infection forecasts incorporated estimates of antibody prevalence modelled from weekly serosurveys[27] and vaccination data[29], whereas the case forecasts did not.

Our work provides an indication of the potential benefits of including contact data in epidemiological forecasts. However, for transparency we chose not to use state-of-the-art surveillance methods, and instead made a number of simplifications when selecting and processing the epidemiological data so that this effect could be analysed more clearly. These simplifications would be expected to affect forecast performance in a real-time implementation. Firstly, we forecast infections using a modelled time series fitted to weekly prevalence estimates[26]. Under the current data sharing protocol of the ONS COVID-19 Infection Survey, these data would not be publicly available on the forecast date, so this approach could not be applied in real time without full integration into the ONS infection survey workflow. We chose to do this to provide the most idealised scenario for testing the application of contact data to short-term forecasts, without the complexities associated with case data. Furthermore, although these estimates agree well with other estimates and with case time series, the methodology promotes a smooth infection history, leading to autocorrelation in the time series. This may unduly benefit models with high autocorrelation, such as the fixed value baseline. However, the similar relative performance of this baseline when evaluated against case data supports our observation that it performs best at longer time horizons. Secondly, an important feature of real-time epidemiological data is that several complex delay distributions may affect the most recent part of the case time series[46]. This is especially true when using data by specimen date, as we do here, since complete counts for a given specimen date are not available until all tests from that date have been processed. For this reason, case counts are increasingly truncated in the days leading up to the forecast date.
Approaches to account for this exist[28], but here we used the case data as currently known rather than as known on each forecast date, so we did not need to make the adjustment that would be required if the forecasts were made in real time. Extending existing approaches for real-time modelling that can deal with truncated data to include interactions between multiple time series will be an important area of future research[47,48].
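
As an illustration of the kind of adjustment that would be needed in real time, the simplest approach scales each recent count by the probability that a case with that specimen date would already have been reported, given an assumed reporting-delay distribution. This is a deliberately simple sketch with a hypothetical delay distribution and function name; it ignores uncertainty in the delay and is not the method of Bastos et al.[28].

```python
def adjust_for_truncation(observed: list[float], reported_by_delay: list[float]) -> list[float]:
    """Scale up right-truncated counts by the probability of having been reported so far.

    observed: daily counts by specimen date, oldest first; the last entry is the
        most recent specimen date before the forecast date.
    reported_by_delay: reported_by_delay[d] is the (assumed) probability that a case
        is reported within d days of its specimen date; longer delays are treated
        as fully reported.
    """
    n = len(observed)
    adjusted = []
    for idx, count in enumerate(observed):
        days_elapsed = n - 1 - idx
        if days_elapsed < len(reported_by_delay):
            completeness = reported_by_delay[days_elapsed]
        else:
            completeness = 1.0
        adjusted.append(count / completeness)
    return adjusted


# Hypothetical: the two most recent specimen dates are still incomplete
print(adjust_for_truncation([100.0, 90.0, 50.0], [0.5, 0.9, 1.0]))
```

Dividing by a small completeness value amplifies noise in the most recent counts, which is one reason model-based nowcasting approaches are usually preferred.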

The models we present used a normal likelihood, which is unconventional for epidemiological forecasts as these tend to operate on count data. Because we used estimates of infection incidence, our input data were not an integer time series but a distribution at each time point. To keep the approach consistent between the case and infection time series, we used the same likelihood for forecasting cases. Lastly, because the CRPS is an absolute measure on the scale of the data, the overall score is weighted towards age groups and time periods with high absolute incidence. This may penalise models that performed poorly in the young adult age range (16-35 years), where incidence was highest for much of the study period.
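
The scale dependence of the CRPS can be seen from its well-known closed form for a normal predictive distribution: for a forecast N(μ, σ²) and observation y, CRPS = σ[z(2Φ(z) − 1) + 2φ(z) − 1/√π] with z = (y − μ)/σ, where Φ and φ are the standard normal CDF and density. The sketch below assumes a normal predictive distribution purely for illustration and shows that multiplying the forecast and the observation by ten multiplies the score by ten, so high-incidence age groups dominate any unweighted sum of scores.

```python
import math
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()


def crps_normal(mu: float, sigma: float, y: float) -> float:
    """Closed-form CRPS of a N(mu, sigma^2) forecast for observation y (lower is better)."""
    z = (y - mu) / sigma
    return sigma * (z * (2 * STANDARD_NORMAL.cdf(z) - 1)
                    + 2 * STANDARD_NORMAL.pdf(z)
                    - 1 / math.sqrt(math.pi))


# Hypothetical forecasts with the same relative error in a low- and a high-incidence group
low = crps_normal(mu=100.0, sigma=10.0, y=120.0)
high = crps_normal(mu=1000.0, sigma=100.0, y=1200.0)
print(low, high, high / low)
```

Scoring on a log scale, or dividing by a reference model's score as in the relative scores above, are common ways to reduce this weighting.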

Overall, allowing interaction between age groups and integrating regularly collected contact data improved forecasts of infections estimated from national prevalence surveys. This benefit was, however, not clear when the approach was applied to regularly collected case data, which are generally much more readily available for real-time applications. The picture this offers of the usefulness of contact data in forecasting is therefore nuanced. Even for the idealised example of incident infections estimated retrospectively from repeated cross-sectional prevalence surveys, there were periods of improved performance and periods where the contact-based models failed to capture the dynamics of the epidemic well enough to improve on the other models' predictions. The contact-data model performed best during a period when contacts remained relatively consistent. This raises the question of whether real-time contact data can capture relevant changes in transmission-related behaviour when non-pharmaceutical interventions change frequently. As applications using contact data in real time develop, it is important to evaluate whether the periods in which contact data are informative align with periods when they are also useful for infection control, and to consider how future studies might be optimised to achieve this.

## Supporting information

Supplementary Material: supplements/282935_file03.docx

## Data Availability

Case data are available on the UK Covid-19 Dashboard (https://coronavirus.data.gov.uk). Infection and antibody prevalence data are available from the Covid-19 infection survey website (https://www.ons.gov.uk/peoplepopulationandcommunity/healthandsocialcare/conditionsanddiseases/bulletins/coronaviruscovid19infectionsurveypilot/25november2022), and contact matrices are available from the CoMix online repository (https://doi.org/10.5281/zenodo.7351951).

## Author Contributions

JDM and SF were responsible for funding acquisition, conceptualization and methodology, data curation and formal analysis of the infection estimates. JDM was responsible for project administration, carried out the formal analysis and investigation of forecasts, and prepared the original draft of the manuscript and visualisations with supervision from SF. JDM, SF, SA and SM reviewed and edited the manuscript.

## Data availability statement

Case data are available on the UK Covid-19 Dashboard [https://coronavirus.data.gov.uk](https://coronavirus.data.gov.uk).

Infection and antibody data are available on the Covid-19 infection survey website [https://www.ons.gov.uk/peoplepopulationandcommunity/healthandsocialcare/conditionsanddiseases/bulletins/coronaviruscovid19infectionsurveypilot/25november2022](https://www.ons.gov.uk/peoplepopulationandcommunity/healthandsocialcare/conditionsanddiseases/bulletins/coronaviruscovid19infectionsurveypilot/25november2022) and contact matrices are available from the CoMix online repository [https://doi.org/10.5281/zenodo.7351951](https://doi.org/10.5281/zenodo.7351951).

## Code availability statement

All code used for this work is available at [https://github.com/epiforecasts/CovidAgeGroupForecast](https://github.com/epiforecasts/CovidAgeGroupForecast). The weekly contact matrices are published on Zenodo [https://doi.org/10.5281/zenodo.7351951](https://doi.org/10.5281/zenodo.7351951).

## Financial Disclosure Statement

This work was partly funded by an Office for National Statistics COVID-19 Infection Survey Analysis grant PU-20-0205(c): JDM. This work was partly funded by the Wellcome Trust 210758/Z/18/Z: JDM and SF.

## Competing interests

The authors declare that they have no competing interests.

## Acknowledgements

The authors would like to acknowledge the contributions of colleagues from the COVID-19 Infection Survey Analysis team at the Office for National Statistics (ONS) for their project support and thoughtful discussion during the planning and analysis phases of this research. We also thank the members of the CoMix consortium for their support with the contact data, especially Chris Jarvis, Pietro Coletti, Niel Hens and John Edmunds for their feedback on the manuscript; Lloyd Chapman for insightful discussion during the analysis phase of the work; and members of the Epiforecasts group at LSHTM for helpful comments and feedback on our modelling framework, especially Nikos Bosse for support with the *scoringutils* package.

*   Received December 2, 2022.
*   Revision received December 2, 2022.
*   Accepted December 3, 2022.


*   © 2022, Posted by Cold Spring Harbor Laboratory

This pre-print is available under a Creative Commons License (Attribution 4.0 International), CC BY 4.0, as described at [http://creativecommons.org/licenses/by/4.0/](http://creativecommons.org/licenses/by/4.0/)

## References

1.  Thacker SB, Choi K, Brachman PS. The surveillance of infectious diseases. JAMA. 1983;249: 1181–1185. doi:10.1001/jama.1983.03330330059036

2.  Fraser C. Estimating individual and household reproduction numbers in an emerging epidemic. PLoS One. 2007;2: e758. doi:10.1371/journal.pone.0000758

3.  Cori A, Ferguson NM, Fraser C, Cauchemez S. A new framework and software to estimate time-varying reproduction numbers during epidemics. Am J Epidemiol. 2013;178: 1505–1512. doi:10.1093/aje/kwt133

4.  Funk S, Camacho A, Kucharski AJ, Eggo RM, Edmunds WJ. Real-time forecasting of infectious disease dynamics with a stochastic semi-mechanistic model. Epidemics. 2018;22: 56–61.

5.  Sherratt K, Gruson H, Grah R, Johnson H, Niehus R, Prasse B, et al. Predictive performance of multi-model ensemble forecasts of COVID-19 across European nations. medRxiv. 2022. doi:10.1101/2022.06.16.22276024

6.  Cramer EY, Ray EL, Lopez VK, Bracher J, Brennen A, Castro Rivadeneira AJ, et al. Evaluation of individual and ensemble probabilistic forecasts of COVID-19 mortality in the United States. Proc Natl Acad Sci U S A. 2022;119: e2113561119. doi:10.1073/pnas.2113561119

7.  Davies NG, Klepac P, Liu Y, Prem K, Jit M, CMMID COVID-19 working group, et al. Age-dependent effects in the transmission and control of COVID-19 epidemics. Nat Med. 2020;26: 1205–1211. doi:10.1038/s41591-020-0962-9

8.  Viner RM, Mytton OT, Bonell C, Melendez-Torres GJ, Ward J, Hudson L, et al. Susceptibility to SARS-CoV-2 Infection Among Children and Adolescents Compared With Adults: A Systematic Review and Meta-analysis. JAMA Pediatr. 2021;175: 143–156.

9.  COVID-19 Forecasting Team. Variation in the COVID-19 infection-fatality ratio by age, time, and geography during the pre-vaccine era: a systematic analysis. Lancet. 2022;399: 1469–1488.

10. Levin AT, Hanage WP, Owusu-Boaitey N, Cochran KB, Walsh SP, Meyerowitz-Katz G. Assessing the age specificity of infection fatality rates for COVID-19: systematic review, meta-analysis, and public policy implications. Eur J Epidemiol. 2020;35: 1123–1138.

11. Pijls BG, Jolani S, Atherley A, Derckx RT, Dijkstra JIR, Franssen GHL, et al. Demographic risk factors for COVID-19 infection, severity, ICU admission and death: a meta-analysis of 59 studies. BMJ Open. 2021;11: e044640.

12. Anderson RM, May RM. Age-related changes in the rate of disease transmission: implications for the design of vaccination programmes. J Hyg. 1985;94: 365–436. doi:10.1017/S002217240006160X

13. Wallinga J, Teunis P, Kretzschmar M. Using data on social contacts to estimate age-specific transmission parameters for respiratory-spread infectious agents. Am J Epidemiol. 2006;164: 936–944. doi:10.1093/aje/kwj317

14. Worby CJ, Chaves SS, Wallinga J, Lipsitch M, Finelli L, Goldstein E. On the relative role of different age groups in influenza epidemics. Epidemics. 2015;13: 10–16. doi:10.1016/j.epidem.2015.04.003

15. Hoang T, Coletti P, Melegaro A, Wallinga J, Grijalva CG, Edmunds JW, et al. A Systematic Review of Social Contact Surveys to Inform Transmission Models of Close-contact Infections. Epidemiology. 2019;30: 723–736. doi:10.1097/EDE.0000000000001047

16. House T, Riley H, Pellis L, Pouwels KB, Bacon S, Eidukas A, et al. Inferring Risks of Coronavirus Transmission from Community Household Data. arXiv e-prints. 2021; arXiv:2104.04605.

17. Edmunds WJ, O’Callaghan CJ, Nokes DJ. Who mixes with whom? A method to determine the contact patterns of adults that may lead to the spread of airborne infections. Proc Biol Sci. 1997;264: 949–957. doi:10.1098/rspb.1997.0131

18. Mossong J, Hens N, Jit M, Beutels P, Auranen K, Mikolajczyk R, et al. Social contacts and mixing patterns relevant to the spread of infectious diseases. PLoS Med. 2008;5: e74. doi:10.1371/journal.pmed.0050074

19. Gimma A, Munday JD, Wong KLM, Coletti P, van Zandvoort K, Prem K, et al. CoMix: Changes in social contacts as measured by the contact survey during the COVID-19 pandemic in England between March 2020 and March 2021. bioRxiv. medRxiv; 2021. doi:10.1101/2021.05.28.21257973

20. Munday JD, Jarvis CI, Gimma A, Wong KLM, van Zandvoort K, CMMID COVID-19 Working Group, et al. Estimating the impact of reopening schools on the reproduction number of SARS-CoV-2 in England, using weekly contact survey data. BMC Med. 2021;19: 233.

21. Jarvis CI, Gimma A, van Zandvoort K, Wong KLM, CMMID COVID-19 working group, Edmunds WJ. The impact of local and national restrictions in response to COVID-19 on social contacts in England: a longitudinal natural experiment. BMC Med. 2021;19: 52. doi:10.1186/s12916-021-01924-7

22. Verelst F, Hermans L, Vercruysse S, Gimma A, Coletti P, Backer JA, et al. SOCRATES-CoMix: a platform for timely and open-source contact mixing data during and in between COVID-19 surges and interventions in over 20 European countries. BMC Med. 2021;19. doi:10.1186/s12916-021-02133-y

23. Funk S, Camacho A, Kucharski AJ, Lowe R, Eggo RM, Edmunds WJ. Assessing the performance of real-time epidemic forecasts: A case study of Ebola in the Western Area region of Sierra Leone, 2014-15. PLoS Comput Biol. 2019;15: e1006785. doi:10.1371/journal.pcbi.1006785

24. Gneiting T, Raftery AE. Strictly Proper Scoring Rules, Prediction, and Estimation. J Am Stat Assoc. 2007;102: 359–378. doi:10.1198/016214506000001437

25. Held L, Meyer S, Bracher J. Probabilistic forecasting in infectious disease epidemiology: the 13th Armitage lecture. Stat Med. 2017;36: 3443–3460.

26. Abbott S, Funk S. Estimating epidemiological quantities from repeated cross-sectional prevalence measurements. medRxiv. 2022. doi:10.1101/2022.03.29.22273101

27. Coronavirus (COVID-19) Infection Survey, UK Statistical bulletins. [cited 29 Mar 2022]. Available: [https://www.ons.gov.uk/peoplepopulationandcommunity/healthandsocialcare/conditionsanddiseases/bulletins/coronaviruscovid19infectionsurveypilot/previousReleases](https://www.ons.gov.uk/peoplepopulationandcommunity/healthandsocialcare/conditionsanddiseases/bulletins/coronaviruscovid19infectionsurveypilot/previousReleases)

28. Bastos LS, Economou T, Gomes MFC, Villela DAM, Coelho FC, Cruz OG, et al. A modelling approach for correcting reporting delays in disease surveillance data. Stat Med. 2019;38: 4363–4377.

29. HM Government UK. Official Coronavirus (COVID-19) disease situation dashboard. [cited 25 May 2022]. Available: [https://coronavirus.data.gov.uk/](https://coronavirus.data.gov.uk/)

30. GOV.UK. Number of coronavirus (COVID-19) cases and risk in the UK. 2020. Online: [https://www.gov.uk/guidance/coronavirus-covid-19-information-for-the-public](https://www.gov.uk/guidance/coronavirus-covid-19-information-for-the-public). Available: [https://www.bleadon.org.uk/media/other/24400/NumberofcoronavirusCOVID-19casesandriskintheUK-GOV.UK.pdf](https://www.bleadon.org.uk/media/other/24400/NumberofcoronavirusCOVID-19casesandriskintheUK-GOV.UK.pdf)

31. inc2prev: Estimate incidence from ONS prevalence estimates. GitHub. Available: [https://github.com/epiforecasts/inc2prev](https://github.com/epiforecasts/inc2prev)

32. The CoMix study. In: uHasselt [Internet]. [cited 29 Mar 2022]. Available: [https://www.uhasselt.be/en/aparte-sites-partner-en/epipose/the-comix-study](https://www.uhasselt.be/en/aparte-sites-partner-en/epipose/the-comix-study)

33. Stan Development Team. Stan Modeling Language User’s Guide and Reference Manual. 2012.

34. Madewell ZJ, Yang Y, Longini IM Jr., Halloran ME, Dean NE. Household Secondary Attack Rates of SARS-CoV-2 by Variant and Vaccination Status: An Updated Systematic Review and Meta-analysis. JAMA Netw Open. 2022;5: e229317.

35. Ganyani T, Kremer C, Chen D, Torneri A, Faes C, Wallinga J, et al. Estimating the generation interval for coronavirus disease (COVID-19) based on symptom onset data, March 2020. Euro Surveill. 2020;25. doi:10.2807/1560-7917.ES.2020.25.17.2000257

36. Hart WS, Miller E, Andrews NJ, Waight P, Maini PK, Funk S, et al. Generation time of the alpha and delta SARS-CoV-2 variants: an epidemiological analysis. Lancet Infect Dis. 2022;22: 603–610.

37. Alene M, Yismaw L, Assemie MA, Ketema DB, Gietaneh W, Birhan TY. Serial interval and incubation period of COVID-19: a systematic review and meta-analysis. BMC Infect Dis. 2021;21: 257.

38. Abbott S, Sherratt K, Gerstung M, Funk S. Estimation of the test to test distribution as a proxy for generation interval distribution for the Omicron variant in England. medRxiv. 2022. doi:10.1101/2022.01.08.22268920

39. McAloon C, Collins Á, Hunt K, Barber A, Byrne AW, Butler F, et al. Incubation period of COVID-19: a rapid systematic review and meta-analysis of observational research. BMJ Open. 2020;10: e039652.

40. Report 49 - Growth, population distribution and immune escape of Omicron in England. In: Imperial College London [Internet]. [cited 29 Mar 2022]. Available: [https://www.imperial.ac.uk/mrc-global-infectious-disease-analysis/covid-19/report-49-Omicron/](https://www.imperial.ac.uk/mrc-global-infectious-disease-analysis/covid-19/report-49-Omicron/)

41. Bosse NI, Abbott S, EpiForecasts, Funk S. scoringutils: Utilities for Scoring and Assessing Predictions. 2020.

42. Institute for Government. Timeline of UK government coronavirus lockdowns and restrictions. [cited 28 Jul 2022]. Available: [https://www.instituteforgovernment.org.uk/charts/uk-government-coronavirus-lockdowns](https://www.instituteforgovernment.org.uk/charts/uk-government-coronavirus-lockdowns)

43. Monod M, Blenkinsop A, Xi X, Hebert D, Bershan S, Tietze S, et al. Age groups that sustain resurging COVID-19 epidemics in the United States. Science. 2021;371. doi:10.1126/science.abe8372

44. COVID-19 vaccination programme. In: GOV.UK [Internet]. 27 Nov 2020 [cited 28 Jul 2022]. Available: [https://www.gov.uk/government/collections/covid-19-vaccination-programme](https://www.gov.uk/government/collections/covid-19-vaccination-programme)

45. Franco N, Coletti P, Willem L, Angeli L, Lajot A, Abrams S, et al. Inferring age-specific differences in susceptibility to and infectiousness upon SARS-CoV-2 infection based on Belgian social contact data. PLoS Comput Biol. 2022;18: e1009965.

46. Gostic KM, McGough L, Baskerville EB, Abbott S, Joshi K, Tedijanto C, et al. Practical considerations for measuring the effective reproductive number, Rt. PLoS Comput Biol. 2020;16: e1008409.

47. Abbott S, Lison A, Funk S. epinowcast: Flexible hierarchical nowcasting. Zenodo; 2022. doi:10.5281/ZENODO.5637165

48. Abbott S, Hellewell J, Sherratt K, Gostic K, Hickson J, Badr HS, et al. EpiNow2: Estimate Real-Time Case Counts and Time-Varying Epidemiological Parameters. 2021. doi:10.5281/ZENODO.3957489

 [1]: /embed/graphic-2.gif
 [2]: /embed/graphic-3.gif
 [3]: /embed/graphic-4.gif
 [4]: /embed/graphic-5.gif
 [5]: /embed/graphic-6.gif
 [6]: /embed/graphic-8.gif
 [7]: /embed/inline-graphic-1.gif
 [8]: /embed/graphic-9.gif
 [9]: /embed/graphic-10.gif
 [10]: /embed/graphic-11.gif
 [11]: /embed/graphic-12.gif
 [12]: /embed/graphic-13.gif
 [13]: /embed/graphic-14.gif
 [14]: /embed/inline-graphic-2.gif