Estimating Between Country Migration in Pneumococcal Populations

Sophie Belman; Henri Pesonen; Nicholas J. Croucher; Stephen D. Bentley; Jukka Corander

doi:10.1101/2023.11.15.23298520

Abstract

Streptococcus pneumoniae (the pneumococcus) is a globally distributed, human obligate opportunistic bacterial pathogen which, although often carried commensally, is also a significant cause of invasive disease. Apart from multi-drug resistant and virulent clones, the rate and direction of pneumococcal dissemination between different countries remains largely unknown. The ability for the pneumococcus to take a foothold in a country depends on existing population configuration, the extent of vaccine implementation, as well as human mobility since it is a human obligate bacterium. To shed light on its international movement, we used extensive genome data from the Global Pneumococcal Sequencing (GPS) project and estimated migration parameters between multiple countries in Africa. Data on allele frequencies of polymorphisms at housekeeping-like loci for multiple different lineages circulating in the populations of South Africa, Malawi, Kenya, and The Gambia were used to calculate the fixation index (F_st) between countries. We then further used these summaries to fit migration coalescent models with the likelihood-free inference algorithms available in the ELFI software package. Synthetic data were additionally used to validate the inference approach. Our results demonstrate country-pair specific migration patterns and heterogeneity in the extent of migration between different lineages. Our approach demonstrates that coalescent models can be effectively used for inferring migration rates for bacterial species and lineages provided sufficiently granular population genomics surveillance data. Further it can demonstrate the connectivity of respiratory disease agents between countries to inform intervention policy in the longer term.

1 Introduction

Inferring migration events in natural bacterial populations between geographically separated regions is generally challenging since bacteria cannot be tagged in the same way as, for example, animals. For bacteria which colonize humans the rate of migration will be a complicated function of the host mobility and ecological factors influencing the success of onward transmission, such as the level of hygiene, use of antibiotics and vaccination campaigns in the host populations. Progress identifying large-scale migration patterns among bacteria has been primarily made for species or strains more likely causing acute infections and serious illnesses, such as cholera Domman et al. [2017], Okoro et al. [2012], Comas et al. [2013], Lassalle et al. [2023]. However, genetic epidemiology has also been used to elucidate the spread of multi-drug resistant (MDR) strains of Streptococcus pneumoniae (the pneumococcus) across different global regions. For such commonly asymptomatic bacteria these studies are highly reliant on comprehensive sampling van Tonder et al. [2015], Quintero Moreno and Araque [2018], Croucher et al. [2014]. Overarchingly, variable sampling strategies between countries, poor approximation of between country mobility, and the large time scales of between country pathogen spread can hinder definitive estimates of the weight and direction of between-country spread.

Global genomic sampling as part of the Global Pneumococcal Sequencing (GPS) Project demonstrates that some lineages and serotypes of this bacterium circulate locally, while others are spread globally. With the exception of the aforementioned MDR strains, the time scales of this spread and the frequency, or direction of migration between specific countries leading to the extant lineage distributions remains unclear Gladstone et al. [2019]. While we can use time resolved phylogenies to infer that genome pairs from countries in South Africa and countries elsewhere in Africa become increasingly similar with increasing divergence times, this is still, overall, much less likely than pair similarity within country Belman et al. [2023]. Informing coalescent models with true case count data can reduce the impact of geographic sampling bias, but for an endemic, often asymptomatic pathogen this remains difficult Layan et al. [2023]. Our previous work used human mobility data from Meta Maas [2019] to build a model inferring bacterial movement Belman et al. [2023], but could provide limited information about the direction of spread. Further, reliable between country human mobility data is scarce for continents such as Africa, so similar approaches have limited use in this context Deutschmann et al. [2022], Gabrielli et al. [2019].

In this work we quantify between-country pneumococcal migration among four African countries. Due to the lack of direct observations of between-country migration we are unable to use typical Bayesian techniques. However, population genomics based surveillance data from multiple countries provides an opportunity to consider quantification of migration rates via coalescent models and likelihood-free inference (LFI) Aeschbacher et al. [2013], Wegmann et al. [2009]. In particular we use Approximate Bayesian Computation (ABC), which is a type of LFI in which we compare population summary statistics between simulated and real genomic data to determine (in this case) patterns of pneumococcal migration.

We can explore the impact of different evolutionary parameters on population samples using software packages which simulate coalescing populations Kelleher and Lohse [2020], Kern and Schrider [2016], Ewing and Her-misson [2010]. ms prime is an adaptation to the classical neutral ms simulator which includes demographic parameters such as population size and migration Hudson [2002]. Such a simulation-based inference framework Sisson et al. [2018] enables inference of the unknown migration parameters without access to a closed form expression for the data under any particular set of parameters assumed to govern the migration process. Given the generality of the problem of migration quantification for microbes, our approach is of wider interest beyond the pneumococcal case considered in detail here.

2 Materials and methods

2.1 Population Divergence Summary Statistics

Rather than comparing the entire nucleotide diversity of genes and genomes within and between populations one can summarize the diversity using relevant statistics derived from population genetics theory under neutrality.

The fixation index (F_st) was first developed by Sewall Wright Wright [1949] in 1949 to describe randomly drawn alleles in one population as compared to the total sampled population. There have been many adaptations to this but the Weir & Cockerham (WC) F_st Weir and Cockerham [1984] is the most commonly used today and describes the total population as the most recent ancestral population. The WC F_st is a parameter describing the evolutionary process of drift rather than a statistic of observed samples and assumes equal drift across populations. When the F_st for each population is different the WC estimator becomes a function of the ratio of sample sizes between them rather than true divergence. Further WC F_st suffers from the ‘star phylogeny’ assumption in that all populations independently descended from the same ancestor. Weir & Hill adapted the WC F_st to estimate population specific values and Hudson et al. adapted this to estimate the F_st between populations in 1992 Hudson et al. [1992], Selander and Hudson [1976], Holsinger and Weir [2009]. Hudsons F_st is more robust to variable sample sizes and variable F_st values across populations Bhatia et al. [2013]. Simply Hudsons F_st can be described by Weir & Hills single population F_st estimate. where is the allele frequency in population i at SNP s, and is the frequency of the same allele at that SNP s in the ancestral population and is the population-specific F_st for population i. Hudsons estimate combines these to estimate the F_st for a pair of populations to be where is population 1 and is population 2.

2.2 Inference Strategy Overview

2.2.1 Simulator

We used msprime Kelleher and Lohse [2020], Nelson et al. [2020], Baumdicker et al. [2022] as the simulator in this framework. msprime is a software package which simulates the coalescent process for thousands of genomes. It can incorporate recombination, mutation, and migration between demes. It outputs phylogenetic trees representing the population, onto which mutations can be imputed and summary statistics calculated. These summary statistics are then compared to summary statistics from the real data. The coalescent simulation requires specific parameters. For the two deme model we included geographic demes A and B and initial population size for each deme (P_A, P_B). The sample sizes drawn from each population (samp_A, samp_B) are included as parameters and crucially, we include the inferrable parameters for the migration rates asymmetrically from deme A to deme B (mig_A,B), and the inverse (mig_B,A). We also include a mutation rate (θ). We specified these parameters scaling the mutation rate to the length of genome we input and down-sampled the true population sizes from each country for computational efficiency. We included the infinite sites model in this framework which rather than allowing only a finite number of mutations per-site allows an infinite number across continuous space whereby no site mutates twice Kimura [1969], Ma et al. [2008] (Table 1). We employ replicates, both for the simulator and the real-data, to reduce variability around the summary statistic estimate (Table 1).

View this table:

Table 1: Parameter summary for inference validation.

Estimates of true parameter values for the deme population size, mutation rate in sites/per-year, whether there is a discrete genome, and sequence length for left) truth and right) within inference validation msprime simulation frame-work.

2.2.2 Parameter Inference Algorithm

LFI methods such as ABC were originally developed to do statistical inference with a very large number of synthetic observations from the simulator, rejecting non-conforming observations and keeping only observations close the observed data Sisson et al. [2018]. However, with a complex process involving multiple unknown parameters millions of simulations are often necessary to infer the required parameter using the basic rejection algorithm. A state-of-the-art ABC method such as the Bayesian Optimization for Likelihood-Inference (BOLFI) algorithm employs active learning to reduce the required computer simulation multiple fold by focusing only on the relevant parts of the parameter space. This can be done using a probabilistic model such as a Gaussian Process to model the relationship of parameters and the discrepancy between the observed data and synthetic data, and seeking to minimize this Gutmann et al. [2016]. BOLFI provides us with a surrogate likelihood function that we use in a Bayesian framework along with the parameter prior distribution to obtain the posterior distribution. In practice, a sample is drawn from the posterior using e.g. MCMC sampling to calculate posterior means and other summarizing estimates.

2.2.3 Model Validation: Recapturing Simulated Data

Two Deme Model

To validate this inference framework we first conducted prior predictive analysis to determine the sensitivity of the summary statistic, F_st, to a range of migration parameters. We maintained initial populations sizes for A and B of 6000 and 2000 with 600 sampled from each. Other parameters include a sequence length of 500, θ set to 2e⁻⁵, and 500 replicates. We divided the migration parameter by 5000 for scaling (Figure 1A). We also evaluated the sensitivity of the F_st to other parameters in the simulator framework including the initial and sampled population sizes, and the mutation rate (Figure S1A-C). We input specific asymmetric migration parameters (mig_A,B and mig_B,A) between two demes (A and B) and attempted to recapture them using simulated data. For our validation we again set the mutation rate to 2.5 · 10⁻⁵, set the number of replicates to 500, sequence length of 500, samp_A=600, samp_B=600, P_A=6000, P_B=2000, corresponding to the population sizes of each country as estimated from LandScan Rose et al. [2020](Table 1). We use the infinite sites model by setting discrete genome to FALSE thus including infinite possible SNP sites. We calculate the F_st for each simulation and use 9 quantiles at 0.1 increments across replicates for the summary statistic. We use uniform priors mig_A,B ∼ Unif[L, U] and mig_B,A ∼ Unif[L, U] to calculate the posterior distributions where L represents the lower bounds and U represents the upper bounds. We input 0.6 and 0.1 for mig_A,B and mig_B,A respectively(Figure 1B-C; red line, Figure S2) The estimated parameters for mig_A,B and mig_B,A respectively were 0.207 [95% Confidence Intervals (CIs) 0.011-0.422] and 0.477 (95% CIs 0.164-0.721) (Figure 1B-C; blue dashed line, Figure S2).

Figure 1: Recapturing input migration parameters with a 2 Deme model.

A) Prior predictive analysis testing the sensitivity of our summary statistic, the fixation index, to our parameter estimates for each parameter in an asymmetric model. Initial populations sizes for A and B of 6000 and 2000 with 600 sampled from each. Other parameters include a sequence length of 500, θ set to 2e-5, and 500 replicates. The migration parameter is divided by 5000 for scaling. B) The overlapping posterior density migration parameter estimates for migration parameter ‘mig_ab’ — from a population A (initial population size 6000) to population B (initial population size 2000) and ‘mig_ba’ from population B to population A. The ‘true’ input parameters were mig_ab=0.1 and mig_ba=0.6. C) The posterior densities visualized individually for each parameter (grey), the true input parameter is indicated by the red vertical line while the median posterior estimate is indicated by the blue dashed line.

We repeated this validation varying the initial population size of each deme relative the true population size of each country(Table 1, Figure S2). There is co-linearity between the asymmetric parameters. The model is able to resolve this and recapture the correct peak for each parameter.

Four Deme Model

We validate a symmetric four deme model in which the weight, not the direction of migration, is estimated between all four deme pairs within one model. We conducted prior predictive analysis to determine appropriate bounds for the uniform prior (Figure S3) and the impact of each parameter on all other parameters. We found that the parameters were not independent of each other and altering one migration parameter impacted the F_st of another (Figure S3) implying that migration between two countries may impact the migration estimates in a third or fourth additional country. This relationship is expected due to the migration across all demes included in the simulation which is inherent to our framework.

To validate the four deme model we input 6 migration parameters (mig_ab = 2.5, mig_ac = 2.5, mig_ad = 2.5, mig_bc = 1.5, mig_bd = 1.5, mig_cd = 0.5) and were able to recapture them within the symmetric model. The true migration parameter is indicated by the red vertical line while the estimated median parameter is indicated by the blue-dashed vertical line. The input population sizes for each of the demes scale to the true population size and are indicated in Figure S4. All other parameters are consistent with those described in Section ‘Two Deme Model’ however only a single migration parameter is input for each deme pair.

2.3 Application to a Pneumococcal Dataset

2.3.1 Isolate Culture and Sequencing

We included pneumococcal genomes from four Sub-Saharan African countries including South Africa (N=6919), The Gambia (N=3090), Malawi (N=1612), and Kenya (N=961) from the GPS Project in our initial dataset GPS et al. [2022](Figure 2A, Table 5). The isolates were collected between 1990 and 2014 (Figure 2B), comprised 360 GPSCs and 83 different serotypes, and were randomly selected for sequencing (Figure 2C). ‘Country’ will be used interchangeably with ‘Deme’ throughout this manuscript. We will also interchangeably use ‘GPSC’ and ‘lineage’.

Figure 2: Population composition of four demes.

A) Map of isolation location of pneumococcal genomes including South Africa (pink), Kenya (green), The Gambia (purple), and Malawi (yellow). B) Bottom) The isolates were collected from 1990-2014 and the total dataset included N=12582 isolates Top) subsetting by the ‘Dominant GPSCs’ those spanning all four demes and comprising >1% of the GPSCs in each country included six total GPSCs highlighted in C) which shows the proportion of total GPSCs each GPSC comprised in each country. The dominant six are outlined in a grey box. D) The relative risk of similarity within country as compared to between countries at the lineage level for each country and the overall risk of being the same GPSC from the same country compared to different countries.

We calculated the relative risk of GPSC similarity by country as per the method described in Belman et al. [2023] Belman et al. [2023]. To determine whether there are distinct GPSCs circulating in each country we calculated the risk that a pair of isolates sampled from the same country would be the same GPSC as compared to a pair of isolates selected from each country and another country (Figure 2D). There was a 2.02 (95%CIs 1.23-2.23) fold increased risk of a pair being the same GPSC when from the same country as compared to pairs from different countries indicating that there were distinct GPSCs circulating in each.

These pneumococcal isolates were selectively cultured on BD Trypticase Soy Agar II with 5% sheep blood (Beckton Dickinson, Heidelberg, Germany) and incubated overnight at 37°C in 5% CO₂. Genomic DNA was then extracted manually using a modified QIAamp DNA Mini Kit (QIAGEN, Inc., Valencia, CA) protocol. As part of GPS, pneumococcal isolates were whole-genome sequenced on the Illumina HiSeq platform to produce paired-end reads with an average of 100-125 bases in length and data were deposited in the European Nucleotide Database. Whole genome sequence data was processed as previously described Gladstone et al. [2019].

To control for population structure, and be sure the parameters we were estimating were not just due to lineage diversity in each country, we included only GPSCs which are at greater than 2% prevalence overall in the population (11 GPSCs), we then limited to only those with isolates present in each of the four countries (7 GPSCs), and finally we restricted based on those GPSCs which comprised >1% of the remaining number of isolates in each country. Ultimately we included 6 ‘Dominant GPSCs’ (GPSC2, GPSC5, GPSC8, GPSC10, GPSC22, and GPSC26) which included a total of 2746 genomes.

2.3.2 Neutral Gene Selection

We selected genes which are less impacted by evolutionary selection processes due to their less than 0.1 median IgG binding affinity, and ‘non-antibody binding’ status in Croucher et al. [2017] Croucher et al. [2017]. The selected genes are present with at least a 99% frequency across our dataset. Across the whole dataset we included two groups of ‘neutral’ genes: 1) a subset of 341 genes which were non-ABT, and 2) 84 genes which had <0.1 median IgG binding affinity and were non-ABT. For the by-GPSC analysis we selected 81 and 355 ‘neutral’ genes which were core across all six Dominant GPSCs and fit the same criteria as above.

2.3.3 Pairwise Distances Between Genes

We built genome alignments for the previously described sets of genes utilizing a combination of Panaroo Tonkin-Hill et al. [2020] and BioPython Chapman and Chang [2000]. We calculated pairwise distances from every genomes to every other genomes using both Hamming (Figure S5A) Hamming [1950] and Jaccard distances (Figure S5B) Murphy [1996], Jaccard 1912 grouping the genomes by isolation country. If there were distinct, qualitative differences between countries one would expect clear divergent blocks of similarity (lighter color) along the diagonal of the pairwise distance plots. We used scikit-allel Miles et al. [2021] for all pairwise distance calculations.

At the population level (N=12582) the homogenous color across both Jaccard, and Hamming distance plots is representative of mixed populations across the countries. Despite the 2-times higher probability of a pair being the same GPSC when from the same country as compared to pairs from different countries there are still many similar GPSCs between them (Figure S5).

2.3.4 Controlling Population Structure

GPSC Level

To control for population structure we interrogated each Dominant GPSC (N=2746), for the pairwise distances between genes across countries. We only included genes which were present in all four countries and across the Dominant GPSCs (N=81: non-ABT & <0.1 IgG Binding; N=355: non-ABT). The 81 selected genes had a median gene length of 555 bp (95% CIs 207-1809)(Figure S6).

We grouped the alignments by country to see if there were qualitatively distinguishable differences within versus between country. Again we calculated Hamming distances for 81 and 355 (Figure S7) genes. We repeated this using Jaccard distances for both gene sets (Figure S8). Jaccard and Hamming distances resulted in similar patterns.

Linked SNP Sites

Co-selected sites may exacerbate the signal within countries due to recombination, and thus mask the migration signal between countries. To exclude co-selected sites we calculated the mutual information score (MI) across the concatenated neutral genes both overall and for each GPSC. We visualized these using SpydrPick Pensar et al. [2019] to understand the relationship across all 81 genes bi-allelic SNP sites (Table 2, Figure S9). The majority of sites with high, direct MI scores were within 1kb of each other. GPSC2 and GPSC8 have high linkage across many nucleotide distances which span selected genes. These are invasive lineages and undergo less recombination than other lineages. This is demonstrated by each only comprising a single PCV13-type serotype, serotypes 1 and 5 respectively. You can explore each lineage phylogeny at the Microreact web server (Table 4).

View this table:

Table 2: Biallelic SNP count for each GPSC. Including total number of biallelic SNPs, total excluding all within a 1kb window upstream with an r² > 0.5, and total excluding all within a 1kb window upstream with an r² > 0.05.

Given the distribution of gene length when the distance between SNP sites exceeds approximately 1.8kb the sites are separated by more than one gene length (Figure S6) resulting in ultimately fewer genes being included. We controlled for co-selection between SNPs with a strict correlation threshold minimum of 0.05 r² and a more flexible minimum correlation of 0.5 r². We used bcftools +prune to remove all SNPs with an r² greater than the threshold within a 1kb upstream window and repeated our two deme migration analysis for each GPSC between South Africa and Malawi (Table 2). A 1kb window encompasses the entire length of the majority of genes included (Figure S6). We proceeded with the 0.5 relatedness threshold as it excluded fewer SNP sites but maintained similar estimates as the stricter threshold(Figure S10).

2.3.5 Fixation Index

To quantify the divergence between each location we then calculated F_st overall and by GPSC using tskit Baumdicker et al. [2022], Kelleher and Lohse 2020, Nelson et al. [2020]. We compared the SNP F_st between Weir-Cockerham and Hudsons F_st using tskit within the coalescent simulation software msprime Kelleher et al. [2016], Nelson et al. [2020], Baumdicker et al. [2022]. The Hudson’s F_st is calculated in tskit: where X and Y and d(X) and d(Y) are the populations and diversity of those populations respectively and d(X, Y) is the shared diversity of both populations. A higher F_st corresponds to a more divergent, separate population, while a lower F_st corresponds to a more highly mixing population, also known as panmictic.

We compare the F_st estimates from Weir & Cockerham and Hudson and find them to be largely linear with some over-estimates by WC. Due to the uneven sample sizes of our populations we proceed with the Hudson’s estimate (Figure S11).

We calculate the F_st across all genomes between each of the four demes for each GPSC. We repeat this including 81 neutral genes, 355 neutral genes, and the pbp genes as a control for selection (Figure S12). We find variable estimates both across GPSCs and between countries. Notably GPSC2 and GPSC8 have higher estimates overall than the rest of the GPSCs in both the 81 and 355 gene comparisons (Figure S12A-B). We included the pbp genes as these genes confer resistance to β-lactams and are under significant selective pressure. Given variable selective pressures across countries they would be expected to have different cross-country diversity patterns than our ‘neutral’ selected genes. We do see differences between the pbp gene divergence as compared to the divergence patterns we see when including the ‘neutral’ genes. This is reassuring as we expect a different pattern as a result of AMR selection. The F_st for the 355 genes was less informative with regards to the different populations. Hereafter we only include the 81 gene analysis by GPSC. All analyses and visualizations was conducted in Python v3.9.13 and R v3.6.1.

2.4 Overall Between-Country Migration Risk

While the above framework identifies migration parameters when migration occurs, it does not account for the actual probability of movement between countries as compared to movement within a country. To address this we applied a simple relative risk framework incorporating divergence time between genome pairs. We included those most prevalent GPSCs across the dataset GPSC2 (N=904), GPSC5 (N=473), GPSC10 (N=306). We created reference genomes for each GPSC using ABACAS to order the contiguous sequences (contigs) from a representative of each GPSC mapped to Strep-tococcus pneumoniae (strain ATCC 700669/Spain 23F-1) [EMBL accession: FM211187]. Any contigs which did not align were concatenated to the end. We multiply mapped all genomes from each GPSC against these references respectively using a custom mapping, variant calling, and local realignment around indels pipeline using bwa-MEM Li and Durbin [2009] and samtools mpileup Li et al. [2009]. We built trees masking recombination regions using Gubbins Croucher et al. [2015] with RAxML Stamatakis [2014] and a general time reversible (GTR) evolutionary model. We converted branch length to time using BactDating with a mixed gamma, relaxed clock model Didelot et al. [2018].

We compare the location (loc) and label (G) (genetic similarity) of pairs of sequences (i,j) that were collected around the same time (t). This approach has been shown to be robust to substantial biases in timing and location of isolate collection. To determine at what divergence time it became equally likely that a pair of genomes were within the same country as between different countries, for each country, we constructed pairwise matrices comparing every isolate to every other isolate (N Pairs=1683). We then determined the proportion of genomes at each divergence time across rolling 10-year time windows within a country as between countries. Dividing the proportion which are within the same country by the proportion between countries gives the relative risk.

In this case the numerator contains the ratio of pairs which are at each divergence time, collected within 10 years of each other t, from the same country loc, over the total number of pairs collected within 10 years of each other t ≤ 10years from the same country. The denominator is the ratio of pairs which are within each divergence time G, collected within 10 years of each other, from different countries (L_ref) over the total number of pairs collected within 10 years of each other from different countries. Geographic distances were calculated based on the centroid coordinates of each province (Equation 4).

To quantify uncertainty, we used a bootstrapping approach where in each bootstrap iteration we randomly sampled with replacement the isolates before recalculating the statistic. We report the 2.5 and 97.5 percentiles from the resulting distribution.

3 Results

3.1 Inference Strategy

In brief, to estimate between country migration parameters we developed a framework which uses summary statistics to characterize a sampled pathogen population and compares them to corresponding statistics from a simulated pathogen population under a given coalescent model. We use the Hudsons fixation index (F_st) as the summary statistic in this model. We employ several key pieces of software including msprime to simulate a coalescing population, and the Engine for Likelihood-free Inference (ELFI) Lintusaari et al. [2018] to compare the simulated populations with the observed population data. Broadly, our strategy falls under the Approximate Bayesian Computation (ABC) paradigm Sisson et al. [2018].

Using this framework we develop two models for quantifying migration. The two-deme model compares pairs of countries and determines the asymmetric migration parameters between them (the amount of migration from country A to country B and the reverse). The other model is a four-deme model quantifying symmetric migration parameters between four countries, encapsulated in six rate parameters.

3.2 Application to a Pneumococcal Dataset

We implement both of these models using genomes from the GPS project (Figure 2A, Table 5) GPS et al. 2022. We included GPSCs (also referred to as lineages throughout this paper) representing pneumococcal between-country variation in neutral genes and approximately un-linked SNP sites. We included 12582 genomes for initial exploration but ultimately reduced this to 6 ‘Dominant GPSCs’(GPSC2, GPSC5, GPSC8, GPSC10, GPSC22, and GPSC26) (N=2746 genomes) from South Africa, The Gambia, Malawi and Kenya for the migration models.

The isolates were collected between 1990 and 2014 (Figure 2B), comprised 360 GPSCs and 83 different serotypes, and were randomly selected for sequencing (Figure 2C). ‘Country’ will be used interchangeably with ‘Deme’ throughout this manuscript. We will also interchangeably use ‘GPSC’ and ‘lineage’ (Figure 2B-C, Table 3, Table 4).

View this table:

Table 3: The number of genomes for each country and GPSC.

Including only GPSCs which were >2% prevalence overall, present in all four countries, and present at >1% prevalence in each country.

View this table:

Table 4: Interactive phylogenetic trees for each GPSC presented in the web server Microreact GPSC Interactive phylogeny and metadata hosted on Microreact

View this table:

Table 5: Country summary for between-deme migration analysis.

Description of the four countries between which we estimated migration including number from each country, years of vaccine introduction and dosing schedule, the proportion of the dataset for each country which comprised NVTs and the percent of isolates from each country which were from children <5.

We included 81 ‘neutral’ non-antibody binding type genes selected from the pangenome wide immunological screen conducted by Croucher et al. [2017] Croucher et al. [2017]. These genes were core (minimum 99% prevalence) across each of the Dominant GPSCs. Further, we excluded linked SNP sites with a greater than 0.5 r² relatedness threshold. We explored the Jaccard and Hamming distances between genomes for both the 81 and 355 genes and found clear boundaries between the country clusters with Hamming distances ranging from 0-0.5. Some GPSCs had clear distinguishable divergence between specific countries while other country pairs were very similar by Hamming distance (Figure S7, Figure S8).

3.3 Estimating the Weight and Direction of Migration for Country Pairs

To determine the symmetry of migration across all four demes included in this model (South Africa, The Gambia, Kenya, and Malawi) we considered 6 separate models, one for each GPSC, estimating the migration parameters between deme pairs. We used a uniform prior with bounds corresponding to the sensitivity of the simulation demonstrated in the F_st sensitivity analysis (Figure 1A).

We implemented the two deme model to estimate migration parameters between each pair of demes asymmetrically given the 81 concatenated neutral genes with linked sites removed. We fit the model using BOLFI approximation and drew a posterior sample from it. We ran 3 independent implementations. We successfully estimated 10 of the 12 asymmetric migration parameters for all deme pairs by GPSC (Figure S13, Figure S14, Table S1). Parameter estimates for GPSC10 and GPSC26 between South Africa and Kenya did not settle on a single migration parameter in either migration direction due to the bimodal distribution of the posteriors and a co-linear relationship between the parameters. A migration weight which was higher from South Africa to Kenya than from Kenya to South Africa resulted in a similar model as the reverse; as such the model was unable to resolve these parameters (Figure S14). Resolving the peak which is ‘best’ is likely futile in that both peaks are equally likely to be true (Figure S14).

To summarize and compare migration between demes across GPSCs We divided the median posterior value by the maximum bound to find the probability of migration for each parameter. We then found the distribution of migration probabilities for each deme where the confidence intervals represent the distribution of values across GPSCs (Figure 3A). The difference between the estimates for each deme pair is the relative probability of migration. For example, the pneumococcal populations of South Africa and Malawi are consistently better explained by 50% more migration from South Africa to Malawi than the reverse, while the distribution between Malawi and The Gambia is explained differently across GPSCs with on average more migration from Malawi to The Gambia. The exception to this deme pair is in GPSC2 where the reverse is true (more migration from The Gambia to Malawi) (Figure 3B, Table S1).

Figure 3: Migration parameters summary estimates.

A)The scaled distribution of migration probabilities for each deme across GPSCs grouped by deme pair and colored according to country of origin and shaped according to destination. B) Migration parameters directional probability estimates between each deme pair. Colored by the probability of migration between each deme pair. where red is >50% migration probability and blue is <50% migration probability.

3.3.1 Weight

To normalize the migration parameter estimates for each GPSC We found the mean parameter estimate across all posterior samples for each GPSC (mig_gpsc). We then calculated the relative migration parameter estimate by dividing each parameter by the GPSC specific parameter . We first group these by deme to determine if there are consistent deme-wise patterns(Figure S15A) and then group them by GPSC to identify the differences across demes (Figure S15B, Table S1).

3.3.2 Direction

We estimated the probability of directional migration for each GPSC and deme pair by identifying the percentage of posterior migration parameter estimates for mig_ab which were greater than those for mig_ba. We then grouped these into high (≥ 0.6), medium (0.4 − 0.6), or low (≤ 0.4) migration probability. There were two migration patterns. Pattern one is seen in GPSC26, GPSC22, GPSC10, and GPSC8 in which there was only symmetric migration between Kenya and South Africa and asymmetric between all other deme pairs. (Figure 3B). Pattern two applied to GPSC5 and GPSC2 and included symmetric migration between Kenya and South Africa as well as between Malawi and The Gambia, where all other pairs had asymmetric migration patterns (Figure 3B). Across all GPSCs there was a higher migration probability from Kenya to Malawi and Kenya to The Gambia than from either of those to Kenya; and a higher migration probability from South Africa to Malawi and South Africa to The Gambia than from either of those to South Africa. Considering South Africa and Kenya have the highest population sizes (60.04 Million and 54.99 Million respectively in 2019) this implies that the higher the population size, its relative contribution to between pair migration is likely to be higher (Table 1). For those GPSCs with migration pattern one we estimated a higher migration probability from Malawi to The Gambia, again in line with this hypothesis (Table S1). The consistent directional patterns between GPSCs from independent models is reassuring and helps to validate our framework.

3.3.3 Demographic Contribution to Migration

Using the raw parameter estimates from all two-deme asymmetric models we interrogated whether the origin population size (α), the destination population size (β), or the distance between deme pairs at the centroid (km) (γ), had a larger contribution to the migration parameter estimates (θ). We fitted four logistic models, one for each parameter, and one model including all three parameters (Equation 5, Equation 6, Equation 7, Equation 8). In the model encompassing all three parameters a greater destination population size was significantly associated with the parameter estimate (p = 1.07e⁻⁰⁶), while the origin population size (p = 0.606), and distance (p = 0.877) were not associated (AIC=150.13)(Equation 5).

In both the overall model (Equation 5) and the individual model (Equation 7) a greater destination population size was again significantly associated with a smaller migration parameter (Overall: Coefficient −0.021, p = 1.07e⁻⁰⁶, AIC=150.13; Individual: Coefficient=-0.021,p = 1.02e⁻⁰⁸, AIC=146.66). The distance between countries was marginally associated (Coefficient 1.014e-04, p = 0.0354; AIC=176.05), but surprisingly in that a larger distance resulted in a higher migration parameter. This is largely driven by GPSC26 where the highest migration parameters are associated with The Gambia which is the most distant country from Kenya, Malawi, and South Africa. Due to the low sample size and limited distances explored the association between migration and distance is not robust or generalisable. In none of the models was the population size of the origin significantly associated with the migration parameter estimate (p = 0.232; AIC=179.16)(Figure S18). To be clear this applies specifically to these four countries and is an oversimplification of reality due to the discrete number of distances between the four countries explored in these migration models. However, in summary, the raw migration parameter estimates are negatively correlated with an increasing destination population size. This is sensible when placed alongside directional probability estimates whereby the higher population size is more likely to be the source of migration. Taken together for these GPSCs migrating between South Africa, Malawi, The Gambia, and Kenya this implies that the most migration is from large origin population sizes to smaller destinations.

3.4 Estimating the Weight of Migration Across Four Countries

3.4.1 Overview

We estimated symmetric migration parameters between four demes (6 migration parameters)(Figure S16). We expanded our prior to estimate parameters between the bound of 0 and 5 in line with the prior predictive analysis (Figure S3).

3.4.2 Weights

To normalize the migration parameter estimates within each GPSCs we repeated the same method described in Section ‘Weight’. For GPSC2 and GPSC5 migration between The Gambia and Malawi exceeded all other deme pairs in line with what was seen in the 2 deme model. For GPSC8 the same pattern extended however migration between South Africa and Kenya followed closely behind. For GPSC26 migration between Kenya and South Africa was dominant with migration between The Gambia and Malawi being second highest. For GPSC10 migration between South Africa and the Gambia exceeded all other pairs and for GPSC22 migration between Kenya and The Gambia, and South Africa and The Gambia were lowest while all other deme pairs were similar (Figure 4, Table S2). In a generalized linear model there is no association between the distance between demes (Equation 8). However, when comparing the relative population size between deme pairs to the migration parameter estimates there is a significant association (Coefficient=-0.027, p = 0.00324; AIC=101.13) in that as the relative population size increases between demes there is less migration between them.

Figure 4: Migration parameter estimates for each GPSC.

Migration paths weighted by the relative migration rate for each deme pair within each GPSC. The nodes are colored as follows: South Africa is represented in pink, Malawi in yellow, Kenya in Green, and The Gambia in purple.

3.5 Between-country migration probability

While here we are able to identify migration parameters when migration occurs we do not account for how often migration may occur but within the same country, or not occur at all. To address this we used a risk ratio framework to investigate the risk of similarity across geographic distance Belman et al. [2023], Salje et al. [2017], Lefrancq et al. [2022]. We found that after 43 years of spread pairs are equally likely to be in Kenya as between Kenya and another country with an RR of 2.55 (95%CIs 0.42-8.19), for South Africa this is 55 years with a RR=1.55 (95% CIs 0.75-6.63), for The Gambia it is after 53 years of spread; RR=1.82 (95% CIs 0.82-3.36), and for Malawi this is after 55 years of spread; RR=2.20 (95%CIs 0.35-4.92)(Figure 5).

Figure 5: Out-of-country migration probability.

The risk of pairs within divergence times across rolling 10-year time windows of being found within the same country as between each country and every other country for Kenya (green), South Africa (pink), The Gambia (purple), and Malawi (yellow). A dashed red line indicates a relative risk=1 while a vertical dashed line for each plot denotes the tMRCA at which the relative risk crosses 1.

3.6 Discussion

Our modeling framework utilizes genomic data and population summary statistics to infer migration parameters between demes that can best replicate the extant population distribution. We were able to successfully validate the method using simulated data both estimating the direction of migration between demes in the two-deme model, as well as the weight of migration across four demes. Interestingly, we detected some heterogeneity in migration rates across different lineages. The baseline expectation is that considerable variation would not exist between GPSCs. The similarity in migration between invasive and non-invasive lineages may imply, as found in Tonkin-Hill et al. Tonkin-Hill et al. [2022] that although the invasive lineages are not frequently found in carriage, they may persist at low population frequencies allowing them to be transmitted across borders in similar patterns (Figure S17).

We also find country-pair specific migration patterns which are in line with the source country usually being that with the greater population size when inferring directionality. However, when determining the demographic factor (population size of origin or destination and distance between them) with the greatest contribution to migration we found that the destination population size was more important.

Some limitations to this method include our inability to include the true carriage population sizes and true genome length within the simulation framework. As such, it is impossible to contextualize the migration parameters within the context of time, however, their relative relationship remains useful nevertheless. Further, as we use the population sizes for each deme in accordance with the true population sizes of those countries, some bias may be introduced since the pneumococcal carriage rate is known to vary by both country and human population. The age structure of each deme may consequently influence the true carriage rate in that a deme with a larger child population relative to the adult population may have more carriage overall, while simultaneously, a population with more children is unlikely to be as mobile as an adult population. Resolving this would require more demographic interrogation of pneumococcal carriage and human mobility in these regions. Our framework could also be applied on a smaller spatial scale where more granular data exists about the underlying carriage rates, and migration between regions, provinces, or states could consequently be inferred with better precision. Additionally, the ability for the migrating bacteria to take hold in a country depends on previous pathogen spread as well as vaccine campaign implementation.

Future implementations of our models could for example include parameter estimates for the population size of each deme, and incorporation of vaccine coverage and future immunity. Alternatively, the migration parameters could be incorporated into independent migration simulation frameworks which account for population immunity and other covariates. Human mobility data could in theory be used but currently (September 2023) representative between country human-mobility data remains sparse. Meta provides travel data between-countries but these are dominated by high income countries. Of the four demes included in our framework only South Africa is present within the Meta datasets from 01/2021-04/2023. Interrogating open flights (https://openflights.org/data.html) between-country data for these four countries only provides sparse data for Kenya and Malawi, and no data for The Gambia. The majority of flights within Africa are not taken directly, they often include many stopovers which would further complicate the use of such data. Further, flight data would only represent a small subset of possible movements and in low-income countries it is largely reflecting tourism, while not accounting for the movement of the majority of the population Gössling and Humpe [2020], Findlater and Bogoch [2018].

Estimates of migration rates such as those produced as part of this study could be informative for vaccine implementation policy. If there is more migration from deme A to B, implementing the vaccine first in deme A may have spillover effects into deme B, allowing such interventions to have an effect beyond country borders. Some possibilities for further development of our framework include incorporating additional countries, and inferring correlation between GPSC-specific migration patterns and classes of mobility such as flights compared to roads; or adults compared to children. This will help us to understand what types of mobility best explain the estimated migration parameters. Provided rich data from genomics-based surveillance systems, the current approach would also be applicable to multiple other species of bacteria.

4 Data availability

The code and data associated with this paper are available at https://github.com/sophbel/LFI_between_country_migration/tree/main.

Data Availability

The code and data associated with this paper are available at https://github.com/sophbel/LFI_between_country_migration/tree/main.

5 Funding

This GPS project was funded by the Bill and Melinda Gates Foundation (grant code OPP1034556). S.B. and S.D.B. were supported by the Wellcome Sanger Institute (core Wellcome grants WT098051, 206194, and 108413/A/15/D). N.J.C. was supported by the UK Medical Research Council and Department for International Development [MR/R015600/1, MR/T016434/1]; a Sir Henry Dale Fellowship, jointly funded by Wellcome and the Royal Society [104169/Z/14/A].

6 Conflicts of interest

N.J.C. has consulted for Antigen Discovery Inc and Pfizer, and been invited to attend meetings funded by M.S.D. N.J.C. has received an investigator-initiated award from GlaxoSmithKline. No other authors have conflicts of interest to report.

A Supplementary Figures

Figure S1: Sensitivity of the F_st value to simulation parameters fixed at a sequence length of 100.

A) Initial population size fixing the alternate population size at 500 and the migration parameter at 2, sampling 100 from each. B) Sampled population size with a fixed migration parameter of 2, the sample size for the alternate population fixed at 500 and initial population parameters for A and B at 600 and 200 respectively. C) θ parameter (mutation rate) with a fixed migration parameter of 2, initial population sizes of A and B 600 and 200 respectively, sampling 100.

Figure S2: Recapturing input migration parameters with a 2 Deme model.

A) The overlapping posterior density migration parameter estimates for migration parameter 1 — from a population [a − d] to population [a−d] B) and the inverse. The ‘true’ input parameters were mig_a_−d=0.1 and mig_a_−d=0.6. The posterior densities were estimated with a uniform prior and are visualized independently for each parameter (light blue), the true input parameter is indicated by the red vertical line while the median posterior estimate is indicated by the blue dashed line. Deme A=South Africa, initial population size 6000; Deme B=Malawi, initial population size 2000; Deme C=Kenya, initial population size 500; and Deme D=The Gambia, initial population size 1000.

*Used no-uturn (nuts) sampling rather than metropolis sampling for migbc and mig_cb due to difficulty converging

Figure S3: The response of the fixation index to varied migration parameters.

Each plot indicates which migration parameter We varied and the F_st between the countries for each of those migration parameters is indicated by the colored lines.

Figure S4: Recapturing migration parameters in the 4 deme model.

The True Migration parameter is indicated by the red vertical line while the estimated median parameter is indicated by the blue-dashed vertical line. The posterior distribution density is represented by the blue histograms for each deme pair indicated by the title where a=South Africa, b=Malawi, c=Kenya, and d=The Gambia. The input population sizes for each of these scale to the true population size and are indicated in the figure.

Figure S5: Pairwise distance estimates for between-country genomes

across all 12,582 genome pairs from South Africa, Malawi, Kenya, and The Gambia, clustered in that order by A) Hamming distance and B) Jaccard Distance.

Figure S6: Histogram of gene length for each of the 81 neutral genes.

Figure S7: Pairwise Hamming distances across all genomes from each of the four demes (organized in the order of South Africa, Malawi, The Gambia, Kenya)

for each GPSC in turn. These only include biallelic SNP sites. A) Includes 81 ‘neutral’ genes. B)Includes 355 ‘neutral’ genes.

Figure S8: Pairwise Jaccard distances across all genomes from each of the four demes (organized in the order of South Africa, Malawi, The Gambia, Kenya)

for each GPSC in turn. A) includes 81 ‘neutral’ genes, B) includes 355 ‘neutral’ genes. These only include biallelic SNP sites.

Figure S9:

Mutual Information Scores between SNP pairs across 81 ‘neutral’ gene alignments for each of the GPSCs. The vertical dashed line indicates the 1kb cutoff under which removed correlated sites. The horizontal dashed line indicates the 0.2 mutual information score cutoff which has been used previously for the pneumococcus.

Figure S10: Estimated migration parameters removing correlated sites including mig_ab on the left and mig_ba on the right.

Excluding all within a 1kb window upstream with an r² > 0.5 (triangle), excluding all within a 1kb window upstream with an r² > 0.05 (square), and retaining all sites (circle). The error bars indicate 95% CIs and each GPSC is along the x-axis. Initial population sizes were for South Africa (deme A) and Malawi (deme B).

Figure S11: Pairwise comparison between the Hudson and Weir-Cockerham F_st values across all four demes.

In total this includes six comparisons, one between each deme and every other deme.

Figure S12: Hudsons F_st across all genomes between each of the four demes for each GPSC.

A) calculated from 81 genes, B) from 355 genes, and C) only including the PBP genes (which are likely under selection in each place due to their interaction with penicillin-resistance acquisition). A higher F_st is a more divergent, separate population, while a lower F_st is a more highly mixing population, also known as panmictic.

Figure S13: Convergence of asymmetric 2 deme parameter models.

A) The effective sample size (ESS) across all parameters estimated. ESS <100 is indicated in red. B) The posterior density of parameter estimates between South Africa and Kenya for GPSC10. These were unable to converge due to the high co-linearity between them.

Figure S14:

The posterior parameter distributions across 3 independent runs of 4 chains each across the 6 dominant GPSCs (columns) and 12 parameter estimates (rows).

Figure S15: Relative migration parameters asymmetrically between two deme pairs.

Relative migration for each deme pair within each GPSC independently. The x-axis indicated the deme and they are grouped by GPSC. The origin location of South Africa is represented in pink, Malawi in yellow, Kenya in Green, and The Gambia in purple.

Figure S16:

Posterior distributions for 6 parameter estimates for each GPSC, colored by parameter.

Figure S17: Summary of each GPSC migration parameters

A) The directional probability from the 2 deme model for each GPSC whereby red=>0.6, blue = 0.4-0.6, and grey = 0.1-0.4 probability of migration asymmetrically for each deme pair. The Node colors are described in the legend. B) The weighted migration from the 4 deme model between all 4 demes. The node colors are the same as A. C) The relative migration probability for each GPSC across all demes. The Origin country is colored the same as A and B and the Destination country is indicated in the legend.

Figure S18: The population sizes and distance between countries versus migration parameter estimates.

All plots include the migration parameter estimates (y-axis) against either distance between countries or the population size of the countries (x-axis). The left plot includes the distance between migration parameter demes (grey, triangles) and the right plot includes the population size of the origin (blue) or the destination (green). The models associated with each figure are included in lines of the same color.

B Supplementary Tables

View this table:

Table S1: Parameter estimates across all pairs within the two dememodel.

Values within the square brackets denote the 95% confidence intervals. The ‘Parameter’ is the raw migration parameter estimate while the ‘Relative Parameter’ is relative to all other deme pairs within each GPSC. The ‘Directional Migration Probability’ is the probability of migration asymmetrically for each GPSC and each deme pair (ie for sa − mal, GPSC10 there is 0.667 probability of migration while for mal − sa there is (1 − 0.667) probability of migration.)

View this table:

Table S2: Migration parameter estimates symmetrically across four demes for six GPSCs.

Acknowledgments

This work was supported by Wellcome under grant reference 108413/A/15/D, 2016194, and WT098051. For the purpose of Open Access, the author has applied a CC BY public copyright license to any Author Accepted Manuscript version arising from this submission. Further we would like to thank all Global Pneumococcal Sequencing Project partners particularly those at the National Institute for Communicable Diseases in South Africa, the Medical Research Council in The Gambia, the Malawi Liverpool Wellcome Trust, and the Kenya Medical Research Institute.

Footnotes

† Co-senior authors:

References

1.
S. Aeschbacher, A. Futschik, and M. A. Beaumont. Approximate Bayesian computation for modular inference problems with many parameters: the example of migration rates. Molecular Ecology, 22(4):987–1002, 2013. ISSN 1365-294X. doi: 10.1111/mec.12165. URL https://onlinelibrary.wiley.com/doi/abs/10.1111/mec.12165.
OpenUrl CrossRef
2.↵
F. Baumdicker, G. Bisschop, D. Goldstein, G. Gower, A. P. Ragsdale, G. Tsambos, S. Zhu, B. Eldon, E. C. Ellerman, J. G. Galloway, A. L. Gladstein, G. Gorjanc, B. Guo, B. Jeffery, W. W. Kretzschumar, K. Lohse, M. Matschiner, D. Nelson, N. S. Pope, C. D. Quinto-Cortes, M. F. Rodrigues, K. Saunack, T. Sellinger, K. Thornton, H. van Kemenade, A. W. Wohns, Y. Wong, S. Gravel, A. D. Kern, J. Koskela, P. L. Ralph, and J. Kelleher. Efficient ancestry and mutation simulation with msprime 1.0. Genetics, 220(3):iyab229, Mar. 2022. ISSN 1943-2631. doi: 10.1093/genetics/iyab229. URL 10.1093/genetics/iyab229.tex.ids=baumdickerEfficientAncestryMutation2022a.
OpenUrl CrossRef
3.↵
S. Belman, N. Lefrancq, S. Nzenze, S. Downs, M. d. Plessis, S. Lo, T. G. P. S. Consortium, L. McGee, S. A. Madhi, A. v. Gottberg, S. D. Bentley, and H. Salje. Geographic migration and vaccine-induced fitness changes of Streptococcus pneumoniae, Jan. 2023. URL https://www.biorxiv.org/content/10.1101/2023.01.18.524577v1. Pages: 2023.01.18.524577 Section: New Results.
4.
G. Bhatia, N. Patterson, S. Sankararaman, and A. L. Price. Estimating and interpreting FST: The impact of rare variants. Genome Research, 23 (9):1514–1521, Sept. 2013. ISSN 1088-9051. doi: 10.1101/gr.154831.113. URL https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3759727/.
OpenUrl Abstract/FREE Full Text
5.↵
B. Chapman and J. Chang. Biopython: Python tools for computational biology. ACM SIGBIO Newsletter, 20(2):15–19, Aug. 2000. ISSN 0163-5697. doi: 10.1145/360262.360268. URL 10.1145/360262.360268.
OpenUrl CrossRef
6.
I. Comas, M. Coscolla, T. Luo, S. Borrell, K. E. Holt, M. Kato-Maeda, J. Parkhill, B. Malla, S. Berg, G. Thwaites, D. Yeboah-Manu, G. Bothamley, J. Mei, L. Wei, S. Bentley, S. R. Harris, S. Niemann, R. Diel, A. Aseffa, Q. Gao, D. Young, and S. Gagneux. Out-of-Africa migration and Neolithic coexpansion of Mycobacterium tuberculosis with modern humans. Nature genetics, 45(10):1176–82, Oct. 2013. ISSN 1546-1718. doi: 10.1038/ng.2744. URL http://www.ncbi.nlm.nih.gov/pubmed/23995134. Publisher: NIH Public Access.
OpenUrl CrossRef PubMed
7.
N. J. Croucher, W. P. Hanage, S. R. Harris, L. McGee, M. van der Linden, H. de Lencastre, R. Śa-Leao, J.-H. Song, K. S. Ko, B. Beall, K. P. Klugman, J. Parkhill, A. Tomasz, K. G. Kristinsson, and S. D. Bentley. Variable recombination dynamics during the emergence, transmission and ‘disarming’ of a multidrug-resistant pneumococcal clone. BMC Biology, 12 (1):49, June 2014. ISSN 1741-7007. doi: 10.1186/1741-7007-12-49. URL 10.1186/1741-7007-12-49.
OpenUrl CrossRef PubMed
8.↵
N. J. Croucher, A. J. Page, T. R. Connor, A. J. Delaney, J. A. Keane, S. D. Bentley, J. Parkhill, and S. R. Harris. Rapid phylogenetic analysis of large samples of recombinant bacterial whole genome sequences using Gubbins. Nucleic Acids Research, 43(3):e15, Feb. 2015. ISSN 0305-1048. doi: 10.1093/nar/gku1196. URL https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4330336/ .
OpenUrl CrossRef PubMed
9.↵
N. J. Croucher, J. J. Campo, T. Q. Le, X. Liang, S. D. Bentley, W. P. Hanage, and M. Lipsitch. Diverse evolutionary patterns of pneumococcal antigens identified by pangenome-wide immunological screening. Proceedings of the National Academy of Sciences, 114(3):E357–E366, Jan. 2017. doi: 10.1073/pnas.1613937114. URL https://www.pnas.org/doi/abs/10.1073/pnas.1613937114. Company: National Academy of Sciences Distributor: National Academy of Sciences Institution: National Academy of Sciences Label: National Academy of Sciences Publisher: Proceedings of the National Academy of Sciences.
OpenUrl Abstract/FREE Full Text
10.
E. Deutschmann, E. Recchi, and M. Vespe. Assessing Transnational Human Mobility on a Global Scale. In S. Pötzschke and S. Rinken, editors, Migration Research in a Digitized World: Using Innovative Technology to Tackle Methodological Challenges, IMISCOE Research Series, pages 169–192. Springer International Publishing, Cham, 2022. ISBN 978-3-031-01319-5. doi: 10.1007/978-3-031-01319-5 9. URL 10.1007/978-3-031-01319-5_9.
OpenUrl CrossRef
11.↵
X. Didelot, N. J. Croucher, S. D. Bentley, S. R. Harris, and D. J. Wilson. Bayesian inference of ancestral dates on bacterial phylogenetic trees. Nucleic Acids Research, 46(22):e134, Dec. 2018. ISSN 1362-4962. doi: 10.1093/nar/gky783.
OpenUrl CrossRef PubMed
12.
D. Domman, M.-L. Quilici, M. J. Dorman, E. Njamkepo, A. Mutreja, A. E. Mather, G. Delgado, R. Morales-Espinosa, P. A. D. Grimont, M. L. Liźarraga-Partida, C. Bouchier, D. M. Aanensen, P. Kuri-Morales, C. L. Tarr, G. Dougan, J. Parkhill, J. Campos, A. Cravioto, F.-X. Weill, and N. R. Thomson. Integrated view of Vibrio cholerae in the Americas. Science (New York, N.Y.), 358(6364):789–793, 2017. ISSN 1095-9203. doi: 10.1126/science.aao2136.
OpenUrl Abstract/FREE Full Text
13.
G. Ewing and J. Hermisson. MSMS: a coalescent simulation program including recombination, demographic structure and selection at a single locus. Bioinformatics, 26(16):2064–2065, Aug. 2010. ISSN 1367-4803. doi: 10.1093/bioinformatics/btq322. URL https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2916717/.
OpenUrl CrossRef PubMed Web of Science
14.↵
A. Findlater and I. I. Bogoch. Human Mobility and the Global Spread of Infectious Diseases: A Focus on Air Travel. Trends in Parasitology, 34 (9):772–783, Sept. 2018. ISSN 1471-4922. doi: 10.1016/j.pt.2018.07.004. URL https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7106444/.
OpenUrl CrossRef PubMed
15.
L. Gabrielli, E. Deutschmann, F. Natale, E. Recchi, and M. Vespe. Dissecting global air traffic data to discern different types and trends of transnational human mobility. EPJ Data Science, 8 (1):1–24, Dec. 2019. ISSN 2193-1127. doi: 10.1140/epjds/s13688-019-0204-x. URL https://epjdatascience.springeropen.com/articles/10.1140/epjds/s13688-019-0204-x. Number: 1 Publisher: SpringerOpen.
OpenUrl CrossRef
16.↵
R. A. Gladstone, S. W. Lo, J. A. Lees, N. J. Croucher, A. J. v. Tonder, J. Corander, A. J. Page, P. Marttinen, L. J. Bentley, T. J. Ochoa, P. L. Ho, M. d. Plessis, J. E. Cornick, B. Kwambana-Adams, R. Benisty, S. A. Nzenze, S. A. Madhi, P. A. Hawkins, D. B. Everett, M. Antonio, R. Dagan, K. P. Klugman, A. v. Gottberg, L. McGee, R. F. Breiman, and S. D. Bentley. International genomic definition of pneumococcal lineages, to contextualise disease, antibiotic resistance and vaccine impact. EBioMedicine, 43:338–346, May 2019. ISSN 2352-3964. doi: 10.1016/j.ebiom.2019.04.021. URL https://www.thelancet.com/journals/ebiom/article/PIIS2352-3964(19)30259-2/abstract. Publisher: Elsevier.
OpenUrl CrossRef
17.
GPS, L. McGee, S. D. Bentley, and K. P. Klugman. GPS:: Global Pneumococcal Sequencing Project, 2022. URL https://www.pneumogen.net/gps/.
18.
M. U. Gutmann, J. Cor, and er. Bayesian Optimization for Likelihood-Free Inference of Simulator-Based Statistical Models. Journal of Machine Learning Research, 17(125):1–47, 2016. ISSN 1533-7928. URL http://jmlr.org/papers/v17/15-017.html.
OpenUrl
19.↵
S. Gössling and A. Humpe. The global scale, distribution and growth of aviation: Implications for climate change. Global Environmental Change, 65:102194, Nov. 2020. ISSN 0959-3780. doi: 10.1016/j.gloenvcha.2020.102194. URL https://www.sciencedirect.com/science/article/pii/S0959378020307779.
OpenUrl CrossRef
20.↵
R. W. Hamming. Error detecting and error correcting codes. The Bell System Technical Journal, 29(2):147–160, Apr. 1950. ISSN 0005-8580. doi: 10.1002/j.1538-7305.1950.tb00463.x. Conference Name: The Bell System Technical Journal.
OpenUrl CrossRef
21.
K. E. Holsinger and B. S. Weir. Genetics in geographically structured populations: defining, estimating and interpreting FST. Nature Reviews Genetics, 10(9):639–650, Sept. 2009. ISSN 1471-0064. doi: doi:10.1038/nrg2611. URL https://www.nature.com/articles/nrg2611. Number: 9 Publisher: Nature Publishing Group.
OpenUrl CrossRef PubMed Web of Science
22.
R. R. Hudson. Generating samples under a Wright–Fisher neutral model of genetic variation. Bioinformatics, 18(2):337–338, Feb. 2002. ISSN 1367-4803. doi: 10.1093/bioinformatics/18.2.337. URL 10.1093/bioinformatics/18.2.337.
OpenUrl CrossRef PubMed Web of Science
23.
R. R. Hudson, M. Slatkin, and W. P. Maddison. Estimation of levels of gene flow from DNA sequence data. Genetics, 132(2):583–589, Oct. 1992. ISSN 1943-2631. doi: 10.1093/genetics/132.2.583. URL 10.1093/genetics/132.2.583.
OpenUrl Abstract/FREE Full Text
24.
P. Jaccard. The Distribution of the Flora in the Alpine Zone.1. New Phytologist, 11(2):37–50, 1912. ISSN 1469-8137. doi: 10.1111/j.1469-8137.1912.tb05611.x. URL https://onlinelibrary.wiley.com/doi/abs/10.1111/j.1469-8137.1912.tb05611.x. eprint: https://onlinelibrary.wiley.com/doi/pdf/10.1111/j.1469-8137.1912.tb05611.x.
OpenUrl CrossRef
25.
J. Kelleher and K. Lohse. Coalescent Simulation with msprime. In J. Y. Dutheil, editor, Statistical Population Genomics, Methods in Molecular Biology, pages 191–230. Springer US, New York, NY, 2020. ISBN 978-1-07-160199-0. doi: 10.1007/978-1-0716-0199-09. URL 10.1007/978-1-0716-0199-0_9.tex.ids=kelleherCoalescentSimu-lationMsprime2020a seriesTitle: Methods in Molecular Biology.
OpenUrl CrossRef
26.↵
J. Kelleher, A. M. Etheridge, and G. McVean. Efficient Coalescent Simulation and Genealogical Analysis for Large Sample Sizes. PLOS Computational Biology, 12(5):e1004842, May 2016. ISSN 1553-7358. doi: 10.1371/journal.pcbi.1004842. URL https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1004842.tex.ids=kelleherEfficientCoalescentSimulation2016a publisher: Public Library of Science.
OpenUrl CrossRef PubMed
27.
A. D. Kern and D. R. Schrider. Discoal: flexible coalescent simulations with selection. Bioinformatics, 32(24):3839–3841, Dec. 2016. ISSN 1367-4803. doi: 10.1093/bioinformatics/btw556. URL https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5167068/.
OpenUrl CrossRef PubMed
28.
M. Kimura. THE NUMBER OF HETEROZYGOUS NUCLEOTIDE SITES MAINTAINED IN A FINITE POPULATION DUE TO STEADY FLUX OF MUTATIONS. Genetics, 61(4):893–903, Apr. 1969. ISSN 1943-2631. doi: 10.1093/genetics/61.4.893. URL 10.1093/genetics/61.4.893.
OpenUrl FREE Full Text
29.
F. Lassalle, S. Al-Shalali, M. Al-Hakimi, E. Njamkepo, I. M. Bashir, M. J. Dorman, J. Rauzier, G. A. Blackwell, A. Taylor-Brown, M. A. Beale, A. Cazares, A. A. Al-Somainy, A. Al-Mahbashi, K. Almoayed, M. Aldawla, A. Al-Harazi, M.-L. Quilici, F.-X. Weill, G. Dhabaan, and N. R. Thomson. Genomic epidemiology reveals multidrug resistant plasmid spread between Vibrio cholerae lineages in Yemen. Nature Microbiology, 8(10):1787–1798, Oct. 2023. ISSN 2058-5276. doi: 10.1038/s41564-023-01472-1. URL https://www.nature.com/articles/s41564-023-01472-1. Number: 10 Publisher: Nature Publishing Group.
OpenUrl CrossRef
30.
M. Layan, N. F. Müller, S. Dellicour, N. De Maio, H. Bourhy, S. Cauchemez, and G. Baele. Impact and mitigation of sampling bias to determine viral spread: Evaluating discrete phylogeography through CTMC modeling and structured coalescent model approximations. Virus Evolution, 9(1): vead010, Jan. 2023. ISSN 2057-1577. doi: 10.1093/ve/vead010. URL 10.1093/ve/vead010.
OpenUrl CrossRef
31.↵
N. Lefrancq, V. Bouchez, N. Fernandes, A.-M. Barkoff, T. Bosch, T. Dalby, T. Åkerlund, J. Darenberg, K. Fabianova, D. F. Vestrheim, N. K. Fry, J. J. Gonźalez-Ĺopez, K. Gullsby, A. Habington, Q. He, D. Litt, H. Martini, D. Píerard, P. Stefanelli, M. Stegger, J. Zavadilova, N. Armatys, A. Landier, S. Guillot, S. L. Hong, P. Lemey, J. Parkhill, J. Toubiana, S. Cauchemez, H. Salje, and S. Brisse. Global spatial dynamics and vaccine-induced fitness changes of Bordetella pertussis. Science Translational Medicine, 14(642):eabn3253, Apr. 2022. ISSN 1946-6242. doi: 10.1126/scitranslmed.abn3253.
OpenUrl CrossRef
32.↵
H. Li and R. Durbin. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics, 25(14):1754–1760, July 2009. ISSN 1367-4803. doi: 10.1093/bioinformatics/btp324. URL 10.1093/bioinformatics/btp324.
OpenUrl CrossRef PubMed Web of Science
33.↵
H. Li, B. Handsaker, A. Wysoker, T. Fennell, J. Ruan, N. Homer, G. Marth, G. Abecasis, R. Durbin, and 1000 Genome Project Data Processing Subgroup. The Sequence Alignment/Map format and SAMtools. Bioinformatics (Oxford, England), 25(16):2078–2079, Aug. 2009. ISSN 1367-4811. doi: 10.1093/bioinformatics/btp352.
OpenUrl CrossRef PubMed Web of Science
34.↵
J. Lintusaari, H. Vuollekoski, A. Kangasrääasïo, K. Skyten, M. Järvenpää, P. Marttinen, M. U. Gutmann, A. Vehtari, J. Corander, and S. Kaski. ELFI: Engine for Likelihood-Free Inference. Journal of Machine Learning Research, 19(16):1–7, 2018. ISSN 1533-7928. URL http://jmlr.org/papers/v19/17-374.html.
OpenUrl
35.
J. Ma, A. Ratan, B. J. Raney, B. B. Suh, W. Miller, and D. Haussler. The infinite sites model of genome evolution. Proceedings of the National Academy of Sciences, 105(38):14254–14261, Sept. 2008. doi: 10.1073/pnas.0805217105. URL https://www.pnas.org/doi/10.1073/pnas.0805217105. Publisher: Proceedings of the National Academy of Sciences.
OpenUrl Abstract/FREE Full Text
36.
P. Maas. Facebook Disaster Maps: Aggregate Insights for Crisis Response & Recovery. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pages 3173–3173, Anchorage AK USA, July 2019. ACM. ISBN 978-1-4503-6201-6. doi: 10.1145/3292500.3340412. URL https://dl.acm.org/doi/10.1145/3292500.3340412.
OpenUrl CrossRef
37.↵
A. Miles, p. i. Bot, M. R. P. Ralph, N. Harding, R. Pisupati, S. Rae, and T. Millar. cggh/scikit-allel: v1.3.3, May 2021. URL https://zenodo.org/record/4759368.
38.↵
A. H. Murphy. The Finley Affair: A Signal Event in the History of Forecast Verification. Weather and Forecasting, 11(1):3–20, Mar. 1996. ISSN 1520-0434, 0882-8156. doi: 10.1175/1520-0434(1996)011⟨0003:TFAASE⟩2.0.CO;2. URL https://journals.ametsoc.org/view/journals/wefo/11/1/1520-0434_1996_011_0003_tfaase_2_0_co_2.xml. Publisher: American Meteorological Society Section: Weather and Forecasting.
OpenUrl CrossRef
39.↵
D. Nelson, J. Kelleher, A. P. Ragsdale, C. Moreau, G. McVean, and S. Gravel. Accounting for long-range correlations in genome-wide simulations of large cohorts. PLOS Genetics, 16(5):e1008619, May 2020. ISSN 1553-7404. doi: 10.1371/journal.pgen.1008619. URL https://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1008619. Publisher: Public Library of Science.
OpenUrl CrossRef
40.
C. K. Okoro, R. A. Kingsley, T. R. Connor, S. R. Harris, C. M. Parry, M. N. Al-Mashhadani, S. Kariuki, C. L. Msefula, M. A. Gordon, E. de Pinna, J. Wain, R. S. Heyderman, S. Obaro, P. L. Alonso, I. Mandomando, C. A. MacLennan, M. D. Tapia, M. M. Levine, S. M. Tennant, J. Parkhill, and G. Dougan. Intra-continental spread of human invasive Salmonella Typhimurium pathovariants in sub-Saharan Africa. Nature genetics, 44 (11):1215–1221, Nov. 2012. ISSN 1061-4036. doi: 10.1038/ng.2423. URL https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3491877/.
OpenUrl CrossRef PubMed
41.↵
J. Pensar, S. Puranen, B. Arnold, N. MacAlasdair, J. Kuronen, G. Tonkin-Hill, M. Pesonen, Y. Xu, A. Sipola, L. Śanchez-Buśo, J. A. Lees, C. Chewapreecha, S. D. Bentley, S. R. Harris, J. Parkhill, N. J. Croucher, and J. Corander. Genome-wide epistasis and co-selection study using mutual information. Nucleic Acids Research, 47(18):e112, Oct. 2019. ISSN 0305-1048. doi: 10.1093/nar/gkz656. URL 10.1093/nar/gkz656.
OpenUrl CrossRef
42.
B. Quintero Moreno and M. Araque. Molecular characterisation of multidrug-resistant pneumococcal clones colonising healthy children in Merida, Venezuela. Journal of Global Antimicrobial Resistance, 14: 45–50, Sept. 2018. ISSN 2213-7165. doi: 10.1016/j.jgar.2018.02.004. URL https://www.sciencedirect.com/science/article/pii/S2213716518300328.
OpenUrl CrossRef
43.
A. Rose, J. McKee, K. Sims, E. Bright, A. Reith, and M. Urban. LandScan Global 2019, 2020. URL landscan.ornl.gov. Edition: 2019 Place: Oak Ridge, cTN Section: July 01, 2020.
44.↵
H. Salje, J. Lessler, I. M. Berry, M. C. Melendrez, T. Endy, S. Kalayanarooj, A. A-Nuegoonpipat, S. Chanama, S. Sangkijporn, C. Klungthong, B. Thaisomboonsuk, A. Nisalak, R. V. Gibbons, S. Iamsirithaworn, L. R. Macareo, I.-K. Yoon, A. Sangarsang, R. G. Jarman, and D. A. T. Cummings. Dengue diversity across spatial and temporal scales: Local structure and the effect of host population size. Science, 355(6331): 1302–1306, Mar. 2017. ISSN 0036-8075, 1095-9203. doi: 10.1126/science.aaj9384. URL https://science.sciencemag.org/content/355/6331/1302. Publisher: American Association for the Advancement of Science Section: Report.
OpenUrl Abstract/FREE Full Text
45.
R. K. Selander and R. O. Hudson. Animal Population Structure Under Close Inbreeding: The Land Snail Rumina in Southern France. The American Naturalist, 110(974):695–718, July 1976. ISSN 0003-0147. doi: 10.1086/283098. URL https://www.journals.uchicago.edu/doi/abs/ 10.1086/283098. Publisher: The University of Chicago Press.
OpenUrl CrossRef Web of Science
46.↵
S. A. Sisson, Y. Fan, and M. Beaumont, editors. Handbook of Approximate Bayesian Computation. Chapman and Hall/CRC, New York, Sept. 2018. ISBN 978-1-315-11719-5. doi: 10.1201/9781315117195.
OpenUrl CrossRef
47.↵
A. Stamatakis. RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics (Oxford, England), 30 (9):1312–3, May 2014. ISSN 1367-4811. doi: 10.1093/bioinformatics/btu033. URL http://www.ncbi.nlm.nih.gov/pubmed/24451623. Publisher: Oxford University Press.
OpenUrl CrossRef PubMed Web of Science
48.↵
G. Tonkin-Hill, N. MacAlasdair, C. Ruis, A. Weimann, G. Horesh, J. A. Lees, R. A. Gladstone, S. Lo, C. Beaudoin, R. A. Floto, S. D. Frost, J. Corander, S. D. Bentley, and J. Parkhill. Producing Polished Prokaryotic Pangenomes with the Panaroo Pipeline. preprint, Genomics, Jan. 2020. URL http://biorxiv.org/lookup/doi/10.1101/2020.01.28.922989.
49.↵
G. Tonkin-Hill, C. Ling, C. Chaguza, S. J. Salter, P. Hinfonthong, E. Nikolaou, N. Tate, A. Pastusiak, C. Turner, C. Chewapreecha, S. D. W. Frost, J. Corander, N. J. Croucher, P. Turner, and S. D. Bentley. Pneumococcal within-host diversity during colonization, transmission and treatment. Nature Microbiology, 7(11):1791–1804, Nov. 2022. ISSN 2058-5276. doi: 10.1038/s41564-022-01238-1. URL https://www.nature.com/articles/s41564-022-01238-1. Number: 11 Publisher: Nature Publishing Group.
OpenUrl CrossRef
50.
A. J. van Tonder, J. E. Bray, L. Roalfe, R. White, M. Zancolli, S. J. Quirk, G. Haraldsson, K. A. Jolley, M. C. J. Maiden, S. D. Bentley, Haraldsson, H. Erlendsdottir, K. G. Kristinsson, D. Goldblatt, and A. B. Brueggemann. Genomics Reveals the Worldwide Distribution of Multidrug-Resistant Serotype 6E Pneumococci. Journal of Clinical Microbiology, 53 (7):2271–2285, July 2015. ISSN 0095-1137. doi: 10.1128/JCM.00744-15. URL https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4473186/.
OpenUrl Abstract/FREE Full Text
51.
D. Wegmann, C. Leuenberger, and L. Excoffier. Efficient Approximate Bayesian Computation Coupled With Markov Chain Monte Carlo Without Likelihood. Genetics, 182(4):1207–1218, Aug. 2009. ISSN 1943-2631. doi: 10.1534/genetics.109.102509. URL 10.1534/genetics.109.102509.
OpenUrl Abstract/FREE Full Text
52.
B. S. Weir and C. C. Cockerham. Estimating F-Statistics for the Analysis of Population Structure. Evolution, 38(6):1358–1370, 1984. ISSN 0014-3820. doi: 10.2307/2408641. URL https://www.jstor.org/stable/2408641. Publisher: [Society for the Study of Evolution, Wiley].
OpenUrl CrossRef PubMed Web of Science
53.
S. Wright. The genetical structure of populations. Annals of Eugenics, 15(1):323–354, Jan. 1949. ISSN 2050-1420. doi: 10.1111/j.1469-1809.1949.tb02451.x. URL https://onlinelibrary.wiley.com/doi/abs/ 10.1111/j.1469-1809.1949.tb02451.x. Publisher: John Wiley & Sons, Ltd.
OpenUrl CrossRef Web of Science

View the discussion thread.

Posted November 20, 2023.

Download PDF

Data/Code

Citation Tools

Subject Area

Epidemiology

Subject Areas

All Articles

Addiction Medicine (349)
Allergy and Immunology (668)
Allergy and Immunology (668)
Anesthesia (181)
Cardiovascular Medicine (2648)
Dentistry and Oral Medicine (316)
Dermatology (223)
Emergency Medicine (399)
Endocrinology (including Diabetes Mellitus and Metabolic Disease) (942)
Epidemiology (12228)
Forensic Medicine (10)
Gastroenterology (759)
Genetic and Genomic Medicine (4103)
Geriatric Medicine (387)
Health Economics (680)
Health Informatics (2657)
Health Policy (1005)
Health Systems and Quality Improvement (985)
Hematology (363)
HIV/AIDS (851)
Infectious Diseases (except HIV/AIDS) (13695)
Intensive Care and Critical Care Medicine (797)
Medical Education (399)
Medical Ethics (109)
Nephrology (436)
Neurology (3882)
Nursing (209)
Nutrition (577)
Obstetrics and Gynecology (739)
Occupational and Environmental Health (695)
Oncology (2030)
Ophthalmology (585)
Orthopedics (240)
Otolaryngology (306)
Pain Medicine (250)
Palliative Medicine (75)
Pathology (473)
Pediatrics (1115)
Pharmacology and Therapeutics (466)
Primary Care Research (452)
Psychiatry and Clinical Psychology (3432)
Public and Global Health (6527)
Radiology and Imaging (1403)
Rehabilitation Medicine and Physical Therapy (814)
Respiratory Medicine (871)
Rheumatology (409)
Sexual and Reproductive Health (410)
Sports Medicine (342)
Surgery (448)
Toxicology (53)
Transplantation (185)
Urology (165)

[1] 1.
S. Aeschbacher, A. Futschik, and M. A. Beaumont. Approximate Bayesian computation for modular inference problems with many parameters: the example of migration rates. Molecular Ecology, 22(4):987–1002, 2013. ISSN 1365-294X. doi: 10.1111/mec.12165. URL https://onlinelibrary.wiley.com/doi/abs/10.1111/mec.12165.
OpenUrl CrossRef

[2] 2.↵
F. Baumdicker, G. Bisschop, D. Goldstein, G. Gower, A. P. Ragsdale, G. Tsambos, S. Zhu, B. Eldon, E. C. Ellerman, J. G. Galloway, A. L. Gladstein, G. Gorjanc, B. Guo, B. Jeffery, W. W. Kretzschumar, K. Lohse, M. Matschiner, D. Nelson, N. S. Pope, C. D. Quinto-Cortes, M. F. Rodrigues, K. Saunack, T. Sellinger, K. Thornton, H. van Kemenade, A. W. Wohns, Y. Wong, S. Gravel, A. D. Kern, J. Koskela, P. L. Ralph, and J. Kelleher. Efficient ancestry and mutation simulation with msprime 1.0. Genetics, 220(3):iyab229, Mar. 2022. ISSN 1943-2631. doi: 10.1093/genetics/iyab229. URL 10.1093/genetics/iyab229.tex.ids=baumdickerEfficientAncestryMutation2022a.
OpenUrl CrossRef

[3] 3.↵
S. Belman, N. Lefrancq, S. Nzenze, S. Downs, M. d. Plessis, S. Lo, T. G. P. S. Consortium, L. McGee, S. A. Madhi, A. v. Gottberg, S. D. Bentley, and H. Salje. Geographic migration and vaccine-induced fitness changes of Streptococcus pneumoniae, Jan. 2023. URL https://www.biorxiv.org/content/10.1101/2023.01.18.524577v1. Pages: 2023.01.18.524577 Section: New Results.

[4] 4.
G. Bhatia, N. Patterson, S. Sankararaman, and A. L. Price. Estimating and interpreting FST: The impact of rare variants. Genome Research, 23 (9):1514–1521, Sept. 2013. ISSN 1088-9051. doi: 10.1101/gr.154831.113. URL https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3759727/.
OpenUrl Abstract/FREE Full Text

[5] 5.↵
B. Chapman and J. Chang. Biopython: Python tools for computational biology. ACM SIGBIO Newsletter, 20(2):15–19, Aug. 2000. ISSN 0163-5697. doi: 10.1145/360262.360268. URL 10.1145/360262.360268.
OpenUrl CrossRef

[6] 6.
I. Comas, M. Coscolla, T. Luo, S. Borrell, K. E. Holt, M. Kato-Maeda, J. Parkhill, B. Malla, S. Berg, G. Thwaites, D. Yeboah-Manu, G. Bothamley, J. Mei, L. Wei, S. Bentley, S. R. Harris, S. Niemann, R. Diel, A. Aseffa, Q. Gao, D. Young, and S. Gagneux. Out-of-Africa migration and Neolithic coexpansion of Mycobacterium tuberculosis with modern humans. Nature genetics, 45(10):1176–82, Oct. 2013. ISSN 1546-1718. doi: 10.1038/ng.2744. URL http://www.ncbi.nlm.nih.gov/pubmed/23995134. Publisher: NIH Public Access.
OpenUrl CrossRef PubMed

[7] 7.
N. J. Croucher, W. P. Hanage, S. R. Harris, L. McGee, M. van der Linden, H. de Lencastre, R. Śa-Leao, J.-H. Song, K. S. Ko, B. Beall, K. P. Klugman, J. Parkhill, A. Tomasz, K. G. Kristinsson, and S. D. Bentley. Variable recombination dynamics during the emergence, transmission and ‘disarming’ of a multidrug-resistant pneumococcal clone. BMC Biology, 12 (1):49, June 2014. ISSN 1741-7007. doi: 10.1186/1741-7007-12-49. URL 10.1186/1741-7007-12-49.
OpenUrl CrossRef PubMed

[8] 8.↵
N. J. Croucher, A. J. Page, T. R. Connor, A. J. Delaney, J. A. Keane, S. D. Bentley, J. Parkhill, and S. R. Harris. Rapid phylogenetic analysis of large samples of recombinant bacterial whole genome sequences using Gubbins. Nucleic Acids Research, 43(3):e15, Feb. 2015. ISSN 0305-1048. doi: 10.1093/nar/gku1196. URL https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4330336/ .
OpenUrl CrossRef PubMed

[9] 9.↵
N. J. Croucher, J. J. Campo, T. Q. Le, X. Liang, S. D. Bentley, W. P. Hanage, and M. Lipsitch. Diverse evolutionary patterns of pneumococcal antigens identified by pangenome-wide immunological screening. Proceedings of the National Academy of Sciences, 114(3):E357–E366, Jan. 2017. doi: 10.1073/pnas.1613937114. URL https://www.pnas.org/doi/abs/10.1073/pnas.1613937114. Company: National Academy of Sciences Distributor: National Academy of Sciences Institution: National Academy of Sciences Label: National Academy of Sciences Publisher: Proceedings of the National Academy of Sciences.
OpenUrl Abstract/FREE Full Text

[10] 10.
E. Deutschmann, E. Recchi, and M. Vespe. Assessing Transnational Human Mobility on a Global Scale. In S. Pötzschke and S. Rinken, editors, Migration Research in a Digitized World: Using Innovative Technology to Tackle Methodological Challenges, IMISCOE Research Series, pages 169–192. Springer International Publishing, Cham, 2022. ISBN 978-3-031-01319-5. doi: 10.1007/978-3-031-01319-5 9. URL 10.1007/978-3-031-01319-5_9.
OpenUrl CrossRef

[11] 11.↵
X. Didelot, N. J. Croucher, S. D. Bentley, S. R. Harris, and D. J. Wilson. Bayesian inference of ancestral dates on bacterial phylogenetic trees. Nucleic Acids Research, 46(22):e134, Dec. 2018. ISSN 1362-4962. doi: 10.1093/nar/gky783.
OpenUrl CrossRef PubMed

[12] 12.
D. Domman, M.-L. Quilici, M. J. Dorman, E. Njamkepo, A. Mutreja, A. E. Mather, G. Delgado, R. Morales-Espinosa, P. A. D. Grimont, M. L. Liźarraga-Partida, C. Bouchier, D. M. Aanensen, P. Kuri-Morales, C. L. Tarr, G. Dougan, J. Parkhill, J. Campos, A. Cravioto, F.-X. Weill, and N. R. Thomson. Integrated view of Vibrio cholerae in the Americas. Science (New York, N.Y.), 358(6364):789–793, 2017. ISSN 1095-9203. doi: 10.1126/science.aao2136.
OpenUrl Abstract/FREE Full Text

[13] 13.
G. Ewing and J. Hermisson. MSMS: a coalescent simulation program including recombination, demographic structure and selection at a single locus. Bioinformatics, 26(16):2064–2065, Aug. 2010. ISSN 1367-4803. doi: 10.1093/bioinformatics/btq322. URL https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2916717/.
OpenUrl CrossRef PubMed Web of Science

[14] 14.↵
A. Findlater and I. I. Bogoch. Human Mobility and the Global Spread of Infectious Diseases: A Focus on Air Travel. Trends in Parasitology, 34 (9):772–783, Sept. 2018. ISSN 1471-4922. doi: 10.1016/j.pt.2018.07.004. URL https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7106444/.
OpenUrl CrossRef PubMed

[15] 15.
L. Gabrielli, E. Deutschmann, F. Natale, E. Recchi, and M. Vespe. Dissecting global air traffic data to discern different types and trends of transnational human mobility. EPJ Data Science, 8 (1):1–24, Dec. 2019. ISSN 2193-1127. doi: 10.1140/epjds/s13688-019-0204-x. URL https://epjdatascience.springeropen.com/articles/10.1140/epjds/s13688-019-0204-x. Number: 1 Publisher: SpringerOpen.
OpenUrl CrossRef

[16] 16.↵
R. A. Gladstone, S. W. Lo, J. A. Lees, N. J. Croucher, A. J. v. Tonder, J. Corander, A. J. Page, P. Marttinen, L. J. Bentley, T. J. Ochoa, P. L. Ho, M. d. Plessis, J. E. Cornick, B. Kwambana-Adams, R. Benisty, S. A. Nzenze, S. A. Madhi, P. A. Hawkins, D. B. Everett, M. Antonio, R. Dagan, K. P. Klugman, A. v. Gottberg, L. McGee, R. F. Breiman, and S. D. Bentley. International genomic definition of pneumococcal lineages, to contextualise disease, antibiotic resistance and vaccine impact. EBioMedicine, 43:338–346, May 2019. ISSN 2352-3964. doi: 10.1016/j.ebiom.2019.04.021. URL https://www.thelancet.com/journals/ebiom/article/PIIS2352-3964(19)30259-2/abstract. Publisher: Elsevier.
OpenUrl CrossRef

[17] 17.
GPS, L. McGee, S. D. Bentley, and K. P. Klugman. GPS:: Global Pneumococcal Sequencing Project, 2022. URL https://www.pneumogen.net/gps/.

[18] 18.
M. U. Gutmann, J. Cor, and er. Bayesian Optimization for Likelihood-Free Inference of Simulator-Based Statistical Models. Journal of Machine Learning Research, 17(125):1–47, 2016. ISSN 1533-7928. URL http://jmlr.org/papers/v17/15-017.html.
OpenUrl

[19] 19.↵
S. Gössling and A. Humpe. The global scale, distribution and growth of aviation: Implications for climate change. Global Environmental Change, 65:102194, Nov. 2020. ISSN 0959-3780. doi: 10.1016/j.gloenvcha.2020.102194. URL https://www.sciencedirect.com/science/article/pii/S0959378020307779.
OpenUrl CrossRef

[20] 20.↵
R. W. Hamming. Error detecting and error correcting codes. The Bell System Technical Journal, 29(2):147–160, Apr. 1950. ISSN 0005-8580. doi: 10.1002/j.1538-7305.1950.tb00463.x. Conference Name: The Bell System Technical Journal.
OpenUrl CrossRef

[21] 21.
K. E. Holsinger and B. S. Weir. Genetics in geographically structured populations: defining, estimating and interpreting FST. Nature Reviews Genetics, 10(9):639–650, Sept. 2009. ISSN 1471-0064. doi: doi:10.1038/nrg2611. URL https://www.nature.com/articles/nrg2611. Number: 9 Publisher: Nature Publishing Group.
OpenUrl CrossRef PubMed Web of Science

[22] 22.
R. R. Hudson. Generating samples under a Wright–Fisher neutral model of genetic variation. Bioinformatics, 18(2):337–338, Feb. 2002. ISSN 1367-4803. doi: 10.1093/bioinformatics/18.2.337. URL 10.1093/bioinformatics/18.2.337.
OpenUrl CrossRef PubMed Web of Science

[23] 23.
R. R. Hudson, M. Slatkin, and W. P. Maddison. Estimation of levels of gene flow from DNA sequence data. Genetics, 132(2):583–589, Oct. 1992. ISSN 1943-2631. doi: 10.1093/genetics/132.2.583. URL 10.1093/genetics/132.2.583.
OpenUrl Abstract/FREE Full Text

[24] 24.
P. Jaccard. The Distribution of the Flora in the Alpine Zone.1. New Phytologist, 11(2):37–50, 1912. ISSN 1469-8137. doi: 10.1111/j.1469-8137.1912.tb05611.x. URL https://onlinelibrary.wiley.com/doi/abs/10.1111/j.1469-8137.1912.tb05611.x. eprint: https://onlinelibrary.wiley.com/doi/pdf/10.1111/j.1469-8137.1912.tb05611.x.
OpenUrl CrossRef

[25] 25.
J. Kelleher and K. Lohse. Coalescent Simulation with msprime. In J. Y. Dutheil, editor, Statistical Population Genomics, Methods in Molecular Biology, pages 191–230. Springer US, New York, NY, 2020. ISBN 978-1-07-160199-0. doi: 10.1007/978-1-0716-0199-09. URL 10.1007/978-1-0716-0199-0_9.tex.ids=kelleherCoalescentSimu-lationMsprime2020a seriesTitle: Methods in Molecular Biology.
OpenUrl CrossRef

[26] 26.↵
J. Kelleher, A. M. Etheridge, and G. McVean. Efficient Coalescent Simulation and Genealogical Analysis for Large Sample Sizes. PLOS Computational Biology, 12(5):e1004842, May 2016. ISSN 1553-7358. doi: 10.1371/journal.pcbi.1004842. URL https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1004842.tex.ids=kelleherEfficientCoalescentSimulation2016a publisher: Public Library of Science.
OpenUrl CrossRef PubMed

[27] 27.
A. D. Kern and D. R. Schrider. Discoal: flexible coalescent simulations with selection. Bioinformatics, 32(24):3839–3841, Dec. 2016. ISSN 1367-4803. doi: 10.1093/bioinformatics/btw556. URL https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5167068/.
OpenUrl CrossRef PubMed

[28] 28.
M. Kimura. THE NUMBER OF HETEROZYGOUS NUCLEOTIDE SITES MAINTAINED IN A FINITE POPULATION DUE TO STEADY FLUX OF MUTATIONS. Genetics, 61(4):893–903, Apr. 1969. ISSN 1943-2631. doi: 10.1093/genetics/61.4.893. URL 10.1093/genetics/61.4.893.
OpenUrl FREE Full Text

[29] 29.
F. Lassalle, S. Al-Shalali, M. Al-Hakimi, E. Njamkepo, I. M. Bashir, M. J. Dorman, J. Rauzier, G. A. Blackwell, A. Taylor-Brown, M. A. Beale, A. Cazares, A. A. Al-Somainy, A. Al-Mahbashi, K. Almoayed, M. Aldawla, A. Al-Harazi, M.-L. Quilici, F.-X. Weill, G. Dhabaan, and N. R. Thomson. Genomic epidemiology reveals multidrug resistant plasmid spread between Vibrio cholerae lineages in Yemen. Nature Microbiology, 8(10):1787–1798, Oct. 2023. ISSN 2058-5276. doi: 10.1038/s41564-023-01472-1. URL https://www.nature.com/articles/s41564-023-01472-1. Number: 10 Publisher: Nature Publishing Group.
OpenUrl CrossRef

[30] 30.
M. Layan, N. F. Müller, S. Dellicour, N. De Maio, H. Bourhy, S. Cauchemez, and G. Baele. Impact and mitigation of sampling bias to determine viral spread: Evaluating discrete phylogeography through CTMC modeling and structured coalescent model approximations. Virus Evolution, 9(1): vead010, Jan. 2023. ISSN 2057-1577. doi: 10.1093/ve/vead010. URL 10.1093/ve/vead010.
OpenUrl CrossRef

[31] 31.↵
N. Lefrancq, V. Bouchez, N. Fernandes, A.-M. Barkoff, T. Bosch, T. Dalby, T. Åkerlund, J. Darenberg, K. Fabianova, D. F. Vestrheim, N. K. Fry, J. J. Gonźalez-Ĺopez, K. Gullsby, A. Habington, Q. He, D. Litt, H. Martini, D. Píerard, P. Stefanelli, M. Stegger, J. Zavadilova, N. Armatys, A. Landier, S. Guillot, S. L. Hong, P. Lemey, J. Parkhill, J. Toubiana, S. Cauchemez, H. Salje, and S. Brisse. Global spatial dynamics and vaccine-induced fitness changes of Bordetella pertussis. Science Translational Medicine, 14(642):eabn3253, Apr. 2022. ISSN 1946-6242. doi: 10.1126/scitranslmed.abn3253.
OpenUrl CrossRef

[32] 32.↵
H. Li and R. Durbin. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics, 25(14):1754–1760, July 2009. ISSN 1367-4803. doi: 10.1093/bioinformatics/btp324. URL 10.1093/bioinformatics/btp324.
OpenUrl CrossRef PubMed Web of Science

[33] 33.↵
H. Li, B. Handsaker, A. Wysoker, T. Fennell, J. Ruan, N. Homer, G. Marth, G. Abecasis, R. Durbin, and 1000 Genome Project Data Processing Subgroup. The Sequence Alignment/Map format and SAMtools. Bioinformatics (Oxford, England), 25(16):2078–2079, Aug. 2009. ISSN 1367-4811. doi: 10.1093/bioinformatics/btp352.
OpenUrl CrossRef PubMed Web of Science

[34] 34.↵
J. Lintusaari, H. Vuollekoski, A. Kangasrääasïo, K. Skyten, M. Järvenpää, P. Marttinen, M. U. Gutmann, A. Vehtari, J. Corander, and S. Kaski. ELFI: Engine for Likelihood-Free Inference. Journal of Machine Learning Research, 19(16):1–7, 2018. ISSN 1533-7928. URL http://jmlr.org/papers/v19/17-374.html.
OpenUrl

[35] 35.
J. Ma, A. Ratan, B. J. Raney, B. B. Suh, W. Miller, and D. Haussler. The infinite sites model of genome evolution. Proceedings of the National Academy of Sciences, 105(38):14254–14261, Sept. 2008. doi: 10.1073/pnas.0805217105. URL https://www.pnas.org/doi/10.1073/pnas.0805217105. Publisher: Proceedings of the National Academy of Sciences.
OpenUrl Abstract/FREE Full Text

[36] 36.
P. Maas. Facebook Disaster Maps: Aggregate Insights for Crisis Response & Recovery. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pages 3173–3173, Anchorage AK USA, July 2019. ACM. ISBN 978-1-4503-6201-6. doi: 10.1145/3292500.3340412. URL https://dl.acm.org/doi/10.1145/3292500.3340412.
OpenUrl CrossRef

[37] 37.↵
A. Miles, p. i. Bot, M. R. P. Ralph, N. Harding, R. Pisupati, S. Rae, and T. Millar. cggh/scikit-allel: v1.3.3, May 2021. URL https://zenodo.org/record/4759368.

[38] 38.↵
A. H. Murphy. The Finley Affair: A Signal Event in the History of Forecast Verification. Weather and Forecasting, 11(1):3–20, Mar. 1996. ISSN 1520-0434, 0882-8156. doi: 10.1175/1520-0434(1996)011⟨0003:TFAASE⟩2.0.CO;2. URL https://journals.ametsoc.org/view/journals/wefo/11/1/1520-0434_1996_011_0003_tfaase_2_0_co_2.xml. Publisher: American Meteorological Society Section: Weather and Forecasting.
OpenUrl CrossRef

[39] 39.↵
D. Nelson, J. Kelleher, A. P. Ragsdale, C. Moreau, G. McVean, and S. Gravel. Accounting for long-range correlations in genome-wide simulations of large cohorts. PLOS Genetics, 16(5):e1008619, May 2020. ISSN 1553-7404. doi: 10.1371/journal.pgen.1008619. URL https://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1008619. Publisher: Public Library of Science.
OpenUrl CrossRef

[40] 40.
C. K. Okoro, R. A. Kingsley, T. R. Connor, S. R. Harris, C. M. Parry, M. N. Al-Mashhadani, S. Kariuki, C. L. Msefula, M. A. Gordon, E. de Pinna, J. Wain, R. S. Heyderman, S. Obaro, P. L. Alonso, I. Mandomando, C. A. MacLennan, M. D. Tapia, M. M. Levine, S. M. Tennant, J. Parkhill, and G. Dougan. Intra-continental spread of human invasive Salmonella Typhimurium pathovariants in sub-Saharan Africa. Nature genetics, 44 (11):1215–1221, Nov. 2012. ISSN 1061-4036. doi: 10.1038/ng.2423. URL https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3491877/.
OpenUrl CrossRef PubMed

[41] 41.↵
J. Pensar, S. Puranen, B. Arnold, N. MacAlasdair, J. Kuronen, G. Tonkin-Hill, M. Pesonen, Y. Xu, A. Sipola, L. Śanchez-Buśo, J. A. Lees, C. Chewapreecha, S. D. Bentley, S. R. Harris, J. Parkhill, N. J. Croucher, and J. Corander. Genome-wide epistasis and co-selection study using mutual information. Nucleic Acids Research, 47(18):e112, Oct. 2019. ISSN 0305-1048. doi: 10.1093/nar/gkz656. URL 10.1093/nar/gkz656.
OpenUrl CrossRef

[42] 42.
B. Quintero Moreno and M. Araque. Molecular characterisation of multidrug-resistant pneumococcal clones colonising healthy children in Merida, Venezuela. Journal of Global Antimicrobial Resistance, 14: 45–50, Sept. 2018. ISSN 2213-7165. doi: 10.1016/j.jgar.2018.02.004. URL https://www.sciencedirect.com/science/article/pii/S2213716518300328.
OpenUrl CrossRef

[43] 43.
A. Rose, J. McKee, K. Sims, E. Bright, A. Reith, and M. Urban. LandScan Global 2019, 2020. URL landscan.ornl.gov. Edition: 2019 Place: Oak Ridge, cTN Section: July 01, 2020.

[44] 44.↵
H. Salje, J. Lessler, I. M. Berry, M. C. Melendrez, T. Endy, S. Kalayanarooj, A. A-Nuegoonpipat, S. Chanama, S. Sangkijporn, C. Klungthong, B. Thaisomboonsuk, A. Nisalak, R. V. Gibbons, S. Iamsirithaworn, L. R. Macareo, I.-K. Yoon, A. Sangarsang, R. G. Jarman, and D. A. T. Cummings. Dengue diversity across spatial and temporal scales: Local structure and the effect of host population size. Science, 355(6331): 1302–1306, Mar. 2017. ISSN 0036-8075, 1095-9203. doi: 10.1126/science.aaj9384. URL https://science.sciencemag.org/content/355/6331/1302. Publisher: American Association for the Advancement of Science Section: Report.
OpenUrl Abstract/FREE Full Text

[45] 45.
R. K. Selander and R. O. Hudson. Animal Population Structure Under Close Inbreeding: The Land Snail Rumina in Southern France. The American Naturalist, 110(974):695–718, July 1976. ISSN 0003-0147. doi: 10.1086/283098. URL https://www.journals.uchicago.edu/doi/abs/ 10.1086/283098. Publisher: The University of Chicago Press.
OpenUrl CrossRef Web of Science

[46] 46.↵
S. A. Sisson, Y. Fan, and M. Beaumont, editors. Handbook of Approximate Bayesian Computation. Chapman and Hall/CRC, New York, Sept. 2018. ISBN 978-1-315-11719-5. doi: 10.1201/9781315117195.
OpenUrl CrossRef

[47] 47.↵
A. Stamatakis. RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics (Oxford, England), 30 (9):1312–3, May 2014. ISSN 1367-4811. doi: 10.1093/bioinformatics/btu033. URL http://www.ncbi.nlm.nih.gov/pubmed/24451623. Publisher: Oxford University Press.
OpenUrl CrossRef PubMed Web of Science

[48] 48.↵
G. Tonkin-Hill, N. MacAlasdair, C. Ruis, A. Weimann, G. Horesh, J. A. Lees, R. A. Gladstone, S. Lo, C. Beaudoin, R. A. Floto, S. D. Frost, J. Corander, S. D. Bentley, and J. Parkhill. Producing Polished Prokaryotic Pangenomes with the Panaroo Pipeline. preprint, Genomics, Jan. 2020. URL http://biorxiv.org/lookup/doi/10.1101/2020.01.28.922989.

[49] 49.↵
G. Tonkin-Hill, C. Ling, C. Chaguza, S. J. Salter, P. Hinfonthong, E. Nikolaou, N. Tate, A. Pastusiak, C. Turner, C. Chewapreecha, S. D. W. Frost, J. Corander, N. J. Croucher, P. Turner, and S. D. Bentley. Pneumococcal within-host diversity during colonization, transmission and treatment. Nature Microbiology, 7(11):1791–1804, Nov. 2022. ISSN 2058-5276. doi: 10.1038/s41564-022-01238-1. URL https://www.nature.com/articles/s41564-022-01238-1. Number: 11 Publisher: Nature Publishing Group.
OpenUrl CrossRef

[50] 50.
A. J. van Tonder, J. E. Bray, L. Roalfe, R. White, M. Zancolli, S. J. Quirk, G. Haraldsson, K. A. Jolley, M. C. J. Maiden, S. D. Bentley, Haraldsson, H. Erlendsdottir, K. G. Kristinsson, D. Goldblatt, and A. B. Brueggemann. Genomics Reveals the Worldwide Distribution of Multidrug-Resistant Serotype 6E Pneumococci. Journal of Clinical Microbiology, 53 (7):2271–2285, July 2015. ISSN 0095-1137. doi: 10.1128/JCM.00744-15. URL https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4473186/.
OpenUrl Abstract/FREE Full Text

[51] 51.
D. Wegmann, C. Leuenberger, and L. Excoffier. Efficient Approximate Bayesian Computation Coupled With Markov Chain Monte Carlo Without Likelihood. Genetics, 182(4):1207–1218, Aug. 2009. ISSN 1943-2631. doi: 10.1534/genetics.109.102509. URL 10.1534/genetics.109.102509.
OpenUrl Abstract/FREE Full Text

[52] 52.
B. S. Weir and C. C. Cockerham. Estimating F-Statistics for the Analysis of Population Structure. Evolution, 38(6):1358–1370, 1984. ISSN 0014-3820. doi: 10.2307/2408641. URL https://www.jstor.org/stable/2408641. Publisher: [Society for the Study of Evolution, Wiley].
OpenUrl CrossRef PubMed Web of Science

[53] 53.
S. Wright. The genetical structure of populations. Annals of Eugenics, 15(1):323–354, Jan. 1949. ISSN 2050-1420. doi: 10.1111/j.1469-1809.1949.tb02451.x. URL https://onlinelibrary.wiley.com/doi/abs/ 10.1111/j.1469-1809.1949.tb02451.x. Publisher: John Wiley & Sons, Ltd.
OpenUrl CrossRef Web of Science

Estimating Between Country Migration in Pneumococcal Populations

Abstract

1 Introduction

2 Materials and methods

2.1 Population Divergence Summary Statistics

2.2 Inference Strategy Overview

2.2.1 Simulator

2.2.2 Parameter Inference Algorithm

2.2.3 Model Validation: Recapturing Simulated Data

Two Deme Model

Four Deme Model

2.3 Application to a Pneumococcal Dataset

2.3.1 Isolate Culture and Sequencing

2.3.2 Neutral Gene Selection

2.3.3 Pairwise Distances Between Genes

2.3.4 Controlling Population Structure

GPSC Level

Linked SNP Sites

2.3.5 Fixation Index

2.4 Overall Between-Country Migration Risk

3 Results

3.1 Inference Strategy

3.2 Application to a Pneumococcal Dataset

3.3 Estimating the Weight and Direction of Migration for Country Pairs

3.3.1 Weight

3.3.2 Direction

3.3.3 Demographic Contribution to Migration

3.4 Estimating the Weight of Migration Across Four Countries

3.4.1 Overview

3.4.2 Weights

3.5 Between-country migration probability

3.6 Discussion

4 Data availability

Data Availability

5 Funding

6 Conflicts of interest

A Supplementary Figures

B Supplementary Tables

Acknowledgments

Footnotes

References

Citation Manager Formats

Subject Area