Abstract
The Chikungunya virus, a global arbovirus, is currently causing a major outbreak in the Western African region, with the highest cases reported in Senegal and Burkina Faso. Recent molecular evolution analyses reveal that the strain responsible for the epidemic belongs to the West African genotype, with new mutations potentially impacting viral replication, antigenicity, and host adaptation. Real-time genomic monitoring is needed to track the virus’s spread in new regions. A scalable West African genotype amplicon-based Whole Genome Sequencing for multiple Next Generation Sequencing platforms has been developed to support genomic investigations and identify epidemiological links during the virus’s ongoing spread. This technology will help identify potential threats and support real-time genomic investigations in the ongoing spread of the virus.
1. Background
Chikungunya virus (CHIKV) is one of the most impactful arboviruses worldwide over the past twenty years. The spread of the virus in different continents has favored the emergence of different genotypes classified into West African (WA), East-Central-South-African (ECSA), Asian and Indian Ocean lineages (1).
CHIKV is identified in an increasing number of African countries (2). Given that most reported outbreaks in the continent are due to the ECSA group (3, 4, 5, 6), existing molecular characterization tools are mainly based on this genotype (7). However, most of the outbreaks in West Africa were due to the WA genotype (8). More recently, evidence of distribution area of this genotype has been reported with the virus detection in mosquitoes in China (9). At the time of writing, the Western African region is facing a great CHIKV outbreak with the highest numbers of cases ever reported in Senegal (10, 11) and Burkina Faso (12). Recent molecular evolution analyzes have shown that the strain responsible for the epidemic in Senegal belongs to the WA genotype and highlighted the emergence of new mutations with potential impact on viral replication, antigenicity and host adaptation (11). Moreover, the same study described transient amino acids substitutions. Thus, a real-time genomic monitoring is required following the progressive viral spread into new regions of Senegal (Public Health Direction in Institut Pasteur de Dakar) and potentially beyond through a reliable and accessible genomic characterization tool.
During the COVID-19 pandemic, the development of the amplicon-based Whole Genome Sequencing (WGS) allowed the acceleration of SARS-CoV-2 genomic surveillance in the world, even in the low-and-middle-income countries in Africa (13). In this work, we developed a scalable WA genotype CHIKV amplicon-based WGS for multiple NGS platforms in order to support the genomic investigations and identify epidemiological links during the ongoing spreading process of the virus.
2. Methods
West African genotype primers design for Tiled Amplicon-Based Sequencing Systems
Based on all the available CHIKV sequences within the WA genotype and those from the ongoing Senegalese epidemic obtained using hybrid capture sequencing method (supplementary table S1), 400 bp primer pools (Supplementary table S2 tab 1) were designed using the ‘Primal Scheme’ tool (https://primalscheme.com/). To perform long sequencing reads, primer readjustments were carried out to constitute new pools capable of generating 1.2 kp amplicons with 400 bp of overlap between contiguous amplicons (Supplementary table S2 tab 2).
Samples selection, nucleic acids extraction and RT-qPCR
Human sera from acute CHIKV positive cases collected in 2023 through both a CHIKV outbreak investigation in Kedougou (11) and the nationwide Syndromic Surveillance System in Senegal (14) were included. Sample selection was primarily based on viral load at diagnosis, with low to high CT values as the pattern.
A new extraction was done by the KingFisher Flex purification system (ThermoFisher, Darmstadt, Germany) using the MagMAX Viral/Pathogen Nucleic Acid Isolation Kit (ThermoFisher) following the manufacturer’s protocol. A CHIKV specific RT-qPCR (15) was performed on fresh RNA extracts before the sequencing experiments.
Muliplex-PCR amplifications
Viral RNAs were reverse transcribed into cDNAs using LunaScript RT SuperMix (NEB #E3010) as previously described (16). Briefly, 8 μL template RNA and 2 μL LunaScript RT SuperMix (New England Biolabs #E3010) were mixed and put in the following thermal conditions: 25 °C for 10 min, followed by 50 °C for 10 min and 85 °C for 5 min. The CHIKV viral genome was amplified with either short (400 bp) and long-range (1.2 kp) PCR primers in two reaction pools using viral cDNA in the tiling PCR method. The synthesized cDNAs were template material for direct amplification to generate approximately 400 bp or 1.2 kp amplicons tiled along the genome using two systems of non-overlapping pools of WA-CHIKV targeting primers at 10 μM and Q5® High-Fidelity 2X Master Mix (New England Biolabs#M0494X) with the following thermal cycling protocol: 98 °C for 30 seconds, 35 cycles of: 95°C for 30 seconds, 65°C for 5 minutes, and a cooling step at 4°C.
Illumina Sequencing
Sequencing libraries were synthesized by tagmentation of the 400 bp amplicons using the Illumina DNA Prep kit and the IDT® for Illumina PCR Unique Dual Indexes following manufacturer’s recommendations and as previously described (17). After a cleaning step with the Agencourt AMPure XP beads (Beckman Coulter, Indianapolis, IN), libraries were quantified using a Qubit 3.0 fluorometer (Invitrogen Inc., Waltham, MA, USA) for manual normalization before pooling in the sequencer. Cluster generation and sequencing were conducted in a Illumina iSeq100 instrument with iSeq 100 i1 Reagent v2 (300-cycle).
Oxford Nanopore sequencing
Sequencing libraries preparation was carried out based on both 400 bp and 1.2 kp amplicons using the Oxford Nanopore Rapid Barcoding kit (SQK-RBK110.96) by attaching native barcodes (EXP-NBD114) and motor proteins to DNA molecules using a transposase as previously described (16). 16-plex library pools (for both 400 bp and 1.2 kp sequencing reads) were individually loaded into two different R9.4 flow cells (FLO-MIN106D) on a nanopore GridION sequencer with other parameters kept at their defaults.
Sequencing data analysis
Raw data were collected and analyzed using guppy, raw reads in fastq format for both Illumina and nanopore approach were analyzed using Genome Detective tool (18) accessed on 24 November 2023. Sequences were aligned using MAFFT (19) and the alignment was run under the best model in IQ-TREE (20). The maximum-likelihood (ML) phylogenetic tree was visualized with metadata were performed using ggtree package implemented in R studio.
3. Results
Chikungunya virus-West African genotype primers design for Amplicon-Based Sequencing
A multiplex primer system was designed using a WA-CHIKV reference genome, generating 36 oligonucleotide primer pairs for 400 bp tiled amplicons. The list of primers can be found in supplementary table S2. A long amplicon scheme was obtained by readjusting the first WA- CHIKV for a 1.2 kp long reads sequencing as shown by the supplementary table S2 (tab 1 and tab 2).
Samples panel
Sixteen human sera from CHIKV positive cases in Senegal were included in the study based on viral load at diagnosis, with low to high CT values. RT-qPCR performed on the fresh RNA extracts showed consistent results between the ct values obtained during the first diagnostic and the ones from the new extraction except a sample previously tested positive on the field during an investigation in eastern Senegal that finally came negative in our study. These results are summarized in the table 2.
Short and long amplicon schemes comparison
Library preparations of the sample panel were undertaken as described above and simultaneously for both short and long amplicon schemes. cDNAs obtained by reverse transcription of the RNA extracts were amplified to get 400 bp and 1,200 bp amplicons. DNA concentrations measured by fluorimetry (Qubit) showed similar results between the two schemes (supplementary table S3).
Illumina-based sequencing was performed with the short (400 bp) amplicons while ONT approach was done using both 400 bp and 1.2 kp amplicons, as described above. Regardless of the sequencing technology, all schemes allowed to obtain complete or nearly complete genomes for all CHIKV positive RNAs (figure 1). Indeed, +80% horizontal coverage was systematically observed for all samples with ct value less than 30, and +90% for samples with the highest viral load (ct value ≤ 25). Overall, the ONT approach was quite similar in term of genome coverage than Illumina.
Relationship between the coverage of obtained CHIKV genomes and the pan CHIKV RT-qPCR ct values
Only samples 417824 was not properly sequenced by ONT using the long tiled amplicons while the complete cds was obtained using the short scheme either by Illumina and ONT. The sample 418013 with the lowest viral load (table 1) was poorly covered with a maximum of half the genome obtained by the long amplicon scheme approach on ONT (supplementary figure S1). In general, the number of reads generated by Illumina sequencing was greater than ONT. Among the ONT runs, the long amplicon scheme most often had a higher overall number of reads than the short-PCR t (figure 2).
Number of generated reads per clinical sample for both short and long amplicon scheme in Oxford Nanopore Technology.
However, excepted two or three samples, the number of mapped reads to the CHIKV reference genome was quite similar between both schemes with ONT while the short tiled PCR method allowed to generate more specific data though Illumina. Only the sample that came back negative after re-extraction (sample 427799) did not have reads mapping to CHIKV whatever the sequencing method or the scheme (figure 3). Moreover, sample 427972 gave much less reads from the long amplicon scheme in ONT compared to the two other methods.
Number of mapped reads against used CHIKV reference for the different sequencing platforms and methods
A ML tree was built using sequence data obtained from fifteen clinical samples to evaluate the consistency of different schemes and approaches in terms of genomic information. The analysis showed that in most cases (+93%), sequences generated from the same sample group together regardless of the sequencing method used, the sample with the lowest viral load also being included (figure 4). Only one sample (417986) showed great divergence between the three sequences obtained with the different methods despite the generation of a valuable number of reads mapping with CHIKV for both short and long amplicon schemes in Illumina or ONT sequencers (figure 3).
Phylogenic tree of the sequences obtained from the clinical samples sets by the different amplicon schemes and sequencing methods.
Discussion
In this study, we aimed to develop a scalable CHIKV amplicon-based WGS to fill the gap in existing sequencing tools for monitoring the epidemiological and molecular evolution of CHIKV. Indeed, to our knowledge, most of the available methods are based on the most widespread ECSA genotype.
Although emergence caused by the WA genotype are still localized, we are now seeing increasingly explosive outbreaks such as the one underway in Senegal (11). We designed a short amplicon scheme for WA-CHIKV WGS. The same primers were also used to develop a long amplicon scheme. The validation of the two schemes were done on a panel of clinical samples diagnosed positive to acute CHIKV infection. Both schemes were used for ONT sequencing while Illumina sequencing was performed only with the short tiled PCR method. The validation process’s data demonstrated consistency between short and long amplicon schemes regardless of the sequencing technologies. Near-complete or complete genome sequences were obtained from almost the entire panel of CHIKV-positive clinical samples.
If Illumina approach offered the greatest amount of data, sequencing via the long amplicon scheme has shown slightly better results when we talk about ONT sequencing. Moreover, a good consistency in term of CHIKV sequences was globally observed from one sequencing method to another one, highlighting that sequencing by ONT with either the short or long WA-CHIKV amplicon schemes gave results similar to the gold standard Illumina sequencing by synthesis on short amplicons. Nanopore technology’s sequencing accuracy is often considered a weakness, but it was shown that a 10x sequencing depth cutoff yielded a consensus sequence accuracy of about 99%, similar to Illumina sequences (21). The solutions proposed in this work greatly exceed this depth cutoff.
Thus, in addition to Illumina-based sequencing, we describe systems adapted to ONT for equally reliable results. The ONT sequencers are portable, cost-effective and highly scalable NGS platforms that streamline sequencing workflow and reduces turnaround times. This work provides versatile WA-CHIKV WGS schemes of interest for real-time molecular epidemiology and evolutionary studies.
Data Availability
All data produced in the present study are available upon reasonable request to the authors
Funding
This work was funded by the NIH West African Center for Emerging Infectious Diseases (grant number U01AI151801-01) and the Africa CDC Pathogen Genomics Initiative funds (CARES grant 4306-22-EIPHLSS-GENOMICS).
Informed consent
Clinical samples in this study were collected on CHIKV positive human samples according to the guidelines of the Declaration of Helsinki, and approved by the Senegalese national ethics committee for health research (protocol code SEN20/08 approved on 6 April 2020).
Conflicts of Interest
The authors declare no conflict of interest.
Acknowledgments
We thank the Collaborating Center for Arboviruses and Hemorrhagic Fever viruses in the Virology Department of Institut Pasteur de Dakar for their critical role in the diagnostic process.