Abstract
The relevant role of LINE-1 (L1) retrotransposition in cancer has been recurrently demonstrated in recent years. However, their repetitive nature hampers their identification and detection, hence remaining inaccessible for clinical practice. Also, its clinical relevance for cancer patients is still limited. Here, we develop a new method to quantify L1 activation, called RetroTest, based on targeted sequencing and a sophisticated bioinformatic pipeline, allowing its application in tumor biopsies. First, we performed the benchmarking of the method and confirmed its high specificity and reliability. Then, we unravel the L1 activation in HNSCC according to a more extensive cohort including all the HNSCC tumor stages. Our results confirm that RetroTest is remarkably efficient for L1 detection in tumor biopsies, reaching a high sensitivity and specificity. In addition, L1 retrotransposition estimation reveals a surprisingly early activation in HNSCC progression, contrary to its classical association with advanced tumor stages. This early activation together with the genomic mutational profiling of normal adjacent tissues supports field cancerization process in this tumor. These results underline the importance of estimating L1 retrotransposition in clinical practice towards an earlier and more efficient diagnosis in HNSCC.
RetroTest represents the first method to determine LINE-1 retrotransposition from tumor biopsies in real clinical settings.
RetroTest not only offers global LINE-1 retrotransposition ratios but also identifies the active LINE-1 source elements.
RetroTest elucidates a really early LINE-1 activation in early tumor stages of Head and Neck Squamous Cell Carcinoma.
Whole Genome Analysis and LINE-1 retrotransposition demonstrates processes of field cancerization in Head and Neck Squamous Cell Carcinoma.
LINE-1 retrotransposition favors an earlier and more efficient Head and Neck Squamous Cell Carcinoma diagnosis.
1. Introduction
Approximately half of human genome is composed of transposable elements (TEs), sequences with the ability of moving from one place to another, changing the normal structure of the genome in the places where they are integrated [1,2]. Among them, long interspersed nuclear element retrotransposons (LINE-1, L1) represent 17% of the entire DNA content with approximately 500,000 copies, most of them truncated or inactive [3–6]. Specifically, only a small subset of these L1s is active in the human genome, although they stay transcriptionally repressed due to epigenetic mechanisms that prevent the damage that their mobilization would cause [7]. When this repression is lost, L1 activation can cause different diseases, including cancer [8, 9].
In the framework of the International Consortium of PanCancer (PCAWG), our previous analyses shown that somatic L1 insertions represent the major restructuring source of cancer genomes, especially important for head-and-neck squamous cell carcinoma (HNSCC)[9]. Aberrant L1 retrotransposition contribute to instability within cancer genomes, promoting cancer-driving rearrangements that involve the loss of tumor-suppressor genes and/or the amplification of oncogenes, and favoring some cancer clones to survive and grow [9].
Despite this demonstrated impact of L1 activation in cancer genomes, the repetitive nature of L1 elements and their dispersion along the genome hinder their real activation estimates in tumor samples, preventing its translation into patient diagnosis/prognosis. This repetitive nature compels that the majority of the available detection methods are based on Whole Genome Sequencing (WGS), unfeasible for most hospitals in their diary practice, or require such good quality DNA input quantities that remain unaffordable for a clinical practice mainly based on small and degraded tissue biopsies.
HNSCC represents the second tumor type with highest L1 activation [9], but the clinical implications of L1 retrotransposition are still to be elucidated. Head and Neck Cancer is a heterogeneous group of cancers of which more than 90% are diagnosed as HNSCC, arising in the stratified epithelium of the oral cavity, pharynx, and larynx [10,11]. Tumor development is often triggered by chronic exposure to tobacco or alcohol, while infection with high-risk human papillomaviruses (HPVs) causes a substantial and rising proportion of these tumors [12]. The lack of symptoms in the early stages together with the non-existent cancer biomarkers lead towards most diagnoses at advanced stages, where the 5-year survival rate is less than 50% [11]. Thus, there is an urgent need to find molecular biomarkers that can facilitate an earlier diagnosis and increase patientś life expectancy.
Based on this, we aimed to unravel the impact of possible L1 activation along HNSCC development in a clinical setting. First, we have developed a new efficient method based on L1 transductions, RetroTest, to measure L1 activation even in biopsies with low DNA inputs. Then, we have evaluated RetroTest sensitivity and specificity, demonstrating its high values. Finally, we have assessed L1 retrotransposition along different tumor stages, unraveling for the fist time a really early activation in HNSCC and field characterization, arising L1 as a promising early diagnostic biomarker.
2. Materials and methods
2.1. Patients and tumor samples
Tumor samples and medical records were analyzed in a series of 96 HNSCC patients of 3 different cohorts (Complexo Hospitalario Universitario de A Coruña (CHUAC), Biobanco Vasco and Fundación Pública Galega de Medicina Xenómica (FPGMX)). The Ethical Committee for Clinical Research of Santiago-Lugo approved the study (CEIC 2018/567). Fresh-frozen and FFPE HNSCC tissue biopsies were collected, and characteristics of each tumor are specified in Table 1. The histopathologic status was confirmed by each corresponding Pathology Department following the latest TNM guidelines.
2.2. DNA isolation
Genomic DNA (gDNA) was extracted from fresh-frozen tissue and formalin-fixed paraffin-embedded (FFPE) samples using AllPrep DNA/RNA and AllPrep FFPE DNA/RNA Mini Kits (Qiagen), respectively. DNA quantification and integrity were assessed using Qubit dsDNA BR Assay Kit in Qubit 4.0 (ThermoFisher Scientific) and a 4200 TapeStation system (Agilent).
2.3. RetroTest method: design, library construction and target sequencing
During L1 transcription, the processing machinery sometimes bypasses the L1 polyadenylation signal until a second 3’ downstream polyadenylation site, mobilizing unique sequences downstream of the element in a process called L1 3’ transduction. This process has been reported to occur in around 10% of the L1 mobilizations [7] and can be used as an indirect measurement of real active L1 elements. Accordingly, there are three different types of retrotranspositions: solo-L1 (TD0), when a partial or complete L1 is retrotransposed; partnered transductions (TD1), in which a L1 and downstream unique sequence are retrotransposed; and orphan transductions (TD2), in which only the unique sequence downstream of the active L1 is mobilized without the associated L1 [7]. RetroTest is based on targeted sequencing, where the capture probes are designed against the unique sequence downstream of the 124 L1 full-length competent elements previously described [7] using SureSelect design from Agilent. RetroTest identifies L1 TD2 orphan insertions through the detection of discordant reads and clipped reads. The main idea behind the method is that discordant read pairs, where one of the mates is mapped to a L1 3’ downstream sequence while the other is mapped to the insertion target sequence, support an insertion. In addition, clipped reads, mapped to the target sequence but containing a discordant extreme blatting to a L1 3’ downstream sequence, are detected to identify the breakpoint of the insertion (Fig. 1A). Here, a minimum size of 2 reads per cluster and a minimum of 4 supporting reads was specified to call a L1 insertion (Additional file 1). Target libraries were constructed with SureSelect Target Enrichment System for Illumina Paired-End Multiplexed Sequencing (Agilent). 100 ng of gDNA from each sample were sheared using a Covaris M220 Focused-Ultrasonicator (Covaris) and libraries and capture with targeted RNA baits were performed. The multiplexed samples were sequenced with Illumina 150bp paired-end.
A. RetroTest design. B. Scatterplot and correlation between L1 activation and orphan transductions (TD2) in the International Consortium of PanCancer (PCAWG) data. The number of L1 3’ transductions measured L1 activation. C. Performance of RetroTest for different sequencing coverages using an artificially generated GRC37/hg19 genome with a total of 2,480 randomly distributed L1 transductions at 50% VAF. D. Performance of RetroTest with respect to the VAF of L1 integrations, using the artificially generated GRC37/hg19 genome with a total of 2,480 randomly distributed L1 transductions at different VAFs. E. Performance of TraFiC with respect of the VAF of L1 integrations, using the artificially generated GRC37/hg19 genome with a total of 2,480 randomly distributed L1 transductions at different VAFs. F. Venn diagram of the number of L1 insertions detected by RetroTest and TraFiC in the artificial genome with a VAF of 50% for L1 insertions.
Sequencing reads were mapped to the hg19 reference genome by Burrows-Wheeler Aligner BWA-mem [13]. Samtools [14] was used to sort the aligned reads and to index the obtained bam file, applying Bammarkduplicates2 from Picard tools [15] to mark duplicated reads.
2.4. RetroTest benchmarking
We generated simulated paired-end read datasets using ART [16] and several commands from MEIGA-MEIsimulator, an in-house bioinformatic tool (Additional file 1).
To mimic the capture process, we selected read pairs with at least one of the mates mapping on a given set of target regions with Picard FilterSamReads v2.18.14 [15]. We used the resulting bam files to assess sensitivity and specificity under different conditions. To test how our method performs with subclonal events, we simulated transductions at different VAFs (10%, 20%, 40% and 50%). We also studied how coverage affects the detection power of our algorithm. Using Picard DownSampleSam v2.18.14 [15], we subsampled reads from a 150x simulation at 50% VAF under different sequencing depths (15x, 30x, 60x, 90x, 120x and 150x). We compared RetroTest performance with Transposon Finder in Cancer (TraFiC), used previously to explore somatic retrotransposition in PCAWG [7]. Both were used to call the simulated events. Precision was calculated dividing the number of true positive calls by the number of total calls performed by the method. Recall was calculated dividing the number of true positive calls by the number of total simulated events. True positives, False positives and False negatives were identified by intersecting the coordinates of simulated events with the coordinates of the calls using BEDTools intersect [17]. To compare RetroTest and TraFiC results, Venn diagrams with the insertions detected by each method were plotted by using vennDiagram R package.
2.5. Whole Genome Sequencing and determination of mutation profile
To obtain WGS data, DNA was sent to an external service (Macrogen). Truseq Nano DNA Libraries (350bp) were constructed and sequenced in a NovaSeq6000 Illumina platform (150bp paired-end). For a detailed explanation see Additional File 1.
2.6. Statistical analyses
The association between L1 transduction rate (corrected by coverage) and patient clinical features was assessed by multiple linear regressions. Wilcoxon or Fisher test, depending on sample size, were applied to compare the differences in mean values for clinical variables. Overall survival (OS), progression free survival (PFS) and survival probability analyses were performed with the survminer and survival R packages, using log-rank test to compare different groups. The association between survival and clinical variables was evaluated by Cox regression.
2.7. Enrichment analysis
All enrichment analyses were performed using enrichR R package [18]. The complete list of pathways databases can be found in Additional File 1.
3. Results
3.1. RetroTest Benchmarking
During L1 transcription, transductions are reported to occur in around 10% of the L1 mobilizations [7], moving also unique sequences downstream of the source element. RetroTest is designed to capture these mobilized and downstream-transduced unique sequences from orphan transductions (TD2) and use them as barcodes (Fig. 1A). These barcodes can identify unequivocally the insertions caused by these 124 L1 source elements active in cancer [7,9]. With this aim, we focused RetroTest probe design on the first 5000 nucleotides adjacent to the L1 3’ regions for each of the 124 L1 source elements, since these are the regions most frequently transduced [7]. Since RetroTest uses transductions as an indirect measure of real L1 activation, we first compared L1 activation versus TD2 using the whole PCAWG data. In this way, we confirmed a high and statistically significant correlation between both events (r=0.88, p=2.2e-16), and a relation of 1:10 as previously described (Fig. 1B).
Next, we optimized the lab protocol for both FFPE and fresh-frozen tissues and developed an associated bioinformatic pipeline. We evaluated the performance and accuracy of RetroTest in detecting L1 activation by generating an artificial cancer genome. In this genome, we have randomly distributed a total of 2480 L1 transductions. Using simulations, we assessed the performance of RetroTest for different sequencing depth: 15x, 30x, 60x, 90x, 120x and 150x. RetroTest obtained a precision around 0.99 in all cases and a recall around 0.96 (Fig. 1C). We also evaluated the variation of the performance depending on the variant allele frequency (VAF) of the integrations, precisely for the following VAFs: 10%, 20%, 40% and 50%. In this case, RetroTest obtained a precision around 1, decreasing for lower VAFs, and a recall ranged from 0.81 to 0.96, augmenting as increasing the VAF (Fig. 1D).
Then, we compared RetroTest and the classical TraFiC, used for WGS in PCAWG [7]. As TraFiC is designed to work only with standard WGS 30x data, we compared the analysis varying exclusively the VAFs. TraFiC precision ranged from 0.99 to 0.88 and recall ranged from 0.11 to 0.87 as VAF increased (although the maximum recall was obtained for VAFs of 40%, being 0.87) (Fig. 1E) (Additional File 2: Table S1). We compared the performance of both methods by intersecting both calls in the mock tumor genome. Using a VAF of 50%, most of the variants were called by both methods, concretely 2216 variants, while RetroTest exclusively called 167, and TraFiC 33 private events, most of them resulting in false positives according to IGV (Fig. 1F).
3.2. L1 activation in HNSCC
Once confirmed the accuracy and precision of RetroTest, we decided to apply it to a more extensive HNSCC cohort, composed of 96 tumors along the tumor stages, from T1 to T4 (Table 1). We detected L1 activation in 75% of the patients (Fig. 2A), out of which 48.6% showed high activity (beyond the median) (Table 2). When the activation was studied along the different tumor stages of the disease, advanced disease (T3-T4) showed statistically higher L1 activation compared with early stages (T1-T2) (p=0.0072) (Fig. 2B). Following tumor staging, L1 activation was detected in 62.5% of T1 tumors, 63.1% of T2 tumors, 94.4% of T3 tumors, and 73.5% of the tumors in T4 (Table 2). The detection of L1 activation in all the tumor stages among the different patients, even in the first stage of the disease (T1), supports an early L1 activation during tumor progression.
A. Quantification of L1 activation in HNSCC tumors as the number of orphan transduction detected by RetroTest. B. Boxplot of L1 activation with respect to early (T1-T2) and advanced (T3-T4) TNM stages. Differential activation p-value was derived by the Wilcoxon test. To correct coverage-related bias, L1 activation was calculated as the number of TD2 divided by its median coverage. C. Boxplot of L1 activation with respect to smoking status. Differential activation p-value was derived by the Wilcoxon test. To correct coverage-related bias, L1 activation was calculated as the number of TD2 divided by its median coverage. D. Kaplan-Meier curves for overall survival with respect to L1 activation rate at T1 stage. Patients were grouped into high (above the median) and low (below the median) L1 activation rate. Log-rank test was used to calculate the p-value. To correct coverage-related bias, L1 activation was calculated as the number of TD2 divided by its median coverage. E. Oncoplot showing the genes affected by L1 insertions and their original source elements. A total of 27 patients presented genes affected by L1 insertions from 46 different source elements.
Then, we analyzed L1 activation considering different patients’ clinical characteristics, finding no statistically significant association between the L1 activation and alcohol consumption (p=0.14) or sex (p=0.055). Interestingly, smoker patients showed statistically significantly higher L1 activity than non-smokers (p=0.0015) (Fig. 2C). We did not find an association between L1 activation and survival probability (p=0.59 active vs. inactive, p= 0.49 high vs. low rate) (Additional File 3: Fig. S1), although when considering those T1 patients who have L1 already active at early stages, Kaplan-Meier curve showed that they tend to have a lower survival probability, not being statistically significant (p=0.43) (Fig. 2D).
Finally, in addition to localizing the position of each transduction, RetroTest can also identify the source L1 element that has been mobilized. Thus, we have detected L1 transductions in 23 genes throughout 27 patients. As shown in Fig. 2E, most of these insertions are caused by a few source elements that resulted very active in HNSCC, especially the 22q12.1.
3.3. HNSCC mutation profile and L1 activation
To further characterize the molecular profile of L1 activation in our HNSCC cohort, we obtained WGS data from 19 tumor samples from patients measured by RetroTest. To detect only somatic variation, their corresponding paired-normal samples were included as germline control. We detected a median of 40 single nucleotide variants (SNVs) and INDELs, identifying a total of 1012 somatic variants throughout the cohort, affecting 918 genes (Additional File 2: Table S2). As shown in Fig. 3A, we did not detect a correlation between L1 activation and the general tumor mutation burden (TMB). Our results showed that the most frequently mutated gene was TP53 (36.8%), followed by NOTCH1 (26.3%), MT-ND5 (26.3%), FAT1, and GRIN2A (21.1%). Interestingly, we found that most of the patients with TP53 mutations showed also high L1 activity (71.4%). In fact, when we compared the L1 activation with the TP53 mutation, we found a clear association (p=0.056) (Fig. 3B). Enrichment analyses with the mutated genes showed involved key processes for cancer progression such as Notch signaling, TGFß signaling, and again p53 activity regulation (Fig. 3C) (Additional File 2: Table S3). CHEA analysis revealed the alteration of transcription factors related to epigenetic mechanisms including Polycomb members (EZH2, SUZ12) (Fig. 3D) (Additional File 2: Table S4).
A. Oncoplot showing the genes harboring somatic mutations in HNSCC patients. The type of mutation in each gene and the L1 activation (number of L1 insertions in each patient) is shown. B. Boxplot of L1 activation with respect to TP53 mutation or wild-type status. To correct coverage-related bias, L1 activation was calculated as the number of TD2 divided by its median coverage. Differential activation p-value was derived by the Wilcoxon test. C. Barplot of the Pathway enrichment analysis based on the 918 genes somatically mutated genes. Enrichment p-values were calculated with the Fisher exact test. D. Barplot of the transcription factor binding enrichment analysis based on the 918 genes somatically mutated genes. Enrichment p-values were calculated with the Fisher exact test.
3.4. Profiling normal adjacent tissues to the HNSCC tumor (NAT)
To further understand the early HNSCC features, we decided to evaluate a possible field cancerization process, in which the normal cell population is replaced by cancer-primed cells, without anatomical or morphological changes, but being already premalignant at molecular level. To this, we analyzed the available normal samples obtained from adjacent tissues to the tumor and the peripheral blood mononuclear cells (PBMCs), used as germline control (Fig. 4A). We detected a total of 25 high impact and/or possibly pathogenic somatic variants affecting the NAT (Additional File 2: Table S5). Most of these specific mutations (n=20; 80%) were exclusive from NAT tissue, including those affecting key genes such as NOTCH1 (the most mutated gene), FAT1 or PPARD; while 20% were shared between NAT and tumor tissue, affecting genes such as CDKN2A.
A. Schematic representation of the evaluation of the field cancerization process. B. Number of L1 active elements in normal adjacent tissue of HNSCC patients (n=9), compared to L1 activation in paired tumor tissue and PBMCs as germline control from the same patients.
To elucidate whether L1 further supports field cancerization, we evaluated the NAT of a total of 9 patients by RetroTest. We compared the L1 elements active in the tumor, in the NAT, and in their corresponding paired germline. In this way, we could confirm that most of the L1 activation was present only in the tumor, 5 insertions resulted germline, and 1 element appeared active and shared by tumor and NAT. Surprisingly, 4 insertions appeared exclusively in NAT (Fig. 4B). Thus, we could confirm field cancerization and demonstrate that L1 is already active in NAT, supporting once again its early activation in HNSCC.
4. Discussion
Since the recently demonstrated high impact of L1 in cancer genomes [7,9], L1 has been evaluated as a cancer biomarker in different studies assessing its activity by evaluating L1 methylation, RNA expression, or protein levels [19–22]. However, most of L1 genome sequences are truncated and not functional, so results can present important biases in their L1 estimations, while its translation into clinical routine can be challenging due to RNA/protein instability. Previous proposed technologies based on DNA full-length L1 retrotransposon capture sequencing, such as RC-seq [23], present the associated possible bias of the real L1 activation, besides only offer global L1 estimates, and important DNA requirements (2.5μg starting genomic DNA), unaffordable for clinical practice. The most recent approaches are based on the identification of L1 insertions as real L1 activation measure (TraFiC [7], xTea [24], MELT [25] and Mobster [26]). They use Whole Genome or Whole Exome Sequencing from Illumina pair-end short reads, nevertheless, short reads hamper the detection of transposable elements insertions in highly repetitive or complex rearrangement regions. Additionally, these high throughput approaches are not affordable for most of the hospitals. More recently, long reads sequencing has arisen as a new possibility (xTea [24], PALMER [27]), but again their requirements of huge amounts of high-quality DNA remain unaffordable for biopsies mainly composed of small samples and fragmented DNA. Thus, incorporating L1 activation into the clinical practice requires new methods supported by standardization and rigorous validation to demonstrate its utility.
Here, we present RetroTest: a new method to detect real L1 activation in clinical samples with low DNA input requirements, both from fresh/frozen or FFPE biopsies. Its novelty and power lie in the targeted detection of active source L1 elements, which showed a highly and direct statistical correlation with total L1 activity (as demonstrated by our data) in a more cost-effective than previously proposed approaches. Our method not only offers global L1 estimates but also identifies the L1 elements active in each real sample. Our benchmarking supports the high precision (1) and recall of RetroTest (0.81-0.96), improving according to coverage, even reaching the possibility of detecting subclonal insertions. The high coincidence in the variants called by TraFic and RetroTest, in both simulations and real-world data, supports the high potential for this new methodology.
L1 somatic retrotransposition was previously demonstrated as the second most frequent type of structural variants among HNSCC genomes [9]. Accordingly, our results demonstrate that most HNSCC patients (75%) present L1 activation, even near half of them with high levels (49%). The activation was higher in late stages, as found in Barret’s esophagus, where lower L1 activity was detected in the early stages increasing with cancer progression [28]. Interestingly, we found a surprisingly early activation already in the first stages of HNSCC, with activation of L1 in 62-63% of the T1 patients. These data pointed towards L1 activation as an early event in the configuration of HNSCC genomes and, thus, in the development of the disease. In fact, those stage T1 patients with L1 activation tended to present a lower OS.
We found an association between high L1 activity and smoking habits. Previous studies had reported higher hypomethylation rates of L1 in smokers [29], even in non-cancerous epithelial tissues [30]. Considering that around 75% of HNSCC are associated with tobacco [31], this mechanism could be responsible for the high L1 activity levels in HNSCC, since methylation is one of the best-demonstrated mechanisms preventing L1 reactivation [7,32]. In fact, L1 hypomethylation has been associated with worse prognosis in HNSCC, including a higher risk of relapse [33–35]. This hypomethylation has been found as an early event in CRC, gastric and oral cancer [36–39]. Interestingly, it was reported that L1 methylation levels were significantly lower in oral premalignant lesions of patients who then developed oral cancer [39].
The source element L1 most active in our cohort was that at 22q12.1, coincident with our previous results [7, 10]. This L1 is located antisense to an intron of the TTC28 gene and has been also recently identified as the intact LINE-1 mRNA most highly expressed in breast, ovarian and colon cancer [40], and the L1 element accounting for most transductions in colorrectal cancer [33], supporting the same hottest activity in HNSCC.
Our WGS analyses confirmed no correlation between L1 activity and TMB, but an association between TP53 mutation and L1 activation, in line with previous studies suggesting that TP53 can repress L1 mobilization [9,40–43]. We also identified an important epigenetic regulation among the transcription factors strongly interacting with the mutated genes, specially related to repressive complex Polycomb. Intriguingly, Mangoni et al. have just deciphered that L1 RNAs can act as long non-coding RNAs and directly interact with the Polycomb during brain development and evolution [44]. We also reported previously that Polycomb could regulate lncRNA HOTAIR in bladder cancer [45]. In the same sense, Ishak et al. demonstrated an EZH2-dependent silencing of genomic repeat sequences, including L1 elements [46]. Therefore, additional analyses would be required to further address if this epigenetic network plays a role in cancer genome reorganization.
Then, to evaluate possible HNSCC early diagnosis potential biomarkers is fundamental to get insights into the transition from pre-tumoral to cancer disease. Thus, we evaluated the presence of field cancerization and identified genetic features that can eventually lead to cancer development [47], finding exclusive somatic mutations in the NAT and a small proportion shared with the tumor. Several studies have recently described patchworks of different clones in normal tissues, some of them even bearing driver mutations, increasing with age or smoking habits [48–50]. We found NOTCH1 and FAT1 as the most mutated genes, as Martincorenás results in normal skin and esophagus [48,51]. These genes showed also high mutation rates in the tumor, indicating the presence of a precancerous or cancer invasion field.
Finally, when we evaluated L1 activity, several mobilizations were exclusively found in NAT, and only one appeared shared with the tumor, again supporting a cancerization field. Somatic L1 retrotransposition events have been recently described in normal urothelium and colorectal epithelium, although with much lower rates than in bladder and colorectal cancer [52,53]. L1 activity has been also described in a few pre-tumor samples including Barrett’s esophagus and colorectal adenomas [8,28,54]. Therefore, L1 activation arises as a reinforced early event, even in pre-tumor stages, in the natural history of HNSCC, with the associated potential of L1 as a field cancerization biomarker.
Conclusions
We present the development and benchmarking of RetroTest, which allows an easy measurement of real L1 activation from small tumor biopsies with high efficacy, favoring its implementation in real clinical settings. RetroTest revealed that most of the HNSCC patients present L1 activation, associated with smoking habits and already active in early stages of the disease and NAT, supporting a field cancerization process and its potential as early diagnostic biomarker.
Data Availability
Whole Genome Sequencing data from the HNSCC cohort is deposited into the Sequence Read Archive (SRA) repository under the following BioProject ID: PRJNA1053897. RetroTest pipeline is publicly available at https://gitlab.com/mobilegenomesgroup/RETROTEST.
Ethics statement
The Ethical Committee for Clinical Research of Santiago-Lugo gave ethical approval for this work (CEIC 2018/567).
Competing interests
The authors declare no competing interests.
Consent for publication
All authors give consent for the publication of this manuscript.
Data availability statement
Whole Genome Sequencing data from the HNSCC cohort is deposited into the Sequence Read Archive (SRA) repository under the following BioProject ID: PRJNA1053897. RetroTest pipeline is publicly available at https://gitlab.com/mobilegenomesgroup/RETROTEST.
CRediT authorship contribution statement
Jenifer Brea-Iglesias and Ana Oitabén: Conceptualization, Methodology, Software, Data curation, Investigation, Validation, Visualization, Writing – review and editing. Sonia Zumalave: Conceptualization, Investigation, Methodology, Software, Validation, Writing – review and editing. Bernardo Rodríguez-Martin: Conceptualization, Methodology, Software. María Gallardo-Gómez: Conceptualization, Visualization, Writing – review and editing. Martin Santamarina: Methodology, Validation. Ana Pequeño-Valtierra: Conceptualization, Investigation. Laura Juaneda-Magdalena: Methodology, Conceptualization, Writing-review and editing. Ramón García-Escudero: Methodology, Writing-review and editing. Jose Luis López-Cedrún: Methodology. Máximo Fraga: Funding acquisition. José MC Tubio: Conceptualization, Funding acquisition. Mónica Martínez-Fernández: Writing – review & editing, Writing – original draft, Supervision, Resources, Investigation, Project administration, Funding acquisition, Conceptualization. All authors approved the final version.
ADDITIONAL FILES
Additional file 1 (pdf): Supplementary Methods: RetroTest library and target sequencing; RetroTest method in detail; RetroTest benchmarking; Whole Genome Sequencing and determination of mutation profile; Gene pathway databases uses for enrichment analysis
Additional file 2 (xls): Table S1: RetroTest benchmarking. Table S2: HNSCC somatic variants. Table S3: Gene pathways affected by HNSCC somatic variants. Table S4: Transcription factors binding to genes affected by HNSCC somatic variants. Table S5: HNSCC NAT somatic variants
Additional file 3 (pdf). Figure S1. Kaplan-Meier curves for overall survival in HNSCC cohort with respect to (A) L1 activation status (active vs inactive) and (B) L1 activation rate (with respect to the median, being high above vs low as below the median). Log-rank test was used to calculate the p-value.
Acknowledgements
Authors thank all the enrolled patients and their families. This research project was made possible through the access granted by the Galician Supercomputing Center (CESGA) to its supercomputing infrastructure. The supercomputer FinisTerrae III and its permanent data storage system have been funded by the Spanish Ministry of Science and Innovation, the Galician Government and the European Regional Development Fund (ERDF).
This work was supported by the Instituto de Salud Carlos III (ISCIII) and the European Social Fund (“Investing in your future”) (PI19/01113, and partially by P121/00208, co-funded by FEDER and the European Union), and the Spanish Association Against Cancer Scientific Foundation (IDEAS19122MART). A.O. was supported by a predoctoral fellowship from the Galician Innovation Agency, Xunta de Galicia (ED481A-2020/214). M.M.-F. and J.B. were previously supported by the Spanish Association Against Cancer Scientific Foundation (INVES207MART and PRDCR19007BREA_001, respectively). M.M.-F. is currently supported by the Miguel Servet program (CP20/00188) from the Instituto de Salud Carlos III (ISCIII) and the European Social Fund (“Investing in your future”). M.G-G. is supported by a postdoctoral fellowship from the Galician Innovation Agency, Xunta de Galicia (IN606B-2024/014).
Footnotes
The authorship credict was not correct: J. Brea-Iglesias and A. Oitaben are two first co-authors who contributed equally to this work. This is now noted as affiliation number "3. Equal contribution". The main text remains unaltered.
References
- [1].↵
- [2].↵
- [3].↵
- [4].
- [5].
- [6].↵
- [7].↵
- [8].↵
- [9].↵
- [10].↵
- [11].↵
- [12].↵
- [13].↵
- [14].↵
- [15].↵
- [16].↵
- [17].↵
- [18].↵
- [19].↵
- [20].
- [21].
- [22].↵
- [23].↵
- [24].↵
- [25].↵
- [26].↵
- [27].↵
- [28].↵
- [29].↵
- [30].↵
- [31].↵
- [32].↵
- [33].↵
- [34].
- [35].↵
- [36].↵
- [37].
- [38].
- [39].↵
- [40].↵
- [41].
- [42].
- [43].↵
- [44].↵
- [45].↵
- [46].↵
- [47].↵
- [48].↵
- [49].
- [50].↵
- [51].↵
- [52].↵
- [53].↵
- [54].↵