Blood RNA and protein biomarkers are associated with vaping and dual use, and prospective health outcomes ========================================================================================================= * Andrew Gregory * Zhonghui Xu * Katherine Pratte * Seth Berman * Robin Lu * Rahul Suryadevara * Robert Chase * Jeong H. Yun * Aabida Saferali * Craig P. Hersh * Edwin K. Silverman * Russell P. Bowler * Laura E. Crotty Alexander * Adel Boueiz * Peter J. Castaldi ## Abstract **Background** Electronic nicotine delivery systems (ENDS) are driving an epidemic of vaping. Identifying biomarkers of vaping and dual use (concurrent vaping and smoking) will facilitate studies of the health effects of vaping. To identify putative biomarkers of vaping and dual use, we performed association analysis in an observational cohort of 3,892 COPDGene study participants with blood transcriptomics and/or plasma proteomics data and self-reported current vaping and smoking behavior. **Methods** Biomarkers of vaping and dual use were identified through differential expression analysis and related to prospective health events over six years of follow-up. To assess the predictive accuracy of multi-biomarker panels, we constructed predictive models for vaping and smoking categories and prospective health outcomes. **Results** We identified three transcriptomic and three proteomic associations with vaping, and 90 transcriptomic and 100 proteomic associations to dual use. Many of these vaping or dual use biomarkers were significantly associated with prospective health outcomes, such as FEV1 decline (three transcripts and 62 proteins), overall mortality (18 transcripts and 73 proteins), respiratory mortality (two transcripts and 23 proteins), respiratory exacerbations (13 proteins) and incident cardiovascular disease (24 proteins). Multimarker models showed good performance discriminating between vaping and smoking behavior and produced informative, modestly powerful predictions of future FEV1 decline, mortality, and respiratory exacerbations. **Conclusions** In summary, vaping and dual use are associated with RNA and protein blood-based biomarkers that are also associated with adverse health outcomes. Keywords * biomarkers * vaping * ENDS * e-cigarettes * smoking * genomics ## Introduction The high prevalence of electronic cigarette use (vaping) among both adolescents[1] and adult cigarette users[2] is a major public health concern. Electronic cigarettes, also referred to as electronic nicotine delivery systems (ENDS), were initially marketed as a means of smoking reduction, with the hope that their effects on the health of cigarette users would be a net positive. However, vaping is associated with acute respiratory symptoms[3–5] and lung injury[6,7] and is likely to be associated with long-term adverse health effects. Vaping companies are targeting young people as a profitable demographic and promoting ENDS use beyond the population of established cigarette users, resulting in an ongoing epidemic of vaping in adolescents and non-cigarette user adults that is clearly detrimental to public health[8–10]. The Family Smoking Prevention and Tobacco Control Act of 2009 gave the Food and Drug Administration (FDA) regulatory oversight over tobacco products which, by the FDA Deeming Regulation of 2016, includes ENDS. Thus, the FDA is responsible for reviewing the safety of all ENDS products introduced to market after February 15, 2007. To facilitate this review process, it is important to determine whether there are validated biomarkers of ENDS use and combined use of ENDS and cigarettes (dual use). While biomarkers of vaping and dual use have been reported, few if any vaping biomarkers have been assessed for their association to prospective health outcomes, and the predictive performance of multi-biomarker panels has not been assessed. In this paper, the term biomarker refers to molecular measurements tested for association to vaping or smoking exposure, which we distinguish from validated biomarkers in which a specific association has been demonstrated in multiple studies. Validated biomarkers that are also predictive of prospective health events can be considered for use as surrogate outcomes in safety and health assessments[11,12] which would be a useful tool for evaluating the health effects of an increasingly diverse array of ENDS products. Thus, the goal of this study is to establish a set of candidate biomarkers for future validation and further study as potential surrogate outcomes. We hypothesized that vaping and dual use are associated to multi-omic biomarkers, and that a subset of these biomarkers would be associated with the development of pulmonary and cardiovascular disease. We conducted high throughput transcriptomic and proteomic analyses that identified over 100 biomarkers associated with vaping and dual use in a cohort of adult current and former cigarette users in the COPDGene study. We tested these biomarkers for association with prospective spirometric changes, mortality, and incident cardiovascular disease, identifying dozens of significantly associated biomarkers. We also investigated the extent to which multimarker panels can predict vaping and smoking status and the development of prospective health outcomes. ## Methods ### Study description The COPDGene study recruited 10,198 non-Hispanic White and Black cigarette users with at least 10 pack-years of lifetime cigarette smoking history at 21 U.S. clinical centers ([NCT00608764](http://medrxiv.org/lookup/external-ref?link_type=CLINTRIALGOV&access_num=NCT00608764&atom=%2Fmedrxiv%2Fearly%2F2022%2F12%2F05%2F2022.09.19.22280093.atom), [www.copdgene.org](http://www.copdgene.org))[13] between 2007 and 2011. COPDGene obtained five-year follow-up and is currently obtaining 10-year follow-up of available subjects to collect longitudinal data on study participants. A total of 6,756 subjects completed their five-year study visit (Visit 2) and the 10-year study visit (Visit 3) is ongoing at time of writing. Data collected includes questionnaires, spirometry, and self-reported smoking and vaping behaviors. At Visit 2, whole-blood RNA sequencing (RNA-seq), plasma protein measurements (SOMAscan version 4.0 (5.0K) assay for human plasma), and plasma cotinine were obtained from a subset of subjects. Throughout the study, subjects were also regularly contacted via the phone or Internet as part of the COPDGene Longitudinal Follow-up (LFU) program to assess interval events, including respiratory exacerbations, incident comorbidities, and mortality. Definitions of respiratory exacerbations, incident comorbidities, and mortality assessment are included in the supplementary materials of the online preprint. Written informed consents were obtained for all subjects, and each clinical site obtained institutional review board approval. ### Vaping and smoking groups Starting from Visit 2, vaping was determined by self-report including ever use, current use, intensity, cartridge size, brand, and flavoring. The four comparison groups for biomarker discovery (vapers, current cigarette users, former cigarette users, and dual users) were defined based on the subjects’ questionnaire responses at Visit 2. Vapers were subjects who reported using at least one e-cigarette within the prior week and had a history of smoking tobacco cigarettes, but not within the last 30 days. Current cigarette users reported current smoking with an average of at least one cigarette per day without any e-cigarette use. Dual users were vapers who also reported current smoking, and former cigarette users were defined as those who reported a history of smoking but did not meet criteria for current smoking or vaping. In most of the reported analyses, former cigarette users are used as the reference group. Differences in baseline characteristics between groups were assessed with Kruskal-Wallis and chi-square tests for continuous and categorical measurements, respectively. ### RNA-sequencing The extraction protocol for total blood RNA was performed either manually or with the Qiagen QIAcube extraction robot according to the company’s standard operating procedure. RNA samples with an RNA integrity number (RIN) > 6 and a concentration of ≥ 25 μg/μl were sequenced. Library preparation was performed with the Illumina TruSeq Stranded Total RNA with Ribo-Zero Globin kit (Illumina, Inc., San Diego, CA). Samples were sequenced to an average depth of 20 million 75 base pairs (bp) paired end reads on Ilumina HiSeq 2500 sequencers. Samples were sequenced to an average depth of 20 million 75 base pairs (bp) paired end reads on Ilumina HiSeq 2500 sequencers. Reads were aligned to the GRCh38 genome using the STAR (version 2.5.2b) align. Gene annotations and transcript GTF were downloaded from the Biomart Ensembl database (Ensembl Genes release 94, GRCh38.p12 assembly). Quality control was performed using the FastQC and RNA-SeQC programs, and samples were included for analysis if they met the following criteria: > 10 million total reads, > 80% of reads mapped to the reference genome, XIST and Y chromosome expression was consistent with reported sex, < 10% of R1 reads in the sense orientation, Pearson correlation ≥ 0.9 with samples in the same library construction batch, and concordant genotype calls between variants called from RNA sequencing reads and DNA genotyping. Sequencing read counts were obtained from the featureCounts function in the Rsubread R package (v1.32.2). Genes were filtered to remove very low expressed genes; we limited our analysis to include only those genes which had an average counts per million (CPM) value > 1 in more than 20 subjects. The trimmed mean of M values (TMM) procedure from the edgeR R package (v3.24.3) was applied to account for the differences in sequencing depth. The gene count data used for this analysis is available in GEO (accession number GSE158699). Counts were transformed to log2 counts per million (CPM) values and quantile-normalized. ### Measurement of plasma protein biomarkers At Visit 2, plasma samples were assayed for 4,979 proteins in 5,670 COPDGene participants using the SOMAscan Human Plasma 5.0K assay, a multiplex aptamer-based assay (SomaLogic, Boulder, Colorado)[14]. The standardization process included within-plate hybridization, median signal normalization, and plate scaling and calibration of SOMAmers to control for inter-assay variation between analytes and batch differences between plates. Inverse normal transformation was applied to all protein measurements prior to association analysis. ### Cotinine measurements Plasma metabolite measurements were obtained from plasma for 1,136 subjects at COPDGene Visit 2 using the Metabolon Global Metabolomics Platform (Durham, NC, USA)[15]. Metabolite levels were quantified as raw area counts. Plasma cotinine levels were evaluated, and missing values were assigned a level of half the lowest detected amount of cotinine. Values were log-transformed, and differences in cotinine distribution between groups were evaluated with the Wilcoxon Rank Sum test. COPDGene metabolomic data are available in the Metabolomics Workbench of the NIH Common Fund’s National Metabolomics Data Repository (Study ID ST001443;). ### RNA-seq differential expression and protein association analyses We used the limma R package (v3.38.3)[16] to test for the associations between RNA transcripts and the vaping/smoking groups, using the former cigarette user group as the reference and adjusting for age, sex, race, and library preparation batch effects. The voom function was used in the RNA-seq analysis. Regression models for protein expression used identical covariates, with the exception that library preparation batch effects were replaced with a variable for clinical center. For both differential gene expression and protein association analyses, we corrected for multiple testing with the Benjamini-Hochberg method and applied a threshold of significance of a false discovery rate (FDR) of 10%[17]. For selected genes and proteins that were associated in only one contrast (vaping or dual only associations), we also conducted association analyses for the contrasts of vapers versus cigarette users, vapers versus dual users, and dual users versus cigarette users. Gene set enrichment analysis were performed using the fast gene set enrichment analysis (FGSEA) R package (v3.12)[18]. Further details are included in the supplementary materials of the online preprint. ### Associations of transcripts and proteins to health outcomes Biomarkers significantly associated with vaping or dual use were tested for association to prospective health-related outcomes, namely change in FEV1 (absolute volume and percent change from baseline), prospective exacerbations, all-cause and respiratory mortality, and incident CVD. FEV1 changes were computed by subtracting Visit 2 from Visit 3 FEV1 values and dividing this difference by the number of years between both visits. FEV1 changes were calculated as both absolute and relative (as a percentage of the Visit 2 measurement) values. Data points >5 standard deviations from the mean were excluded from analysis. The FEV1 outcomes were evaluated using multivariable linear regression models, and the time-to-event outcomes were evaluated using Cox-proportional hazards models using Visit 2 as the starting point. Analyses were adjusted for sex, race, and age at baseline (Visit 2). For time-to-event outcomes baseline FEV1 percentage of predicted was included as a covariate. Since the sample sizes of our vaping/smoking groups were small, these association analyses were performed in all available COPDGene study subjects without stratifying by vaping or smoking exposure. The FDR was controlled at 10% using the Benjamini-Hochberg method. ### Prediction modeling for vaping status and prospective health outcomes Predictive models using only transcript and protein biomarkers were constructed for vaping/smoking groups and prospective health outcomes. For categorical and binary outcomes, linear discriminant analysis (LDA) was used. For change in FEV1 models, elastic net models were constructed. Cross-validation was used to prevent overfitting and estimate predictive performance. Model performance was assessed by cross-validation using the Youden index (sensitivity + specificity - 1), area under the receiver operator characteristic curve (AUROC), and area under the precision recall curve (AUCPR) for binary outcomes. R2 was used to evaluate the performance of change in FEV1 models. Additional details are included in the supplementary materials of the online preprint. ## Results ### Baseline characteristics A visual overview of the study and data flow is included in Supplemental Figures 1 and 2. There were 51 and 77 subjects reporting current vaping only or dual use, respectively, at the COPDGene second study visit. Differences in baseline characteristics of these groups along with the current and former cigarette user groups are reported in Table 1. The large majority of subjects in the vaping group (80%) reported vaping at least once a day. The dual user group reported lighter vaping on average, with only 42% reporting daily vaping, and their pack-years exposure to combustible tobacco was similar to the current cigarette user group. FEV1 % of predicted values were comparable for all groups. Biochemical confirmation of nicotine exposure via analysis of plasma cotinine levels in 853 subjects demonstrated elevated cotinine levels in the vaping, dual use, and current cigarette user groups relative to the former cigarette user group (Figure 1, Wilcoxon rank sum test p*<*0.005 for all comparisons). A subset of the former cigarette user group had elevated levels of nicotine that could be due to use of nicotine replacement therapy, secondhand exposure to tobacco smoke or e-cigarette aerosols, or unreported smoking or vaping. Figure 1. Plasma cotinine levels by vaping/smoking group In a subset of subjects, plasma metabolomics data was measured using the Metabolon Global Metabolomics Platform. Violin plots of log-transformed cotinine raw area counts are shown. Cotinine levels are significantly higher in all three groups relative to FS (Wilcoxon rank sum p < 0.005 for all three comparisons). FS = former cigarette users. CS = current cigarette users. View this table: [Table 1.](http://medrxiv.org/content/early/2022/12/05/2022.09.19.22280093/T1) Table 1. Baseline characteristics of the vaping and smoking groups ### Blood RNA-seq associations with vaping and dual use A total of 18,303 genes were tested for differential expression analysis of vapers, dual users, and current cigarette users, using former cigarette users as the reference group. The top results from each analysis are shown in Table 2, and complete results are reported in Tables S1-S3. For vapers, three significant associations were identified at an FDR of 10% (Figure 2). Two of these genes (*LINC02470* and *NPHP1)* were expressed at a higher level in vapers than any of the other three groups (p<0.05 for all comparisons, Table S4). The third gene, *GPR15*, was one of the most well-reported genes upregulated by smoking. This gene was also upregulated in vapers, but to a lesser extent than in either cigarette users or dual users (p<0.005 for all comparisons). Gene set enrichment analysis identified 16 significant MSigDB Hallmark pathways with the top three processes being related to inflammation, oxidative phosphorylation, and heme metabolism (Table 3). For all significant gene sets, enrichment was in the direction of decreased expression in vapers compared to former cigarette users. Since smoking has been shown to have mixed effects on inflammation, we compared the pattern of gene set enrichment for the Hallmark Inflammation pathway in vapers, current cigarette users, and dual users, and we observed that smoking and dual use show elements of activation and repression of this gene set; the effect of vaping was more clearly downregulation of this pathway (Supplemental Figure 3). Figure 2. Significantly associated vaping RNA biomarkers Three RNA biomarkers were significantly associated at 10% FDR in the analysis comparing vapers to former cigarette users. *NPHP1* and *LINC02470* were significantly associated in the vaping but not the smoking or dual use comparison to former cigarette users. *GPR15* is significantly elevated in all three groups relative to former cigarette users, but its expression is relatively lower in vapers relative to dual users and current cigarette users (p<0.005 for all comparisons). View this table: [Table 2.](http://medrxiv.org/content/early/2022/12/05/2022.09.19.22280093/T2) Table 2. Top 5 differentially associated genes and proteins for vapers, dual users, and current cigarette users. View this table: [Table 3.](http://medrxiv.org/content/early/2022/12/05/2022.09.19.22280093/T3) Table 3. Top 5 significant (adjusted P-values < 0.1) MSigDB Hallmark pathways from differential gene expression analyses for the vaping and smoking groups. For dual users, 90 differentially expressed genes were identified, and 17 of these genes were significant only in the dual user analysis. All 17 genes were also significantly differentially expressed when comparing dual users to current cigarette users, suggesting that these genes have a distinctive response to dual use (p<0.005 for all comparisons, see Table S5). Gene set enrichment analysis identified four significant Hallmark pathways related to heme metabolism, oxidative phosphorylation, E2F signaling, and interferon gamma response (Table 3). Complete results for the vaping, dual use, and smoking analyses including MSigDB Immunologic Signature pathways are presented in Supplemental Tables S6-S8. ### Proteomic biomarker associations to vaping and dual use A total of 4,979 SOMAscan aptamers mapping to 4,720 unique proteins were tested for association to vaping and smoking groups. The top significantly associated proteins are shown in Table 2 (see Table S9 for complete results), and violin plots for selected top protein biomarkers associated with vaping or dual use are shown in Figure 3. Three protein biomarkers (EDIL3, DPYL5, and KLK13) were significantly associated with vaping status at a 10% FDR, and DPYL5 expression was found to be higher in vapers compared to each of the other groups (p<0.05 for all comparisons, Table S10). One hundred protein biomarkers were associated with dual use, with four (CXA8, Keratin 19, CSKP, and PXDC1) being significantly associated only with dual use. All four of these proteins also showed nominal association (p<0.05) in an analysis comparing dual users to current cigarette users, indicating a distinct response to dual use (Table S11). Figure 3. Top protein biomarkers significantly associated with vaping or dual use. The distribution of the top significantly associated protein biomarkers in the analysis of vapers or dual users versus former cigarette users. DPYL5 is significantly elevated in vapers relative to all other vaping/smoking groups (p<0.005). KLK13 is significantly associated with vapers, dual users, and current cigarette users when compared to former cigarette users. s-ICAM5 and Secretoglobin family member 3A member 1 were significantly associated with dual users and current cigarette users but not vapers when compared to former cigarette users. ### Comparison of biomarker associations to vaping, smoking, and dual use To compare the similarity of the transcriptomic and proteomic responses to each exposure, we constructed pairwise plots of the effect sizes of all significantly associated biomarkers (Figure 4). There was a strong positive correlation between the biomarker responses to smoking and dual use (Pearson’s r = 0.76 and 0.88 for RNA and protein, respectively), whereas the vaping biomarker profile had low correlation to the profiles for both dual use and smoking (r between 0.23 and 0.44). Certain biomarkers (LINC02470 and DPYL5 protein) had particularly strong and unique associations with vaping. When we compared effect sizes between RNA and proteomic biomarkers associated with vaping and smoking, the correlation was low (r=0.11) as might be expected given various potential sources of origin for circulating blood proteins. Figure 4. Correlation of biomarker responses between vapers, dual users, and current cigarette users. Comparison of the biomarker association profiles demonstrates that while the dual user and current cigarette user profiles are very similar, the vaping biomarker profile is more distinct. For the RNA and protein biomarker analysis of vaping/smoking groups versus former cigarette users, the effect sizes for each pair of comparisons were plotted and the Pearson correlation of effect sizes was calculated. In each pairwise comparison, only biomarkers significant in at least one of analyses were analyzed. In row A, vaping-associated RNA biomarker effects are compared to current cigarette user effects (left), dual user effects (middle), and then dual user and current cigarette user effects are compared (right). In row B, the corresponding protein biomarker effects are compared. ### Classification accuracy for vaping/smoking behavior using biomarker panels To determine how well RNA and protein biomarkers could classify subjects according to their vaping/smoking behavior, we used a nested cross-validation approach to provide an unbiased estimate of the classification performance achievable by multi-class prediction models using linear discriminant analysis (LDA). The performance of this model is shown in Table 4, where for each group the task was to distinguish that group from all other groups combined. The model achieved good discrimination for current cigarette users (Youden index (YI) = 0.81) and former cigarette users (YI = 0.80) and had statistically significant but less accurate performance for vapers (YI = 0.53) and dual users (YI = 0.55). More detailed metrics for each pairwise contrast (Table S12) demonstrates that the models do better in discriminating vapers from current cigarette users than former cigarette users (YI = 0.73 and 0.50, respectively), whereas for dual users, model discrimination was better for former cigarette users (YI = 0.82) than current cigarette users (YI = 0.34), providing further evidence of the similarity between dual use and current smoking biomarker profiles. View this table: [Table 4.](http://medrxiv.org/content/early/2022/12/05/2022.09.19.22280093/T4) Table 4. Multi-class prediction of vaping and smoking groups using multi-omic biomarkers ### Vaping and dual use biomarkers and longitudinal outcomes To determine whether biomarkers of vaping or dual use were associated with prospective health-related outcomes, we tested the 92 RNA transcripts and 101 protein biomarkers associated with vaping and/or dual use for association to multiple health outcomes in all COPDGene subjects. Twenty-one transcripts and 88 protein biomarkers were associated with one or more of the studied outcomes at a 10% FDR level. Of the vaping-associated biomarkers, two (KLK13 and DPYL5) were associated with loss of FEV1, and KLK13 was also associated with mortality. Boxplots for the top four vaping or dual use proteins associated with all-cause mortality are shown in Figure 5, and all significant biomarker associations are shown in Tables S13-S16. Figure 5. Top protein vaping and dual use biomarkers significantly associated with overall mortality. The dual use-associated proteins Growth hormone receptor, MIC-1, and HE4 were significantly associated with overall mortality. KLK-13 was the only vaping associated biomarker that was also associated mortality. All associations were significant at a false discovery rate of 10%. We then determined whether the effect of vaping or dual use on each protein biomarker was consistent with higher or lower risk for each adverse health outcome, and we observed that the direction of the exposure effect was overwhelmingly towards increased risk for adverse health events (Figure 6). Out of 246 total significant associations to health outcomes, 224 (91%) showed concordance between the effect of vaping or dual use and the direction of association to increased risk. Both of the vaping-associated biomarkers (KLK13 and DPYL5) showed concordant effects for higher health risk. Figure 6. Concordance analysis of biomarker associations to vaping/dual use and adverse health outcomes. For protein biomarkers that were significantly associated with at least one health outcome and vaping (n=2) or dual use (n=86), we plotted the effect of the exposure (vaping or dual use) against the effect with respect to the health outcome. Since some biomarkers were significantly associated with multiple health outcomes, the total number of analyzed associations was 246. The top panel shows the relationship between exposure effect and change FEV1 (ml/yr) where each of the 107 associations was concordant for loss of FEV1. Change in FEV1 is calculated as visit 3 value - visit 2 value, meaning negative values indicate loss of lung function. The bottom panel shows the relationship between exposure effect and hazard ratio for all cause mortality, respiratory mortality, exacerbations, or incident cardiovascular disease. 117 of 139 associations were concordant for higher health risk. Exposure effect is the beta coefficient from the dual use versus former cigarette users analysis, except for DPYL5 and KLK13 which is from the vapers versus former cigarette users analysis. To estimate the accuracy of multi-marker panels of vaping and dual use biomarkers for predicting prospective health outcomes, we constructed predictive models for FEV1 decline, all-cause mortality, COPD exacerbations, and incident CVD using only the vaping or dual use-associated RNA and protein biomarkers. Models using the seven vaping-associated biomarkers alone gave non-informative predictions, but models using the 190 dual use-associated biomarkers gave informative predictions with moderate accuracy for FEV1 decline, all-cause mortality and respiratory exacerbations (Table 5). View this table: [Table 5.](http://medrxiv.org/content/early/2022/12/05/2022.09.19.22280093/T5) Table 5. Performance metrics of prediction models for prospective health outcomes. ### Replication of previously reported vaping biomarker associations A recently published review of vaping biomarkers listed 14 inflammatory and matrix degradation protein biomarkers that had been previously associated with vaping[19]. We examined the results from these 14 biomarkers at the RNA and protein level for association to vaping or dual use, and we observed associations at nominal significance (p<0.05) for four biomarkers (*IL1B*, ICAM1 [both RNA and protein], IL-6, and MMP9, see Table S17). ## Discussion Identifying validated biomarkers of vaping and dual use is an important tool both for guiding ENDS users as to the risks of these devices and effective regulation of ENDS. This study of established adult cigarette users identified significant associations of six and 190 blood molecular markers with vaping and dual use, respectively. Because of the comprehensive nature of the ‘omics measurements, we were able to compare overall molecular profiles of vaping, dual use, and smoking. These analyses found that the vaping profile was substantially different from current smoking, whereas dual use and smoking were similar but not identical. Biomarkers associated with vaping and dual use were also associated with multiple cardiopulmonary health outcomes, and reasonably predictive accuracy for some of these outcomes could be achieved for models using dual use biomarkers. A previous metabolomic biomarker study by Goniewicz *et al*. demonstrated that vaping is associated with metabolomic changes, albeit at generally lower levels compared to current cigarette users or dual users[20]. A recent study of vapers versus non-cigarette users identified numerous changes in plasma biomarkers associated with inflammation, oxidative stress, and extracellular matrix degradation[19]. A recent meta-analysis of biomarkers of ENDS versus cigarettes reported a more favorable biomarker profile of ENDS use, though the last author of the study has been reported to have ties to the tobacco industry[21,22]. Another systematic review also reported reduced levels of some biomarkers in ENDS users relative to current cigarette users, but this is not consistent for all biomarkers (Hiler *et al*., 2021). By examining the correlation in biomarker profiles for vaping, dual use, and smoking, we observed that the vaping profile is largely distinct from both the dual use and smoking profile which are in turn highly correlated to each other. We did however observe multiple examples of shared RNA and protein biomarkers between smoking, vaping, and dual use, such as *GPR15*, a regulator of inflammation and T-cell trafficking. We also provided independent validation for previously reported vaping or dual use associations with *IL1B*, ICAM1, IL-6, and MMP9, all of which are known to influence inflammation (IL-1B can induce MMP-9 expression and activity). Both vaping and dual use are associated with diverse effects on biological pathways related to aspects of inflammation, oxidative phosphorylation, and coagulation-related pathways. Previous studies have also shown that long non-coding RNAs (lncRNAs) are associated with vaping[23,24], and in our work we identified LINC02470 as a novel vaping biomarker. LINC02470 has previously been reported to modulate Wnt-beta catenin signaling via exosomal secretion from bladder cancer cells[25]. Because Tang *et al*. previously demonstrated bladder urothelial hyperplasia in mice exposed daily to e-cigarette aerosols, the finding of LINC02470 in the circulation of human ENDS users is highly concerning. We also identified DPYL5 as being uniquely upregulated in vaping subjects. DPYL5 is primarily expressed in brain tissue and glial cells, though it is also expressed at detectable levels in the gastrointestinal (GI) tract. DPYL5 is potentially an important vaping biomarker, because it was also predictive of future loss of lung function. To our knowledge, this is the first report of DPYL5 as a specific and clinically relevant biomarker of vaping, a finding that requires further independent validation. Many of the biomarkers altered by vaping and dual use are also associated with risk of adverse health events such as loss of lung function, respiratory exacerbations, and mortality. For the overwhelming majority of biomarkers, the direction of effect of vaping and dual use was consistent with increased health risk. This extends numerous prior observations linking vaping to a variety of pulmonary symptoms and alterations of cardiovascular physiology[26], providing an important link to longer-term health outcomes and suggesting that some of these biomarkers may be useful in assessing health risks of specific vaping devices and e-liquid formulations. For dual use, a multi-marker panel of associated biomarkers achieved modest predictive accuracy for some important outcomes. For vapers, the biomarker profile was more subtle, and with our current sample size we did not identify a sufficient number of biomarkers to enable accurate multi-marker prediction of health outcomes. The strengths of this study are the high-throughput approach to biomarker discovery, the characterization of associations to prospective health events, and the use of machine learning to assess the predictive performance of multimarker panels. Limitations include a modest sample size of vapers and dual users and the lack of independent replication due to lack of available cohorts with similar biomarker data and exposure characterization. Information regarding specific vaping devices and fluids was limited and reflects the challenges posed by the rapid evolution of vaping devices over the study period. Our findings cannot be assumed to generalize to newer-generation vaping devices which will require dedicated study, and our findings in adults do not necessarily apply to other important groups such as adolescents. In conclusion, this study identified over one-hundred transcriptomic and proteomic biomarkers associated with vaping and dual use, and many of these biomarkers are also associated with prospective cardiopulmonary health outcomes. Prediction of health outcomes using multimarker panels can provide informative prediction with room for improvement, and validation in independent cohorts will increase confidence in individual biomarkers and multi-marker models. Larger-scale studies with greater power would be likely to identify additional biomarkers that could further improve predictive model performance. ## Supporting information Supplemental Materials [[supplements/280093_file02.docx]](pending:yes) Supplemental Tables 1-3 [[supplements/280093_file03.xlsx]](pending:yes) Supplemental Tables 4-5 [[supplements/280093_file04.xlsx]](pending:yes) Supplemental Tables 6-8 [[supplements/280093_file05.xlsx]](pending:yes) Supplemental Table 9 [[supplements/280093_file06.xlsx]](pending:yes) Supplemental Tables 10-11 [[supplements/280093_file07.xlsx]](pending:yes) Supplemental Table 12 [[supplements/280093_file08.docx]](pending:yes) Supplemental Tables 13-16 [[supplements/280093_file09.xlsx]](pending:yes) Supplemental Table 17 [[supplements/280093_file10.xlsx]](pending:yes) ## Data Availability All data produced are available online at dbGaP [https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study\_id=phs000179.v6.p2](https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs000179.v6.p2) ## Data availability Supplemental materials are included in the medRxiv online preprint (doi: [https://doi.org/10.1101/2022.09.19.22280093](https://doi.org/10.1101/2022.09.19.22280093)). ### Underlying data NCBI dbGAP: [https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study\_id=phs000179.v6.p2](https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs000179.v6.p2) ## Competing interests EKS and PJC have received grant support from GlaxoSmithKline and Bayer. JY has received institutional grant support from Bayer and personal fees from Bride Biotherapeutics outside the submitted work. ## Grant information This work was supported by NHLBI K08HL141601, K08HL146972, K01HL157613, R01HL124233, U01 HL089897, R01 HL147326, and U01 HL089856. The COPDGene study ([NCT00608764](http://medrxiv.org/lookup/external-ref?link_type=CLINTRIALGOV&access_num=NCT00608764&atom=%2Fmedrxiv%2Fearly%2F2022%2F12%2F05%2F2022.09.19.22280093.atom)) is also supported by the COPD Foundation through contributions made to an Industry Advisory Committee that has included AstraZeneca, Bayer Pharmaceuticals, Boehringer-Ingelheim, Genentech, GlaxoSmithKline, Novartis, Pfizer, and Sunovion. ## Footnotes * **Authors’ email addresses:** Andrew Gregory (andrew.gregory{at}channing.harvard.edu), Zhonghui Xu (zhonghui.xu{at}channing.harvard.edu), Katherine Pratte (PratteK{at}njhealth.org), Seth Berman (seth.berman{at}channing.harvard.edu), Robin Lu (robin.lu{at}channing.harvard.edu), Rahul Suryadevara (rahul.suryadevara{at}channing.harvard.edu), Robert Chase (robert.chase{at}channing.harvard.edu), Jeong H. Yun (jeong.yun{at}channing.harvard.edu), Aabida Saferali (aabida.saferali{at}channing.harvard.edu), Craig P. Hersh (craig.hersh{at}channing.harvard.edu), Edwin K. Silverman (ed.silverman{at}channing.harvard.edu), Russell P. Bowler (BowlerR{at}njhealth.org), Laura E. Crotty Alexander (lca{at}ucsd.edu), Adel Boueiz (adel.boueiz{at}channing.harvard.edu), Peter J. Castaldi (peter.castaldi{at}channing.harvard.edu) * All authors gave final approval of the version to be published. * Supplemental materials included * Received September 19, 2022. * Revision received December 3, 2022. * Accepted December 5, 2022. * © 2022, Posted by Cold Spring Harbor Laboratory This pre-print is available under a Creative Commons License (Attribution-NonCommercial 4.0 International), CC BY-NC 4.0, as described at [http://creativecommons.org/licenses/by-nc/4.0/](http://creativecommons.org/licenses/by-nc/4.0/) ## References 1. Gentzke AS, Wang TW, Jamal A, et al. Tobacco Product Use Among Middle and High School Students - United States, 2020. MMWR Morb Mortal Wkly Rep 2020;69:1881–8. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.15585/mmwr.mm6950a1&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=http://www.n&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2022%2F12%2F05%2F2022.09.19.22280093.atom) 2. Mayer M, Reyes-Guzman C, Grana R, et al. Demographic Characteristics, Cigarette Smoking, and e-Cigarette Use Among US Adults. JAMA Netw Open 2020;3:e2020694. 3. Li D, Sundar IK, McIntosh S, et al. Association of smoking and electronic cigarette use with wheezing and related respiratory symptoms in adults: cross-sectional results from the Population Assessment of Tobacco and Health (PATH) study, wave 2. Tobacco Control. 2019;:tobaccocontrol – 2018. doi:10.1136/tobaccocontrol-2018-054694 [Abstract/FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiQUJTVCI7czoxMToiam91cm5hbENvZGUiO3M6MTQ6InRvYmFjY29jb250cm9sIjtzOjU6InJlc2lkIjtzOjg6IjI5LzIvMTQwIjtzOjQ6ImF0b20iO3M6NTA6Ii9tZWRyeGl2L2Vhcmx5LzIwMjIvMTIvMDUvMjAyMi4wOS4xOS4yMjI4MDA5My5hdG9tIjt9czo4OiJmcmFnbWVudCI7czowOiIiO30=) 4. Yao T, Max W, Sung H-Y, et al. Relationship between spending on electronic cigarettes, 30-day use, and disease symptoms among current adult cigarette smokers in the U.S. PLoS One 2017;12:e0187399. [PubMed](http://medrxiv.org/lookup/external-ref?access_num=http://www.n&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2022%2F12%2F05%2F2022.09.19.22280093.atom) 5. St Helen G, Eaton DL. Public Health Consequences of e-Cigarette Use. JAMA Intern Med 2018;178:984–6. 6. Smith ML, Gotway MB, Crotty Alexander LE, et al. Vaping-related lung injury. Virchows Arch 2021;478:81–8. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1007/s00428-020-02943-0&link_type=DOI) 7. Cherian SV, Kumar A, Estrada-Y-Martin Rm. E-Cigarette or Vaping Product-Associated Lung Injury: A Review. Am J Med 2020;133:657–63. [PubMed](http://medrxiv.org/lookup/external-ref?access_num=http://www.n&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2022%2F12%2F05%2F2022.09.19.22280093.atom) 8. Hammond D, Reid JL, Rynard VL, et al. Prevalence of vaping and smoking among adolescents in Canada, England, and the United States: repeat national cross sectional surveys. BMJ 2019;365:l2219. [Abstract/FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiQUJTVCI7czoxMToiam91cm5hbENvZGUiO3M6MzoiYm1qIjtzOjU6InJlc2lkIjtzOjE3OiIzNjUvanVuMTlfNy9sMjIxOSI7czo0OiJhdG9tIjtzOjUwOiIvbWVkcnhpdi9lYXJseS8yMDIyLzEyLzA1LzIwMjIuMDkuMTkuMjIyODAwOTMuYXRvbSI7fXM6ODoiZnJhZ21lbnQiO3M6MDoiIjt9) 9. Miech R, Johnston L, O’Malley PM, et al. Trends in Adolescent Vaping, 2017–2019. N Engl J Med 2019;381:1490–1. [PubMed](http://medrxiv.org/lookup/external-ref?access_num=http://www.n&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2022%2F12%2F05%2F2022.09.19.22280093.atom) 10. Miech R, Johnston L, O’Malley PM, et al. Adolescent vaping and nicotine use in 2017-2018 - U.s. national estimates. N Engl J Med 2019;380:192–3. [PubMed](http://medrxiv.org/lookup/external-ref?access_num=http://www.n&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2022%2F12%2F05%2F2022.09.19.22280093.atom) 11. Aronson JK. Research priorities in biomarkers and surrogate end-points. Br J Clin Pharmacol 2012;73:900–7. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1111/j.1365-2125.2012.04234.x&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=22360291&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2022%2F12%2F05%2F2022.09.19.22280093.atom) 12. FDA-NIH Biomarker Working Group. BEST (Biomarkers, EndpointS, and Other Tools) Resource. Food and Drug Administration (US) 2016. 13. Regan EA, Hokanson JE, Murphy JR, et al. Genetic epidemiology of COPD (COPDGene) study design. COPD 2010;7:32–43. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.3109/15412550903499522&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=20214461&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2022%2F12%2F05%2F2022.09.19.22280093.atom) [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000278277100006&link_type=ISI) 14. Gold L, Ayers D, Bertino J, et al. Aptamer-based multiplexed proteomic technology for biomarker discovery. PLoS One 2010;5:e15004. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1371/journal.pone.0015004&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=21165148&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2022%2F12%2F05%2F2022.09.19.22280093.atom) 15. Gillenwater LA, Kechris KJ, Pratte KA, et al. Metabolomic Profiling Reveals Sex Specific Associations with Chronic Obstructive Pulmonary Disease and Emphysema. Metabolites 2021;11. doi:10.3390/metabo11030161 [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.3390/metabo11030161&link_type=DOI) 16. Ritchie ME, Phipson B, Wu D, et al. limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res 2015;43:e47. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1093/nar/gkv007&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=25605792&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2022%2F12%2F05%2F2022.09.19.22280093.atom) 17. Benjamini Y, Hochberg Y. Controlling the false discovery rate: A practical and powerful approach to multiple testing. J R Stat Soc 1995;57:289–300. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1111/j.2517-6161.1995.tb02031.x&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=24443148&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2022%2F12%2F05%2F2022.09.19.22280093.atom) 18. Gennady Korotkevich, Vladimir Sukhov, Nikolay Budin, Boris Shpak, Maxim N. Artyomov, Alexey Sergushichev. An algorithm for fast preranked gene set enrichment analysis using cumulative statistic calculation. biorxiv. 2021. doi:10.1101/060012 [Abstract/FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiQUJTVCI7czoxMToiam91cm5hbENvZGUiO3M6NzoiYmlvcnhpdiI7czo1OiJyZXNpZCI7czo4OiIwNjAwMTJ2MyI7czo0OiJhdG9tIjtzOjUwOiIvbWVkcnhpdi9lYXJseS8yMDIyLzEyLzA1LzIwMjIuMDkuMTkuMjIyODAwOTMuYXRvbSI7fXM6ODoiZnJhZ21lbnQiO3M6MDoiIjt9) 19. Singh KP, Lawyer G, Muthumalage T, et al. Systemic biomarkers in electronic cigarette users: implications for noninvasive assessment of vaping-associated pulmonary injuries. ERJ Open Res 2019;5. doi:10.1183/23120541.00182-2019 [Abstract/FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiQUJTVCI7czoxMToiam91cm5hbENvZGUiO3M6NToiZXJqb3IiO3M6NToicmVzaWQiO3M6MTQ6IjUvNC8wMDE4Mi0yMDE5IjtzOjQ6ImF0b20iO3M6NTA6Ii9tZWRyeGl2L2Vhcmx5LzIwMjIvMTIvMDUvMjAyMi4wOS4xOS4yMjI4MDA5My5hdG9tIjt9czo4OiJmcmFnbWVudCI7czowOiIiO30=) 20. Goniewicz ML, Smith DM, Edwards KC, et al. Comparison of Nicotine and Toxicant Exposure in Users of Electronic Cigarettes and Combustible Cigarettes. JAMA Netw Open 2018;1:e185937. 21. Akiyama Y, Sherwood N. Systematic review of biomarker findings from clinical studies of electronic cigarettes and heated tobacco products. Toxicology Reports. 2021;8:282–94. doi:10.1016/j.toxrep.2021.01.014 [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/j.toxrep.2021.01.014&link_type=DOI) 22. Neil Sherwood. TobaccoTactics. 2020.[https://tobaccotactics.org/wiki/neil-sherwood/](https://tobaccotactics.org/wiki/neil-sherwood/) (accessed 26 Jun 2022). 23. Tommasi S, Caliri AW, Caceres A, et al. Deregulation of Biologically Significant Genes and Associated Molecular Pathways in the Oral Epithelium of Electronic Cigarette Users. Int J Mol Sci 2019;20. doi:10.3390/ijms20030738 [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.3390/ijms20030738&link_type=DOI) 24. Kaur G, Singh K, Maremanda KP, et al. Differential plasma exosomal long non-coding RNAs expression profiles and their emerging role in E-cigarette users, cigarette, waterpipe, and dual smokers. PLoS One 2020;15:e0243065. 25. Huang C-S, Ho J-Y, Chiang J-H, et al. Exosome-Derived LINC00960 and LINC02470 Promote the Epithelial-Mesenchymal Transition and Aggressiveness of Bladder Cancer Cells. Cells 2020;9. doi:10.3390/cells9061419 [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.3390/cells9061419&link_type=DOI) 26. Bozier J, Chivers EK, Chapman DG, et al. The Evolving Landscape of e-Cigarettes: A Systematic Review of Recent Evidence. Chest 2020;157:1362–90. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/j.chest.2019.12.042&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=http://www.n&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2022%2F12%2F05%2F2022.09.19.22280093.atom)