Skip to main content
medRxiv
  • Home
  • About
  • Submit
  • ALERTS / RSS
Advanced Search

Subtyping strokes using blood-based biomarkers: A proteomics approach

View ORCID ProfileShubham Misra, Praveen Singh, Shantanu Sengupta, Manoj Kushwaha, Zuhaibur Rahman, Divya Bhalla, Pumanshi Talwar, Manabesh Nath, Rahul Chakraborty, Pradeep Kumar, Amit Kumar, Praveen Aggarwal, Achal K Srivastava, Awadh K Pandit, Dheeraj Mohania, Kameshwar Prasad, Deepti Vibha
doi: https://doi.org/10.1101/2023.06.10.23291233
Shubham Misra
1Department of Neurology, All India Institute of Medical Sciences, New Delhi, India
2Department of Neurology, Yale University School of Medicine, New Haven, CT
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
  • ORCID record for Shubham Misra
Praveen Singh
3CSIR-Institute of Genomics and Integrative Biology, New Delhi, India
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
Shantanu Sengupta
3CSIR-Institute of Genomics and Integrative Biology, New Delhi, India
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
Manoj Kushwaha
3CSIR-Institute of Genomics and Integrative Biology, New Delhi, India
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
Zuhaibur Rahman
3CSIR-Institute of Genomics and Integrative Biology, New Delhi, India
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
Divya Bhalla
3CSIR-Institute of Genomics and Integrative Biology, New Delhi, India
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
Pumanshi Talwar
1Department of Neurology, All India Institute of Medical Sciences, New Delhi, India
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
Manabesh Nath
1Department of Neurology, All India Institute of Medical Sciences, New Delhi, India
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
Rahul Chakraborty
3CSIR-Institute of Genomics and Integrative Biology, New Delhi, India
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
Pradeep Kumar
1Department of Neurology, All India Institute of Medical Sciences, New Delhi, India
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
Amit Kumar
1Department of Neurology, All India Institute of Medical Sciences, New Delhi, India
4Department of Laboratory Medicine, Rajendra Institute of Medical Sciences, Ranchi, India
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
Praveen Aggarwal
5Department of Emergency Medicine, All India Institute of Medical Sciences, New Delhi, India
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
Achal K Srivastava
1Department of Neurology, All India Institute of Medical Sciences, New Delhi, India
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
Awadh K Pandit
1Department of Neurology, All India Institute of Medical Sciences, New Delhi, India
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
Dheeraj Mohania
6Dr. R.P. Centre, All India Institute of Medical Sciences, New Delhi, India
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
Kameshwar Prasad
1Department of Neurology, All India Institute of Medical Sciences, New Delhi, India
7Department of Neurology, Rajendra Institute of Medical Sciences, Ranchi, India
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
Deepti Vibha
1Department of Neurology, All India Institute of Medical Sciences, New Delhi, India
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
  • For correspondence: deeptivibha{at}gmail.com
  • Abstract
  • Full Text
  • Info/History
  • Metrics
  • Supplementary material
  • Data/Code
  • Preview PDF
Loading

Abstract

Background and Objectives: Rapid diagnosis of stroke and its subtypes is critical in early stages. We aimed to discover and validate blood-based protein biomarkers to differentiate ischemic stroke (IS) from intracerebral hemorrhage (ICH) within 24 hours using high-throughput proteomics.

Methods: We collected serum samples within 24 hours from acute stroke (IS & ICH) and mimics patients. In the discovery phase, SWATH-MS proteomics identified differentially expressed proteins (fold change: 1.5, p<0.05, and confirmed/tentative selection using Boruta random forest) between IS and ICH which were validated using Multiple Reaction Monitoring (MRM) proteomics in the validation phase. Protein-protein interactions and pathway analysis were conducted using STRING version 11 and Cytoscape 3.9.0. Cut-off points were determined using Youden Index. Prediction models were developed using backward stepwise multivariable logistic regression analysis. Hanley-McNeil test, Integrated discrimination improvement index, and likelihood ratio test determined the improved discrimination ability of biomarkers added to clinical models.

Results: Discovery phase included 20 IS and 20 ICH while validation phase included 150 IS, 150 ICH, and 6 stroke mimics. We quantified 365 proteins in the discovery phase. Between IS and ICH, we identified 20 differentially expressed proteins. In the validation phase, combined prediction model including three biomarkers: GFAP (OR 0.04; 95%CI 0.02-0.11), MMP9 (OR 0.09; 95%CI 0.03-0.28), APO-C1 (OR 5.76; 95%CI 2.66-12.47) and clinical variables independently differentiated IS from ICH (accuracy: 92%, sensitivity: 96%, specificity: 69%). Addition of biomarkers to clinical variables improved the discrimination capacity by 26% (p<0.001). Subgroup analysis within 6 hours identified that GFAP and MMP9 differentiated IS from ICH with a sensitivity> 90%.

Conclusions: Our study identified that GFAP, MMP, and APO-C1 biomarkers independently differentiated IS from ICH within 24 hours and significantly improved the discrimination ability to predict IS. Temporal profiling of these biomarkers in the acute phase of stroke is urgently warranted.

Introduction

An early diagnosis of stroke and definitive acute treatment is critical in acute stroke care.1 Early detection of stroke and its subtypes is crucial since the efficacy of reperfusion therapies in ischemic stroke (IS) including intravenous thrombolysis2–4 and mechanical thrombectomy,5,6 is time-dependent. There are several challenges in the diagnosis of IS including poor sensitivity of a computed tomography (CT) scan to diagnose IS in the acute phase, lack of availability of a magnetic resonance imaging (MRI) scan and trained neurologists in resource-limited settings, and expensive cost of imaging equipment’s. Therefore, the search for a novel molecular marker for stroke diagnosis has received growing attention in the last 20 years.7–9

As of 2022, we do not have a functional blood-based stroke biomarker (either individual or in combination) which can differentiate IS from intracerebral hemorrhage (ICH) and other similar conditions. We also do not have a point of care test which can strengthen the stroke diagnosis along with neuroimaging. Most of the studies conducted till date have used the candidate-based immunoassays with every protein requiring a separate assay or a multiplex assay approach to determine diagnostic biomarkers for differentiating IS from healthy controls, stroke mimics, and ICH within 24 hours. A high-throughput proteomics approach, on the other hand, can simultaneously quantify many proteins in an exploratory fashion and could potentially be a highly sensitive method to detect diagnostic protein markers in stroke patients at a low cost per sample in a rapid and reproducible manner. Further, recent advancements in label-free quantification of proteins have offered a cost-effective alternative to the labelling methods for discovering protein biomarkers in stroke with subsequent validation in larger cohorts using the targeted proteomics methods. Label-free techniques like sequential windowed acquisition of all theoretical fragment ion mass spectra (SWATH-MS) have been preferred over labelled proteomics techniques for large clinical studies focussed on biomarker discovery.10

With the growing need for a blood-based diagnostic biomarker in stroke, it is imperative to utilize the advantages of the latest high-throughput proteomics approaches and focus on relevant clinical questions for identifying protein biomarkers that have adequate sensitivity and/or specificity to differentiate stroke subtypes in clinical settings. Therefore, we aimed to discover blood-based protein biomarkers to diagnose and differentiate IS from ICH within 24 hours using the untargeted SWATH-MS proteomics approach. We also validated the diagnostic performance of the discovered proteins in a separate cohort of IS, ICH, and stroke mimics using the targeted Multiple Reaction Monitoring (MRM) proteomics approach.

Methods

Study population and study design

This diagnostic test study was conducted in the Department of Neurology, All India Institute of Medical Sciences (AIIMS), New Delhi, India in collaboration with Institute of Genomics and Integrative Biology (IGIB), New Delhi, India. From October 2017 to March 2020, we included consecutive stroke patients aged 18 years and above, ischemic or hemorrhagic confirmed by neuroimaging and clinical diagnosis admitted within 24 hours of symptom onset to the neurology wards and/or emergency department of AIIMS, New Delhi. All included patients had clinical signs consistent with the definition of stroke given by the American Heart Association (AHA)/American Stroke Association (ASA).11 We also included a group of stroke mimics presenting within 24 hours with stroke-like symptoms. We excluded patients with a previous history of stroke, admission after the 24-hour onset of the qualifying event, cerebral vein thrombosis, chronic liver and/or kidney disease, documented history of cancer, pregnancy, and unwilling to provide written informed consent.

The study was approved by the Local Institutional Ethics Committee of AIIMS, New Delhi (Ref. No. IECPG-395/28.09.2017). We obtained written informed consent from all the recruited patients or legally authorized representative prior to collecting blood samples and clinical history. The study was divided into two phases: (1) discovery phase and (2) validation phase. The study flow diagram is represented in Figure 1.

Figure 1:
  • Download figure
  • Open in new tab
Figure 1:

Study flow diagram

Sample size

For the discovery phase, based on the feasibility, budget, and time-frame of the study, we selected 40 stroke patients comprising of 20 IS and 20 ICH. The literature suggests a sample size of 10 to 30 for conducting a pilot/discovery phase study.12,13

For the validation phase, to achieve a sensitivity and specificity of 90% with 5% margin of error and 5% drop out rate, we estimated a final sample size of 300 comprising 150 IS and 150 ICH patients14 using STATA version 13.0 software.

Blood sample collection

We collected five ml of peripheral blood samples in serum vacutainer tubes from stroke patients and mimics within 24-hour onset. For serum collection, it was left standing at room temperature for 30 minutes until clotted. It was then centrifuged at 3000g for 10 minutes, after which the serum was separated into cryovials. Five aliquots of each sample (100µl) were prepared and stored at -80°C until further analysis.

Sample preparation, Reduction, alkylation, and trypsin digestion

Refer to previously published paper.15

DISCOVERY PHASE

SWATH-MS data acquisition

Refer to previously published paper.15

Bioinformatic analysis

We used a high-pH fractionated peptide library for human serum proteins (obtained from SCIEX) comprising 465 proteins. SWATH peaks were extracted using this library in SWATH 2.0 microapp in PeakView 2.2 software (SCIEX), excluding shared peptides. The processing settings for peak extraction were: maximum 10 peptides per protein, 5 transitions per peptide, >95% peptide confidence threshold, 1% peptide false discovery rate (FDR). XIC extraction window was set to 55 minutes with 75 ppm XIC Width. Data were normalized using total area sum normalization and log2 transformed. Batch correction was performed for removing the non-biological experimental variations and was visualized using the Principal Component Analysis (PCA) plots. Differentially expressed proteins were selected using two criteria: (i) p-value <0.05 and ± 1.5 fold change cut-offs (>1.5 for upregulated and <0.67 for downregulated proteins); or (ii) confirmed/tentative selection in the Boruta random forest feature selection method. The STRING 11 online tool16 was used to create the protein network of the differentially expressed proteins between various conditions. Further, protein-protein interaction network and the functional enrichment analyses were conducted using Cytoscape 3.9.0 software.17

VALIDATION PHASE

Differentially expressed proteins obtained in the discovery phase and five proteins identified in our meta-analysis9 were validated using targeted proteomics in the validation phase.

Peptide selection for Multiple Reaction Monitoring (MRM)-based targeted proteomics

Peptide selection was performed using search results from ProteinPilot,18 PeptideAtlas19 or in-silico generated peptides of proteins using Expasy PeptideCutter tool.20 Peptides with +2 and +3 charges were considered for MRM and for each peptide, 5-6 fragment ions were used for identification. Peptide selection for the list of shortlisted proteins is given in supplementary Table 1.

MRM data acquisition

Tryptic peptides obtained after digestion were desalted using reversed-phase Oasis HLB cartridges (Waters, Milford, MA) as per manufacturer’s protocol. The peptide mixture was dried using a vacuum centrifuge, and the peptides were resuspended in 0.1% formic acid at a final concentration of 1μg/μl. A heavy labeled peptide for Apo A1 (QGLLPVLESFK; K=Lysine-13C6,15N2) protein was spiked-in the resolubilized plasma digest at a final concentration of 1ng/μl.

The targeted MRM-MS21 analysis of the tryptic peptides was performed on a TSQ Altis (Thermo Fisher, San Jose, CA). The instrument was equipped with an H-ESI ion source. A spray voltage of 3.5 keV was used with a heated ion transfer tube set at a temperature of 325°C. Chromatographic separations of peptides were performed on Vanquish UHPLC system (Thermo Fisher, San Jose, CA).

Ten μl sample was injected and peptides were loaded on an ACQUITY UPLC BEH C18 column (130Å, 1.7 µm, 2.1 mm X 100 mm, Waters) from a cooled (4°C) autosampler and separated with a linear gradient of water (buffer A) and acetonitrile (buffer B), containing 0.1% formic acid, at a flow rate of 300µl/minute in 30 minutes gradient run.

Statistical analysis

Skyline version 21.1 was used to analyze the MRM data.22 The area for each peptide was spike-in normalized using a heavy labeled peptide for Apo A1 protein and then log2 transformed. The random forest-based imputation was performed for peptides with <10% missing values.23 The normality of the data was assessed using the Shapiro-Wilk test.

The results of our study were reported as per the Standard for Reporting Diagnostic Accuracy (STARD) guidelines.24 For each comparison, IS was taken as the endpoint and univariable logistic regression analysis was conducted using odds ratio (OR) and 95% confidence interval (CI). Receiver operating characteristic (ROC) curve analyses were performed for biomarkers differentiating IS from ICH and stroke mimics. The optimal cut-off points were determined using the Youden Index. The sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) of each biomarker were determined. Since the aim of this study was to detect and diagnose IS, cut-offs for each biomarker were selected having the highest sensitivity values to detect IS and to rule out ICH or mimics with the best available specificity values.

Prediction models were developed to determine the accuracy and precision of protein biomarkers and their discrimination ability was evaluated using area under the curve (AUC)/c-statistic. The clinical variables with p-value <0.1 in the univariable analysis along with demographic variables like age and sex, were included in a backward stepwise multivariable logistic regression analysis. Afterward, biomarkers were added incrementally to this model. Those biomarkers that contributed to at least a 1% increase in the AUC/c-statistic were retained in the model. Another prediction model was developed to determine the diagnostic potential of biomarkers alone in differentiating IS. For each prediction model, the degree of multicollinearity was tested using the variance inflation factor (VIF) and predictors with VIF value >2.5 were removed from the final model. The discrimination ability (AUC) between various prediction models was compared and quantitatively assessed using the Hanley-McNeil test p-value.25 At different pre-test probabilities of the disease, the corresponding post-test probability, and the multi-level likelihood ratio were determined. The integrated discrimination improvement (IDI) index was calculated to assess the added value of the biomarkers to the clinical predictive models.26,27 The goodness-of-fit between the two hierarchical prediction models was evaluated using the Likelihood Ratio (LR) test based on the ratio of their likelihoods.28 The selection of the best prediction model was done using Akaike’s Information Criteria (AIC)29 and Bayesian Information Criteria (BIC).30 The statistical analyses were conducted in R version 3.6.2 and STATA version 13.1.

Data availability

The mass spectrometry proteomics data have been deposited to the ProteomeXchange Consortium via the PRIDE31 partner repository with the dataset identifier PXD032917.

Results

Results from the Discovery Phase

We enrolled 40 patients in the discovery phase: 20 IS and 20 ICH patients recruited within 24 hours. Between IS and ICH cases, there was no significant difference in the mean age (p=0.12) and mean blood sampling time from symptom onset (p=0.86). The baseline characteristics of the patients are given in supplementary Table 2.

Serum proteomic profiles were compared between 20 IS and 20 ICH cases using the SWATH-MS approach. We quantified 375 proteins at 1% peptide FDR between IS and ICH. The total ion chromatogram and the PCA plots depicting batch variations are given in supplementary figure 1 and supplementary figure 2.

We identified 14 differentially expressed proteins using the fold change and p-value criteria. Three proteins were upregulated, while 11 were downregulated in IS compared to ICH (Figure 2a). We further identified 11 confirmed/tentative features between IS and ICH using the Boruta random forest method (Figure 2b and supplementary Table 3). Five proteins (UniProt IDs: P01023, P01833, P01764, P19652, and P02654) were common using both protein selection criteria. Therefore, after combining the distinctly expressed proteins using both criteria, 20 differentially expressed proteins were identified between IS and ICH within 24 hours (Table 1). A heatmap of these 20 proteins is depicted in Figure 2c.

Figure 2a:
  • Download figure
  • Open in new tab
Figure 2a:

Volcano plot depicting the log2 fold change on the x-axis and -log10 p-value on the y-axis for the upregulated and downregulated proteins in 20 IS cases compared to 20 ICH cases within 24 hours. Figure 2b: Feature selection using the Boruta random forest between 20 IS and 20 ICH cases within 24 hours. Figure 2c: Heatmap showing the distribution and expression pattern of 20 significantly differentially expressed proteins between IS and ICH. Figure 2d: Protein-protein interaction network analysis of differentially expressed proteins between IS and ICH using the Cytoscape software. The colour of nodes and edges represents the degree of interaction and interaction score.

View this table:
  • View inline
  • View popup
  • Download powerpoint
Table 1:

List of significantly differentially expressed proteins between IS and ICH within 24 hours of symptom onset using Fold change with p-value and Boruta random forest feature selection criteria

Interaction network analysis observed that 15 proteins formed a highly connected network showing molecular interactions with other proteins (Figure 2d). APOH had the highest degree of interaction with other proteins followed by APOA1, A2M, and HP. In our network, 10 protein-protein interactions had an interaction score of >0.90. Functional enrichment analysis identified molecular pathways/processes involving blood microparticles, extracellular space, lipoprotein particles, cholesterol metabolism, platelet degranulation, and post translational protein phosphorylation that were significantly different between IS and ICH (supplementary Table 4).

Results from the Validation Phase

We included 306 cases in the validation phase; 300 stroke patients (150 IS and 150 ICH) and 6 stroke mimics within 24 hours. The mean age of IS, ICH, and stroke mimics were 54.59±15.54, 55.30±12.72, and 47.83±18.25 years, respectively. IS, ICH, and stroke mimics consisted of 64.67%, 65.33%, and 66.67% males. The mean blood sampling time (in hours) from symptom onset was 15.29±5.55 in IS, 14.97±6.55 in ICH, and 8.75±5.07 in stroke mimics. The 6 stroke mimics included patients suffering from hypoglycaemia (2), vertigo (2), syncope (1), and seizure (1) (supplementary Table 5 and supplementary Table 6).

We validated 20 differentially expressed proteins identified in the discovery phase and five proteins (GFAP, BNP, D-dimer, MMP-9, and UCH-L1) identified in our published meta-analysis.9 While selecting the peptides of interest for MRM, unique peptides for two proteins (UniProt IDs: P01764 and P01601) could not be determined, while the unique peptide (TDISMSDFENSR) for PIGR protein (UniProt ID: P01833) was not detectable in any sample, thus they were excluded from the final analysis. Therefore, 22 proteins (17 from discovery phase and five from meta-analysis) were selected in our validation phase. One unique peptide per protein was used for quantification. The final list of unique peptides and quantifier fragment ion for each protein is given in supplementary Table 7.

Univariable analysis in the Validation Phase

In the univariable analysis, the protein concentration of 15 out of 22 biomarkers were different between IS and ICH (p-value <0.1) (supplementary Table 8). The diagnostic characteristics for the 15 categorized biomarkers are given in Table 2. In a subgroup analysis conducted within 6 hours, nine of 22 biomarkers significantly differentiated 12 IS from 26 ICH cases (supplementary Table 9 and supplementary Table 10). In a secondary analysis, only three biomarkers differentiated IS from stroke mimics within 24 hours (supplementary Table 11).

View this table:
  • View inline
  • View popup
  • Download powerpoint
Table 2:

The diagnostic potential of protein biomarkers assessed in the validation phase of the study to differentiate ischemic stroke from intracerebral hemorrhage within 24 hours of onset

Multivariable regression analysis in the validation phase

Clinical model

Five clinical variables including sex (OR 2.12; 95%CI 1.12-4.02), hypertension (OR 0.40; 95%CI 0.23-0.70), atrial fibrillation (OR 9.13; 95%CI 2.01-41.57), current smoking (OR 3.19; 95%CI 1.70-5.96), and NIHSS score at admission (OR 0.87; 95%CI 0.84-0.90) were independent predictors of IS (AUC/c-statistic of 81%) (Table 3).

View this table:
  • View inline
  • View popup
  • Download powerpoint
Table 3:

Multivariable prediction models to differentiate IS from ICH within 24 hours

Biomarker model

Eight protein biomarkers including GFAP (OR 0.06; 95%CI 0.03-0.16), MMP-9 (OR 0.16; 95%CI 0.06-0.42), APO-C1 (OR 6.56; 95%CI 3.20-13.45), APO-A1 (OR 0.28; 95%CI 0.09-0.81), Haptoglobin (OR 2.88; 95%CI 1.26-6.58), UCH-L1 (OR 0.35; 95%CI 0.14-0.84), ORM2 (OR 0.38; 95%CI 0.15-0.95), and Serotransferrin (OR 4.40; 95%CI 1.30-14.85) were independent predictors of IS (AUC/c-statistic of 88%) (Table 3).

Combined model (clinical variables + protein biomarkers)

Proteins that were significant in the biomarker model were incrementally added to the clinical model. In the combined prediction model, four clinical variables including hypertension (OR 0.38; 95%CI 0.19-0.76), atrial fibrillation (OR 21.92; 95%CI 3.17-151.35), current smoking (OR 3.00; 95%CI 1.37-6.55), and NIHSS score at admission (OR 0.87; 95%CI 0.84-0.91), and three protein biomarkers including GFAP (OR 0.04; 95%CI 0.02-0.11), MMP-9 (OR 0.09; 95%CI 0.03-0.28), and APO-C1 (OR 5.76; 95%CI 2.66-12.47) were independent predictors of IS (AUC/c-statistic of 92%) (Table 3).

Comparing the diagnostic performance of prediction models

We compared the ROC curves of three prediction models using the Hanley and McNeil test. We observed that the combined model had significantly higher AUC compared to biomarker model (p=0.036) and clinical model (p<0.0001) (Figure 3).

Figure 3:
  • Download figure
  • Open in new tab
Figure 3:

A comparison between the ROC curves of Clinical prediction model (blue curve), Biomarker prediction model (red curve), and Combined prediction model (green curve) to differentiate IS from ICH within 24 hours.

The IDI index suggested that combined model increased the discrimination ability by 13% (p<0.001) compared to biomarker model and by 26% (p<0.001) compared to clinical model.

The goodness-of-fit between the clinical and combined models (hierarchical models) was compared using the likelihood ratio test. The addition of three protein biomarkers (GFAP, MMP-9, and APO-C1) to the clinical variables significantly increased the likelihood ratio by 102.74 (p<0.001). Further, combined model including clinical and biomarker variables, had the least AIC and BIC values than clinical model.

Comparison between prediction models at different pre-test probabilities

Combined model including clinical variables and protein biomarkers (GFAP, MMP-9, APO-C1), improved the pre-test probability of diagnosing an IS from 30%, 50%, and 70% to a post-test probability of 57%, 82%, and 95%, respectively. Combined model had the best sensitivity, specificity, and likelihood ratios at all the three pre-test probability cut-offs compared to clinical and biomarker models. At a pre-test probability of 30%, combined model had the highest NPV of 94% to rule out ICH (Table 4).

View this table:
  • View inline
  • View popup
  • Download powerpoint
Table 4:

Comparison in the diagnostic performance of the three prediction models at different pre-test probabilities

Discussion

In the discovery phase, our study identified 20 differentially expressed proteins between IS and ICH. Of these 20 proteins, in the validation phase, we identified three biomarkers (GFAP, MMP9, APOC1) that independently differentiated IS from ICH in multivariable analyses. When these three biomarkers were added to the prediction models containing clinical variables, they significantly improved the discrimination ability to detect IS. GFAP and MMP9 further differentiated the two stroke subtypes in a subgroup analysis within 6 hours. In a secondary analysis, we identified three biomarkers (APO-L1, BNP, FBXW5) that differentiated IS from stroke mimics. To the best of our knowledge, this is the first study that has utilized the discovery-based label-free SWATH-MS proteomics and targeted MRM proteomics to identify and validate differentially expressed protein biomarkers in the blood of patients with IS and ICH within 24 hours of symptom onset.

Our findings highlight the importance of a panel-based approach in diagnostic biomarker research in stroke. Our study observed that a combined panel of biomarkers and clinical variables detected IS from ICH with an accuracy of 92%. Only a few studies in the past have adopted the panel-based approach. The Stroke-Chip study in 201732 included a panel of clinical variables and biomarkers in which only NT-proBNP independently differentiated IS from ICH with an accuracy of 75.7%. Earlier, Montaner et al. in 201233 observed that in a combined panel, S100B and RAGE independently differentiated IS from ICH with an improved accuracy of 84%. Thus, future studies must focus on testing the diagnostic potential of their biomarkers in multivariable prediction models to identify independent biomarkers for IS diagnosis.

For the first time our study used a label-free proteomics approach to detect the levels of proteins identified in the literature such as GFAP, MMP9, BNP, D-dimer, and UCH-L1 in stroke patients. We focussed on finding a combination of biomarkers with/without clinical variables which can detect IS with high sensitivity and a high NPV to rule out ICH. The poor specificity values observed in most of the biomarkers when analyzed alone, were improved significantly when combined in a panel.

Our network analysis identified an interaction network of 15 proteins that was highly connected. The most common significant pathways underlying the proteins differentiating IS from ICH included cholesterol metabolism and lipid-related processes, platelet degranulation, and pathways related to extracellular space and matrix.

Thus far, no proteomics study conducted in stroke has identified biomarkers within the 24-hour time window. In a recent study, Malicek et al. 202134 used a similar discovery-based label-free proteomics approach and identified nine proteins in a small sample of 3 IS and 4 ICH patients. They collected blood samples between 1 to 15 days and lacked a validation phase. Of these nine proteins, serum amyloid P-component was also differentially expressed in our discover phase. Another study by Zhang et al. 202135 collected serum within five days and identified two metabolic biomarkers between IS and ICH using targeted metabolomics. Lopez et al. (2012)36 used targeted proteomics and identified a combination of APO-CIII and APO-AI that differentiated IS from ICH with an AUC of 0.92. In our discovery phase, we also observed that APO-AI was significantly downregulated in IS by 0.61-fold compared to ICH.

Several studies in the past have used immunoassays to identify candidate protein biomarkers for differentiating IS from ICH. GFAP is the most extensively studied protein and several systematic reviews and meta-analyses,37–40 including those published by our group8,9,41 also reiterate our findings that GFAP efficiently differentiates the two stroke subtypes with high sensitivity and/or specificity within 24 as well as 6 hours. With regards to MMP9, our findings are in line with several other published studies33,42–44 and meta-analysis,9 that observed that MMP-9 has significantly lower levels in IS than ICH and efficiently differentiates the two stroke subtypes within 24 hours.

Previous studies have had contradictory results regarding the APO-C1 levels in IS and ICH patients. In our study, we detected higher levels of APO-C1 in IS than ICH. The increased levels of APO-C1 during acute IS might be related to its key role in lipid metabolism, also observed in the pathway analysis in our study. One study by Allard et al. (2004)45 corroborates our findings as they observed higher levels of APO-C1 in IS than ICH. However, a study by Walsh et al. (2016)46 observed that in a small sample of 14 IS and 23 ICH cases, the median APO-CI levels were slightly lower in IS compared to ICH. Therefore, more adequately powered studies are required to confirm the levels of APO-C1 in stroke subtypes.

Strengths of our study

This study provides crucial insights into the pathophysiology of IS and ICH in acute stages. We validated the diagnostic potential of discovered biomarkers in a large cohort of patients using a high-throughput proteomics approach. Our study was conducted in resource-limited settings wherein there is an urgent need for an alternative approach toward rapid stroke diagnosis. Our study analyzed the individual protein profile of each patient using proteomics in both discovery and validation phases without pooling any sample. Since, pooled samples do not reflect the diseased/non-diseased condition of a single person and may produce false-positive or false-negative results.

Limitations of the study

Certain inherent limitations associated with our study must be kept in mind when interpreting its results. Firstly, the serum samples were obtained only from the Indian population; thus, further studies conducted in ethnically diverse populations are required to validate the generalizability of the biomarkers identified in our study. Secondly, relative quantification of proteins was done in the validation phase. To determine the biomarker levels with absolute quantification, future studies must be conducted on these biomarkers using either labelled peptide-based targeted proteomics or standard immunoassays. Lastly, the number of stroke mimics included in this study were very less, therefore, findings were only exploratory.

Future directions

The literature on proteomics studies for diagnostic biomarkers in stroke is mainly limited to the discovery phase, while the validation phase is lacking. Therefore, studies must look to validate their discovered biomarkers in a large independent cohort using either standard immunoassays or targeted proteomics approaches using MRM. Since stroke is a multifactorial disorder, a single biomarker approach might not be appropriate, and instead studies must look for a panel of biomarkers that may increase the overall sensitivity and specificity of the diagnosis. The ability of the biomarkers to independently diagnose IS must be evaluated in multivariable regression models consisting of potential clinical predictors along with independent protein biomarkers. Studies should also examine the improvement in the discrimination ability and the incremental value of any biomarker over clinical variables using various statistical approaches. Future studies should utilize the advanced label-free proteomics technology for biomarker research in stroke. They require a minimal sample volume at a low cost per sample. This approach would be convenient for studies investigating a panel of biomarkers. Studies must avoid pooling of samples when conducting their proteomics experiments to avoid false positive/false negative results.

Conclusion

Our study identified 20 differentially expressed proteins between IS and ICH in the discovery phase using SWATH-MS proteomics with a strong interaction network for 15 proteins. The validation phase highlighted the potential of 15 biomarker candidates in differentiating IS from ICH within 24 hours. The prediction model including three protein biomarkers (GFAP, MMP9, and APOC1) and clinical variables, had the best discrimination ability and increased the post-test probability of detecting IS with high sensitivity, specificity, and likelihood ratios. Protein biomarkers along with clinical predictors significantly improved the discrimination ability of the model in differentiating IS from ICH within 24 hours. Our results must be validated in other populations using absolute quantification to confirm the generalizability of our findings and the diagnostic performance of the identified biomarkers.

Contributors

DV conceptualized the idea of this research topic, helped design the clinical methodology, supervised each step of execution of this study. SM primarily conducted each step of this study ranging from blood sample collection, processing, proteomics experimentation, statistical and proteomic data analysis, results interpretation, and manuscript writing. SSG supervised the proteomics experimentation and its data analysis. SM and PS conducted the proteomics experiments and data analysis. MK, ZR, DB, and RC contributed in conducting the proteomics experimentations in the study. PT and MN contributed in patient sample collection and processing. PK and AK helped in statistical data analysis. DV, PA, AKS, AKP, DM, and KP aided in patient recruitment for this study. All authors contributed to, reviewed, and approved the final draft of the paper.

Disclosures

The authors report no disclosures relevant to the manuscript.

Data Availability

All proteomics data produced are available online at ProteomeXchange Consortium via the PRIDE partner repository with the dataset identifier PXD032917.

Study Funding

This study was supported in part by the AIIMS Intramural Research Grant (F. No. 8-762/A-762/2019/RS).

Ethics Approval

The study was approved by the local institutional ethics committee of AIIMS, New Delhi (Ref. No. IECPG-395/28.09.2017).

Acknowledgement

S. Misra is a DST-INSPIRE Fellow supported by Department of Science and Technology, Government of India.

Glossary

IS
Ischemic Stroke
ICH
Intracerebral hemorrhage
SDE
Significantly Differentially Expressed
CT
Computed Tomography
MRI
Magnetic Resonance Imaging
SWATH-MS
sequential windowed acquisition of all theoretical fragment ion mass spectra
MRM
Multiple Reaction Monitoring
FDR
False Discovery Rate
PCA
Principal Component Analysis
OR
Odds Ratio
CI
Confidence Interval
ROC
Receiver Operating Characteristic
AUC
Area Under the Curve
PPV
Positive Predictive Value
NPV
Negative Predictive Value
VIF
Variance Inflation Factor
IDI
Integrated Discrimination Improvement
GO
Gene Ontology
KEGG
Kyoto Encyclopedia of Genes and Genomes.

References

  1. 1.↵
    Maas WJ, Lahr MMH, Buskens E, et al. Pathway Design for Acute Stroke Care in the Era of Endovascular Thrombectomy: A Critical Overview of Optimization Efforts. Stroke 2020; 51: 3452–3460.
    OpenUrlPubMed
  2. 2.↵
    IST-3 collaborative group, Sandercock P, Wardlaw JM, et al. The benefits and harms of intravenous thrombolysis with recombinant tissue plasminogen activator within 6 h of acute ischaemic stroke (the third international stroke trial [IST-3]): a randomised controlled trial. Lancet 2012; 379: 2352–2363.
    OpenUrlCrossRefPubMedWeb of Science
  3. 3.
    Hacke W, Kaste M, Bluhmki E, et al. Thrombolysis with alteplase 3 to 4.5 hours after acute ischemic stroke. N Engl J Med 2008; 359: 1317–1329.
    OpenUrlCrossRefPubMedWeb of Science
  4. 4.↵
    Emberson J, Lees KR, Lyden P, et al. Effect of treatment delay, age, and stroke severity on the effects of intravenous thrombolysis with alteplase for acute ischaemic stroke: a meta-analysis of individual patient data from randomised trials. Lancet 2014; 384: 1929–1935.
    OpenUrlCrossRefPubMedWeb of Science
  5. 5.↵
    Nogueira RG, Jadhav AP, Haussen DC, et al. Thrombectomy 6 to 24 Hours after Stroke with a Mismatch between Deficit and Infarct. N Engl J Med 2018; 378: 11–21.
    OpenUrlCrossRefPubMed
  6. 6.↵
    Albers GW, Marks MP, Kemp S, et al. Thrombectomy for Stroke at 6 to 16 Hours with Selection by Perfusion Imaging. N Engl J Med 2018; 378: 708–718.
    OpenUrlCrossRefPubMed
  7. 7.↵
    Whiteley W, Tseng M-C, Sandercock P. Blood biomarkers in the diagnosis of ischemic stroke: a systematic review. Stroke 2008; 39: 2902–2909.
    OpenUrlAbstract/FREE Full Text
  8. 8.↵
    Misra S, Kumar A, Kumar P, et al. Blood-based protein biomarkers for stroke differentiation: A systematic review. Proteomics Clin Appl; 11. Epub ahead of print September 2017. DOI: 10.1002/prca.201700007.
    OpenUrlCrossRef
  9. 9.↵
    Misra S, Montaner J, Ramiro L, et al. Blood biomarkers for the diagnosis and differentiation of stroke: A systematic review and meta-analysis. Int J Stroke 2020; 15: 704–721.
    OpenUrl
  10. 10.↵
    Jylhä A, Nättinen J, Aapola U, et al. Comparison of iTRAQ and SWATH in a clinical study with multiple time points. Clin Proteomics 2018; 15: 24.
  11. 11.↵
    Rl S, Se K, Jp B, et al. An updated definition of stroke for the 21st century: a statement for healthcare professionals from the American Heart Association/American Stroke Association. Stroke; 44. Epub ahead of print July 2013. DOI: 10.1161/STR.0b013e318296aeca.
    OpenUrlAbstract/FREE Full Text
  12. 12.↵
    Wilmoth GH. Handbook in research and evaluation, Second Edition Stephen Isaac and William B. Michael San Diego, CA: EDITS Pubs., 1981. Group & Organization Studies 1982; 7: 124–126.
    OpenUrl
  13. 13.↵
    Julious SA. Sample size of 12 per group rule of thumb for a pilot study. Pharmaceutical Statistics 2005; 4: 287–291.
    OpenUrl
  14. 14.↵
    Prasad K. Sample size calculation with simple math for clinical researchers. Natl Med J India 2020; 33: 372–374.
    OpenUrl
  15. 15.↵
    Misra S, Singh P, Nath M, et al. Blood-based protein biomarkers for the diagnosis of acute stroke: A discovery-based SWATH-MS proteomic approach. Front Neurol 2022; 13: 989856.
  16. 16.↵
    Szklarczyk D, Gable AL, Nastou KC, et al. The STRING database in 2021: customizable protein-protein networks, and functional characterization of user-uploaded gene/measurement sets. Nucleic Acids Res 2021; 49: D605–D612.
    OpenUrlCrossRefPubMed
  17. 17.↵
    Shannon P, Markiel A, Ozier O, et al. Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res 2003; 13: 2498–2504.
    OpenUrlAbstract/FREE Full Text
  18. 18.↵
    Seymour SL, Hunter CL. ProteinPilotTM Software overview. 5.
  19. 19.↵
    Desiere F, Deutsch EW, King NL, et al. The PeptideAtlas project. Nucleic Acids Res 2006; 34: D655–658.
    OpenUrlCrossRefPubMedWeb of Science
  20. 20.↵
    Wilkins MR, Gasteiger E, Bairoch A, et al. Protein identification and analysis tools in the ExPASy server. Methods Mol Biol 1999; 112: 531–552.
    OpenUrlCrossRefPubMed
  21. 21.↵
    Zhang H, Liu Q, Zimmerman LJ, et al. Methods for peptide and protein quantitation by liquid chromatography-multiple reaction monitoring mass spectrometry. Mol Cell Proteomics 2011; 10: M110.006593.
    OpenUrlAbstract/FREE Full Text
  22. 22.↵
    MacLean B, Tomazela DM, Shulman N, et al. Skyline: an open source document editor for creating and analyzing targeted proteomics experiments. Bioinformatics 2010; 26: 966–968.
  23. 23.↵
    Stekhoven DJ, Bühlmann P. MissForest--non-parametric missing value imputation for mixed-type data. Bioinformatics 2012; 28: 112–118.
    OpenUrlCrossRefPubMedWeb of Science
  24. 24.↵
    Bossuyt PM, Reitsma JB, Bruns DE, et al. STARD 2015: an updated list of essential items for reporting diagnostic accuracy studies. BMJ 2015; 351: h5527.
  25. 25.↵
    Hanley JA, McNeil BJ. A method of comparing the areas under receiver operating characteristic curves derived from the same cases. Radiology 1983; 148: 839–843.
    OpenUrlCrossRefPubMedWeb of Science
  26. 26.↵
    Pencina MJ, D’Agostino RB, D’Agostino RB, et al. Evaluating the added predictive ability of a new marker: from area under the ROC curve to reclassification and beyond. Stat Med 2008; 27: 157–172; discussion 207-212.
    OpenUrlCrossRefPubMedWeb of Science
  27. 27.↵
    Pickering JW, Endre ZH. New metrics for assessing diagnostic potential of candidate biomarkers. Clin J Am Soc Nephrol 2012; 7: 1355–1364.
    OpenUrlAbstract/FREE Full Text
  28. 28.↵
    Dumonceaux R, Antle CE, Haas G. Likelihood Ratio Test for Discrimination between Two Models with Unknown Location and Scale Parameters. Technometrics 1973; 15: 19–27.
    OpenUrl
  29. 29.↵
    Akaike H. A New Look at the Bayes Procedure. Biometrika 1978; 65: 53–59.
    OpenUrlCrossRefWeb of Science
  30. 30.↵
    Gideon Schwarz. Estimating the Dimension of a Model. The Annals of Statistics 1978; 6: 461–464.
    OpenUrl
  31. 31.↵
    Perez-Riverol Y, Csordas A, Bai J, et al. The PRIDE database and related tools and resources in 2019: improving support for quantification data. Nucleic Acids Res 2019; 47: D442–D450.
    OpenUrlCrossRefPubMed
  32. 32.↵
    Bustamante A, López-Cancio E, Pich S, et al. Blood Biomarkers for the Early Diagnosis of Stroke: The Stroke-Chip Study. Stroke 2017; 48: 2419–2425.
    OpenUrlAbstract/FREE Full Text
  33. 33.↵
    Montaner J, Mendioroz M, Delgado P, et al. Differentiating ischemic from hemorrhagic stroke using plasma biomarkers: the S100B/RAGE pathway. J Proteomics 2012; 75: 4758–4765.
    OpenUrlPubMed
  34. 34.↵
    Malicek D, Wittig I, Luger S, et al. Proteomics-Based Approach to Identify Novel Blood Biomarker Candidates for Differentiating Intracerebral Hemorrhage From Ischemic Stroke-A Pilot Study. Front Neurol 2021; 12: 713124.
  35. 35.↵
    Zhang J, Su X, Qi A, et al. Metabolomic profiling of fatty acid biomarkers for intracerebral hemorrhage stroke. Talanta 2021; 222: 121679.
  36. 36.↵
    Lopez MF, Sarracino DA, Prakash A, et al. Discrimination of ischemic and hemorrhagic strokes using a multiplexed, mass spectrometry-based assay for serum apolipoproteins coupled to multi-marker ROC algorithm. Proteomics Clin Appl 2012; 6: 190–200.
    OpenUrlPubMed
  37. 37.↵
    Perry LA, Lucarelli T, Penny-Dimri JC, et al. Glial fibrillary acidic protein for the early diagnosis of intracerebral hemorrhage: Systematic review and meta-analysis of diagnostic test accuracy. Int J Stroke 2019; 14: 390–399.
    OpenUrlPubMed
  38. 38.
    Zhang J, Zhang C-H, Lin X-L, et al. Serum glial fibrillary acidic protein as a biomarker for differentiating intracerebral hemorrhage and ischemic stroke in patients with symptoms of acute stroke: a systematic review and meta-analysis. Neurol Sci 2013; 34: 1887–1892.
    OpenUrl
  39. 39.
    Sun Y, Qin Q, Shang Y-J, et al. The accuracy of glial fibrillary acidic protein in acute stroke differential diagnosis: A meta-analysis. Scand J Clin Lab Invest 2013; 73: 601– 606.
    OpenUrlCrossRefPubMed
  40. 40.↵
    Cabezas JA, Bustamante A, Giannini N, et al. Discriminative value of glial fibrillar acidic protein (GFAP) as a diagnostic tool in acute stroke. Individual patient data meta-analysis. J Investig Med 2020; 68: 1379–1385.
    OpenUrlAbstract/FREE Full Text
  41. 41.↵
    Kumar A, Misra S, Yadav AK, et al. Role of glial fibrillary acidic protein as a biomarker in differentiating intracerebral haemorrhage from ischaemic stroke and stroke mimics: a meta-analysis. Biomarkers 2020; 25: 1–8.
    OpenUrl
  42. 42.↵
    Montaner J, Mendioroz M, Ribó M, et al. A panel of biomarkers including caspase-3 and D-dimer may differentiate acute stroke from stroke-mimicking conditions in the emergency department. J Intern Med 2011; 270: 166–174.
    OpenUrlCrossRefPubMed
  43. 43.
    Kavalci C, Genchallac H, Durukan P, et al. Value of biomarker-based diagnostic test in differential diagnosis of hemorrhagic-ischemic stroke. Bratisl Lek Listy 2011; 112: 398–401.
    OpenUrlPubMed
  44. 44.↵
    Kim MH, Kang SY, Kim MC, et al. Plasma biomarkers in the diagnosis of acute ischemic stroke. Ann Clin Lab Sci 2010; 40: 336–341.
    OpenUrlAbstract/FREE Full Text
  45. 45.↵
    Allard L, Lescuyer P, Burgess J, et al. ApoC-I and ApoC-III as potential plasmatic markers to distinguish between ischemic and hemorrhagic stroke. Proteomics 2004; 4: 2242–2251.
    OpenUrlCrossRefPubMedWeb of Science
  46. 46.↵
    Walsh KB, Hart K, Roll S, et al. Apolipoprotein A-I and Paraoxonase-1 Are Potential Blood Biomarkers for Ischemic Stroke Diagnosis. J Stroke Cerebrovasc Dis 2016; 25: 1360–1365.
    OpenUrl
Back to top
PreviousNext
Posted June 12, 2023.
Download PDF

Supplementary Material

Data/Code
Email

Thank you for your interest in spreading the word about medRxiv.

NOTE: Your email address is requested solely to identify you as the sender of this article.

Enter multiple addresses on separate lines or separate them with commas.
Subtyping strokes using blood-based biomarkers: A proteomics approach
(Your Name) has forwarded a page to you from medRxiv
(Your Name) thought you would like to see this page from the medRxiv website.
CAPTCHA
This question is for testing whether or not you are a human visitor and to prevent automated spam submissions.
Share
Subtyping strokes using blood-based biomarkers: A proteomics approach
Shubham Misra, Praveen Singh, Shantanu Sengupta, Manoj Kushwaha, Zuhaibur Rahman, Divya Bhalla, Pumanshi Talwar, Manabesh Nath, Rahul Chakraborty, Pradeep Kumar, Amit Kumar, Praveen Aggarwal, Achal K Srivastava, Awadh K Pandit, Dheeraj Mohania, Kameshwar Prasad, Deepti Vibha
medRxiv 2023.06.10.23291233; doi: https://doi.org/10.1101/2023.06.10.23291233
Twitter logo Facebook logo LinkedIn logo Mendeley logo
Citation Tools
Subtyping strokes using blood-based biomarkers: A proteomics approach
Shubham Misra, Praveen Singh, Shantanu Sengupta, Manoj Kushwaha, Zuhaibur Rahman, Divya Bhalla, Pumanshi Talwar, Manabesh Nath, Rahul Chakraborty, Pradeep Kumar, Amit Kumar, Praveen Aggarwal, Achal K Srivastava, Awadh K Pandit, Dheeraj Mohania, Kameshwar Prasad, Deepti Vibha
medRxiv 2023.06.10.23291233; doi: https://doi.org/10.1101/2023.06.10.23291233

Citation Manager Formats

  • BibTeX
  • Bookends
  • EasyBib
  • EndNote (tagged)
  • EndNote 8 (xml)
  • Medlars
  • Mendeley
  • Papers
  • RefWorks Tagged
  • Ref Manager
  • RIS
  • Zotero
  • Tweet Widget
  • Facebook Like
  • Google Plus One

Subject Area

  • Neurology
Subject Areas
All Articles
  • Addiction Medicine (349)
  • Allergy and Immunology (668)
  • Allergy and Immunology (668)
  • Anesthesia (181)
  • Cardiovascular Medicine (2648)
  • Dentistry and Oral Medicine (316)
  • Dermatology (223)
  • Emergency Medicine (399)
  • Endocrinology (including Diabetes Mellitus and Metabolic Disease) (942)
  • Epidemiology (12228)
  • Forensic Medicine (10)
  • Gastroenterology (759)
  • Genetic and Genomic Medicine (4103)
  • Geriatric Medicine (387)
  • Health Economics (680)
  • Health Informatics (2657)
  • Health Policy (1005)
  • Health Systems and Quality Improvement (985)
  • Hematology (363)
  • HIV/AIDS (851)
  • Infectious Diseases (except HIV/AIDS) (13695)
  • Intensive Care and Critical Care Medicine (797)
  • Medical Education (399)
  • Medical Ethics (109)
  • Nephrology (436)
  • Neurology (3882)
  • Nursing (209)
  • Nutrition (577)
  • Obstetrics and Gynecology (739)
  • Occupational and Environmental Health (695)
  • Oncology (2030)
  • Ophthalmology (585)
  • Orthopedics (240)
  • Otolaryngology (306)
  • Pain Medicine (250)
  • Palliative Medicine (75)
  • Pathology (473)
  • Pediatrics (1115)
  • Pharmacology and Therapeutics (466)
  • Primary Care Research (452)
  • Psychiatry and Clinical Psychology (3432)
  • Public and Global Health (6527)
  • Radiology and Imaging (1403)
  • Rehabilitation Medicine and Physical Therapy (814)
  • Respiratory Medicine (871)
  • Rheumatology (409)
  • Sexual and Reproductive Health (410)
  • Sports Medicine (342)
  • Surgery (448)
  • Toxicology (53)
  • Transplantation (185)
  • Urology (165)