Abstract
Background Biomarkers would greatly assist chronic pain management. The present study aimed to undertake analytical validation of a sensorimotor cortical biomarker signature for pain consisting of two measures: sensorimotor peak alpha frequency (PAF) and corticomotor excitability (CME), using a human model of prolonged temporomandibular pain (masseter intramuscular injection of nerve growth factor [NGF]).
Methods 150 participants received an injection of NGF to the right masseter muscle on Days 0 and 2, inducing prolonged pain lasting up to 4 weeks. Electroencephalography (EEG) to assess PAF and transcranial magnetic stimulation (TMS) to assess CME were recorded on Days 0, 2 and 5. We determined the predictive accuracy of the PAF/CME biomarker signature using a nested-control-test scheme: machine learning models were run on a training set (n = 100), where PAF and CME were predictors and pain sensitivity was the outcome. The winning classifier was assessed on a test set (n = 50) by comparing the predicted pain labels against the true labels.
Results The winning classifier was logistic regression, with an outstanding area under the curve on the training validation set (AUC=1.00). The locked model assessed on the test set had excellent performance (AUC=0.88). Results were reproduced across a range of methodological parameters and with the inclusion of covariates in the modelling. PAF and CME biomarkers showed good to excellent test-retest reliability.
Conclusions This study provides evidence for a sensorimotor cortical biomarker signature for an episode of prolonged pain. The combination of accuracy, reproducibility and reliability suggests the PAF/CME biomarker signature has substantial potential for clinical translation.
Several objective pain biomarkers have been proposed, including neuroimaging markers of mechanistic/structural abnormalities [1-4] and “multi-omics” metrics of microRNA [5], proteins [6], lipids and metabolites [7]. Such biomarkers would greatly assist decision making in the diagnosis, prevention and treatment of chronic pain [8]. However, attempts at establishing pain biomarkers have suffered from insufficient sample sizes to conduct full-scale analytical validation using machine learning [8-10], failure to use clinically relevant pain models [11-13] or lack of assessment of reproducibility or test-retest reliability [14, 15]. These factors have hindered the clinical translatability of prospective pain biomarkers.
Recent evidence shows promise for a sensorimotor cortical biomarker signature for predicting the severity of a prolonged pain episode. The biomarker signature reflects individual differences in ascending sensory and descending motor processing, comprising two metrics: 1) sensorimotor peak alpha frequency (PAF), defined as the dominant sensorimotor cortical oscillation in the alpha (8-12Hz) range [16] and related to the efficiency with which the brain can inhibit incoming sensory input [17, 18], and 2) corticomotor excitability (CME), defined as the efficacy with which signals are relayed from primary motor cortex (M1) to peripheral muscles [19]. CME is altered during pain as individuals adopt different movement strategies to cope with pain [20, 21]. Previous work has shown that slower PAF prior to pain onset and reduced CME during prolonged pain (“depression”) are associated with more pain, while faster PAF and increased CME (“facilitation”) are associated with less pain [21-25]. Given that individuals who experience higher pain in the early stages of a prolonged pain episode (e.g. post-surgery) are more likely to develop chronic pain in the future [26], slow PAF prior to an anticipated prolonged pain episode and/or CME depression during the acute stages of pain are potential predictors for the transition to chronic pain.
This paper presents the main outcomes of the PREDICT trial, a pre-registered (NCT04241562, [27]) full-scale analytical validation of the PAF/CME biomarker signature using a human model of prolonged myofascial temporomandibular pain (masseter intramuscular injection of nerve growth factor [NGF]). Repeated NGF injections induce progressively developing prolonged pain lasting up to 4 weeks [25, 28], and have been shown to mimic chronic pain characteristics such as time course (gradual development), type of pain (movement-evoked), functional impairments, hyperalgesia (decreased pressure pain thresholds) and mechanism of sensitization [29, 30]. This makes the NGF model a highly standardised and clinically relevant prolonged pain model with which to undertake biomarker validation.
The aim of the PREDICT trial was to determine whether individuals could be accurately classified as high or low pain sensitive based on baseline PAF and early CME facilitator/depressor classification. We predicted that the area under the curve (AUC) of the receiver operating characteristic (ROC) curve for distinguishing high and low pain sensitive individuals would be at least 70% (which represents an acceptable AUC) [31].
Methods
Participants
The PREDICT trial enrolled 159 healthy participants (70 females, 89 males, mean age 25.1 ± 6.1), with 150 participants remaining after participant dropouts. Ethical approval was obtained from the University of New South Wales (HC190206) and the University of Maryland Baltimore (HP-00085371). Written, informed consent was obtained. The supplementary appendix contains all additional details regarding participant characteristics and methodology.
Experimental Protocol
Outcomes were collected over a period of 30 days. Participants attended the laboratory on Days 0, 2 and 5. Baseline questionnaire data were collected on Day 0. Pressure pain thresholds, PAF and CME were measured on Days 0, 2 and 5. PAF was obtained via a 5-minute eyes-closed resting-state EEG recording from 63 scalp electrodes. Sensorimotor PAF was computed by identifying the component in the signal (transformed by independent component analysis) that had a clear alpha peak (8-12Hz) upon frequency decomposition and a scalp topography suggestive of a sensorimotor source. CME was obtained using transcranial magnetic stimulation (TMS) mapping: single pulses of TMS were delivered to the left primary motor cortex (M1), and motor evoked potentials (MEPs) were recorded from the right masseter muscle using electromyography (EMG) electrodes. TMS was delivered at each site on a 1 cm-spaced grid superimposed over the scalp, and a map of the corticomotor representation of the masseter muscle was generated. Corticomotor excitability was indexed as map volume, which is calculated by summing MEP amplitudes from all “active sites” on the grid. NGF was injected into the right masseter muscle at the end of the Day 0 and Day 2 laboratory sessions. Electronic pain diaries were collected from Days 1 to 30 at 10am and 7pm each day, in which participants rated their pain (0-10) during various activities. Pain upon functional jaw movement is a key criterion for the diagnosis of TMD [32]. Moreover, previous research has shown that, after an NGF injection to the masseter muscle, pain during chewing and yawning is higher than during other activities [30, 33]. As such, the primary outcomes were pain upon chewing and yawning. The protocol and methodology are shown in Figure 1A and 1B.
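As a rough illustration of the two measures described above, the following Python sketch estimates PAF as the frequency of maximal spectral power within the 8-12 Hz window and computes map volume as the sum of MEP amplitudes over active sites. It is not the trial's pipeline, which selected a sensorimotor component after independent component analysis. The function names, the simple peak-picking rule, the 0.1 Hz frequency grid and the `active_threshold` criterion for an "active site" are all assumptions introduced here for illustration.

```python
import cmath
import math


def peak_alpha_frequency(signal, fs, f_lo=8.0, f_hi=12.0, step=0.1):
    """Return the frequency (Hz) with the largest spectral power in [f_lo, f_hi].

    Power is evaluated directly from the discrete Fourier sum on a grid of
    candidate frequencies; a real pipeline would use an FFT-based estimator.
    """
    n = len(signal)
    mean = sum(signal) / n
    centred = [x - mean for x in signal]  # remove the DC offset
    best_f, best_power = None, -1.0
    n_steps = int(round((f_hi - f_lo) / step))
    for k in range(n_steps + 1):
        f = f_lo + k * step
        coef = sum(x * cmath.exp(-2j * math.pi * f * t / fs)
                   for t, x in enumerate(centred))
        power = abs(coef) ** 2
        if power > best_power:
            best_f, best_power = f, power
    return best_f


def map_volume(mep_by_site, active_threshold):
    """Sum MEP amplitudes over grid sites whose amplitude reaches the
    (hypothetical) active-site threshold."""
    return sum(a for a in mep_by_site.values() if a >= active_threshold)


# Usage with made-up values: a 2 s synthetic 10 Hz oscillation sampled at
# 100 Hz, and a three-site grid keyed by (row, column) in arbitrary units.
fs = 100
toy_eeg = [math.sin(2 * math.pi * 10.0 * t / fs) for t in range(2 * fs)]
toy_paf = peak_alpha_frequency(toy_eeg, fs)
toy_volume = map_volume({(0, 0): 0.12, (0, 1): 0.03, (1, 0): 0.25}, 0.05)
```

Peak picking is only one way to summarise the alpha band; the text above does not state which estimator was used, so the choice here should not be read as the authors' method.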
Figure 1. (A) Experimental protocol showing timeline of data collection procedures. On Day 0, we measured peak alpha frequency (PAF) and corticomotor excitability (CME). At the end of the session, an injection of nerve growth factor (NGF) was administered to the right masseter muscle. On Day 2, PAF and CME were measured, followed by a second NGF injection. On Day 5, PAF and CME were measured. From Days 1-30, electronic diaries measuring jaw pain were sent to participants at 10AM and 7PM each day. (B) Details of the methodology. Sensorimotor PAF was measured using a 5-minute eyes-closed resting-state EEG recording. Sensorimotor PAF was computed by identifying the component in the signal (transformed by independent component analysis) that had a clear alpha peak in the 8–12 Hz range upon frequency decomposition and a scalp topography suggestive of a source predominantly over the sensorimotor cortex. TMS mapping was conducted by stimulating the scalp area over left M1 to obtain a map of the representation of the right masseter muscle. The map consists of the motor-evoked potential (MEP) amplitude at each stimulated location, with CME corresponding to the map volume (sum of all MEPs from active sites). (C) Details of the analysis plan. We adopted a nested-control-test scheme by partitioning the 150 subjects into a training set consisting of 100 subjects and an independent test set of 50 subjects. We labelled a subset of participants in the training (n = 80) and test set (n = 38) as high or low pain sensitive using growth mixture modelling (GMM) to establish “ground-truth” labels. We then ran various machine learning models on the labelled training set (with PAF/CME as predictors, and pain severity labels as outcome), and determined optimized parameters through 5-fold cross-validation, i.e. randomly dividing the 80 subjects into an internal training set of 64 subjects (with 4 equal folds of 16) and a validation set of 16.
The optimized models in the internal training set were employed to predict labels in the validation set to facilitate model selection. The model with the best performance on the validation set was then locked in, and applied to the labelled test set, comparing the predicted labels of high/low pain sensitive with the ground-truth labels of high/low pain sensitive.
Analytical Validation Plan
Division of the Data
Analysis was conducted in R, MATLAB and Python, with code publicly available at https://github.com/DrNahianC/PREDICT_Scripts. Figure 1C details the analysis plan. We adopted a nested-control-test scheme by partitioning the 150 participants into a training set and a test set of 100 and 50 participants respectively.
Growth Mixture Modelling
We used growth mixture modelling (GMM) in R [34-36] to form two participant classes: high and low pain sensitive. For this categorization, we used pain diary trajectories of the summed chewing and yawning pain ratings from Days 1-7, as this was the timeframe when pain was most prominent (Supplementary Figure 3). As such, participants would more reliably fall into high and low pain sensitive classes during this timeframe. The first and last 40 participants (80 in total) in the training set, based on the ordering of probabilities of the pain intensity trajectory belonging to one of the classes, were then labelled as high and low pain sensitive. The trained GMM, once established, was locked and used to label the test set. Consequently, 38 out of 50 test set participants (24 high and 14 low pain) were labelled. These labels were recorded for subsequent comparison with the predicted labels produced by the trained machine learning model.
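The labelling rule for the training set can be sketched as follows. The sketch assumes that a fitted mixture model has already produced, for each participant, a posterior probability of belonging to the high-pain class; the function name `label_extremes`, the dictionary input and the convention that higher probability means the high-pain class are hypothetical choices made here, and the model fitting itself (done in R in the study) is not reproduced.

```python
def label_extremes(posterior_high, n_per_class=40):
    """Label the participants at either end of the posterior ordering.

    posterior_high maps participant ID -> probability (from a fitted mixture
    model) that the pain trajectory belongs to the high-pain class.
    Returns ID -> "high" / "low"; participants in the middle stay unlabelled.
    """
    if 2 * n_per_class > len(posterior_high):
        raise ValueError("not enough participants for the requested class sizes")
    # Sort IDs by ascending probability of belonging to the high-pain class.
    ranked = sorted(posterior_high, key=posterior_high.get)
    labels = {pid: "low" for pid in ranked[:n_per_class]}
    labels.update({pid: "high" for pid in ranked[-n_per_class:]})
    return labels


# Usage with made-up probabilities for 100 hypothetical training participants.
toy_posterior = {f"P{i:03d}": i / 99 for i in range(100)}
toy_labels = label_extremes(toy_posterior)
```

With 100 training participants and 40 per class, the 20 participants with the least certain class membership remain unlabelled, which matches the 80 labelled participants described above.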
Machine Learning Model Selection and Fine Tuning
We applied five machine learning models to the labelled training set: logistic regression, random forest, gradient boosting, support vector machine, and neural network. The dependent variable was the pain sensitivity label (high or low) identified from the GMM, and the independent variables were sensorimotor PAF and CME. Participants were classified on CME as facilitators or depressors, depending on whether they showed an increase or decrease in map volume on Day 5 relative to Day 0, respectively. For each model, we identified optimized parameters through 5-fold cross-validation: we randomly divided the 80 participants into an internal training set of 64 participants (consisting of four equal folds of 16) and a validation set of 16. The optimized models in the internal training set were then employed to predict labels in the validation set to facilitate model selection. The model with the best performance (area under the curve) on the validation set was then locked in.
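The CME coding and the division of the 80 labelled participants can be sketched as below. This is an illustration of the partition as we read it from the description above, not the study's code; the function names, the fixed random seed and the handling of an unchanged map volume (returned as `None`) are assumptions.

```python
import random


def cme_class(volume_day0, volume_day5):
    """Code CME as facilitator (increase) or depressor (decrease) in map
    volume from Day 0 to Day 5."""
    if volume_day5 > volume_day0:
        return "facilitator"
    if volume_day5 < volume_day0:
        return "depressor"
    return None  # no change: the text does not say how ties are handled


def split_folds(participant_ids, n_folds=5, seed=0):
    """Shuffle IDs and deal them into n_folds equal folds."""
    ids = list(participant_ids)
    if len(ids) % n_folds:
        raise ValueError("number of participants must divide evenly into folds")
    random.Random(seed).shuffle(ids)
    size = len(ids) // n_folds
    return [ids[i * size:(i + 1) * size] for i in range(n_folds)]


# 80 hypothetical participant indices -> five folds of 16.
folds = split_folds(range(80))
validation = folds[-1]  # 16 participants held out for model selection
internal_training = [pid for fold in folds[:-1] for pid in fold]  # 4 x 16 = 64
```

Hyperparameters would then be tuned within the 64 internal training participants and each tuned model scored once on the 16 validation participants.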
Test Set Prediction
The locked machine learning model was assessed on the test set. The participant IDs in this set did not coincide with those in the pain diary data, thereby preserving the double-blind nature of the analysis. By using the ground truth labels (shuffled), predicted labels (unshuffled), and the shuffling order for the test set, we were able to evaluate the model’s performance by comparing the reordered predicted labels against the ground truth labels established by the GMM. Performance was assessed via receiver operating characteristic (ROC) area under the curve (AUC), with 95% confidence intervals reported. AUC values of 0.7-0.8, 0.8-0.9 and 0.9-1 were considered “acceptable”, “excellent”, and “outstanding” respectively [31].
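For readers less familiar with the metric, the sketch below computes the AUC from labels and predicted scores, maps it onto the qualitative bands quoted above, and attaches a confidence interval. The percentile bootstrap used for the interval is an assumption, as the text does not state how the 95% confidence intervals were derived; the function names and the treatment of band boundaries (lower bound inclusive) are likewise illustrative.

```python
import random


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive case outranks a random
    negative case (ties count one half); labels must be 0 or 1."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def auc_band(auc):
    """Qualitative band for an AUC value, following the ranges in the text."""
    if auc >= 0.9:
        return "outstanding"
    if auc >= 0.8:
        return "excellent"
    if auc >= 0.7:
        return "acceptable"
    return "below acceptable"


def bootstrap_auc_ci(y_true, scores, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC (an assumed method)."""
    rng = random.Random(seed)
    n = len(y_true)
    aucs = []
    while len(aucs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [y_true[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that contain a single class
            aucs.append(roc_auc(ys, [scores[i] for i in idx]))
    aucs.sort()
    lower = aucs[int((alpha / 2) * n_boot)]
    upper = aucs[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper
```

Here a label of 1 would stand for high pain sensitive and the score for the model's predicted probability of that class.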
Results
PAF/CME demonstrated good-excellent test-retest reliability
PAF and ΔCME showed good to excellent test-retest reliability across sessions (Supplementary Figures 5 and 7).
Outstanding performance on the training validation set
Figure 2A shows the pain scores for participants in the training and test set classified as high and low pain sensitive based on GMM. Figure 2B (upper) shows the performances of the machine learning models across the internal training and validation sets. Logistic regression was chosen as the optimal classifier based on its outstanding performance (AUC=1.00[1.00-1.00]) when applied to the validation set (Figure 2B lower), with slower PAF and CME depression predicting higher pain.
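To make the direction of these effects concrete, the following sketch fits a two-predictor logistic regression by gradient descent on a handful of made-up toy values (not study data). The predictors are PAF centred at 10 Hz and a facilitator indicator (1 = facilitator, 0 = depressor), and the outcome is 1 for high pain sensitive. The fitting routine, learning rate, iteration count and data are all assumptions chosen for illustration; the study's own model was tuned by cross-validation.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.1, n_iter=5000):
    """Fit weights [bias, w1, ..., wk] by batch gradient descent on the log-loss."""
    n, k = len(X), len(X[0])
    w = [0.0] * (k + 1)
    for _ in range(n_iter):
        grad = [0.0] * (k + 1)
        for xi, yi in zip(X, y):
            p = sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi)))
            err = p - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w


def predict_proba(w, x):
    """Predicted probability of the high-pain class for one participant."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))


# Toy data: (PAF minus 10 Hz, facilitator indicator). Values are invented.
X = [(-0.6, 0), (-0.4, 0), (-0.5, 1), (-0.2, 0), (0.1, 0), (-0.3, 0),   # high pain
     (0.5, 1), (0.3, 1), (0.6, 0), (0.2, 1), (-0.1, 1), (0.4, 1)]       # low pain
y = [1] * 6 + [0] * 6
w = fit_logistic(X, y)
```

In this toy fit both slope coefficients come out negative, which is the pattern described above: faster PAF and CME facilitation each lower the predicted probability of being high pain sensitive.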
Figure 2. (A) Results of the growth mixture modelling which categorized 80 participants in the training set (left) and 38 participants in the test set (right) as high or low pain sensitive. Data show mean pain score (chew + yawn pain rating) for each timepoint, while the shaded area shows 95% confidence intervals. (B) The upper panel shows performances (AUC [95% confidence intervals]) of various machine learning models for the internal training set and validation set. Logistic regression (LR) was chosen as the optimal classifier based on its outstanding AUC of 100%, as shown in the lower panel. (C) The upper panel shows the performance of the locked logistic regression model when applied to the test set, which was in the excellent range (AUC of 88%). The lower panel shows the pain trajectories (mean chew + yawn pain and 95% confidence intervals) of participants predicted to have high or low pain sensitivity based on the locked logistic regression model. (D) Individual and mean z-transformed spectral plots and topography of the sensorimotor alpha component on Day 0 for participants predicted to have high or low pain sensitivity based on the locked logistic regression model. (E) The mean motor cortex maps on Day 0 and Day 5 showing normalized motor evoked potential (MEP) amplitude (expressed as a proportion of the maximal MEP amplitude) for participants predicted to have high or low pain sensitivity based on the locked logistic regression model.
Excellent performance on the test set
When the locked logistic regression model was applied to the test set, performance (Figure 2C upper) was excellent (AUC=0.88[0.78-0.99]). Figure 2C (lower) shows the differences in pain scores between participants predicted to have high or low pain. Visually, one can observe slower peak alpha frequency in those predicted to have high vs. low pain sensitivity (Figure 2D), which was further confirmed with a two-sample t-test (t(48)=5.8, p<.001). Moreover, one can observe a decrease in CME within the masseter motor maps in those predicted to have high pain (Figure 2E), whereas those predicted to have low pain exhibit an increase in CME. The difference in the change in CME relative to Day 0 between these groups was further confirmed with a two-sample t-test (t(48)=2.81, p=.007).
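The group comparisons above can be illustrated with a minimal two-sample t statistic. The sketch assumes the pooled-variance (Student) form, which is consistent with the reported 48 degrees of freedom for 50 participants but is not stated explicitly in the text; the function name is hypothetical and no p-value is computed.

```python
import math
from statistics import mean, variance


def pooled_t(a, b):
    """Student's two-sample t statistic with pooled variance; returns (t, df)."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    # Weighted average of the two sample variances.
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / math.sqrt(pooled_var * (1 / na + 1 / nb))
    return t, df
```

For two groups that together contain 50 participants, `df` is 48 regardless of how the participants are split between the groups.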
A benefit for a combined signature
We reran the models to determine whether the combined PAF/CME signature out-performed each measure individually (Supplementary Figure 9). For PAF alone, the performance of the logistic regression model on the training validation and test set was respectively outstanding (AUC=0.95[0.84-1.00]) and excellent (AUC=0.83[0.70-0.96]). For CME alone, the performance of the logistic regression model on the training validation and test set was respectively excellent (AUC=0.88[0.69-1.00]) and acceptable (AUC=0.75[0.60-0.91]).
Results were reproducible when including covariates
We evaluated the performance of the biomarker combined with demographic and clinical attributes. As we collected a large amount of these data, we applied feature selection, i.e. filtering features by the p-values of their association with the labels, and used parameter tuning to optimize the coefficients associated with the filtered features. Five features were subsequently selected and optimized: sensorimotor PAF, CME, sex, Pain Catastrophizing Scale (PCS) Total and PCS Helplessness. The associations between labels and biomarkers/covariates in the training vs. test set, and the performance of the models, are shown in Figure 3A and 3B. When including these five features, the performance of the logistic regression model on the training validation and test set was respectively outstanding (AUC=1.00[1.00-1.00]) and excellent (AUC=0.81[0.67-0.95]).
Figure 3. (A) Visualisation of biomarkers and covariates for the training and test sets across high (red) and low (blue) pain labels identified from the GMM. Data on PAF, PCS total and PCS helplessness are plotted as boxplots, while data on CME and sex are plotted according to the facilitator: depressor (Fac: dep) and female: male (fem: mal) split respectively, including odds ratios. A lower odds ratio means a lower probability of high pain sensitive individuals belonging to the facilitator or female categories. For PAF and CME, low pain was associated with fast PAF and CME facilitation for both training and test sets. In contrast, the relationships between covariates and labels were in opposite directions for the training and test sets, suggesting the relationship between biomarkers and labels was more consistent than that between covariates and labels. (B) The left panel shows the performance of the locked logistic regression model on the test set when including covariates in the model. The right panel shows pain trajectories (mean chew + yawn score and 95% confidence intervals) of participants predicted to have high or low pain sensitivity based on the locked logistic regression model including covariates. (C) The performance of each machine learning model (AUC [95% confidence intervals]) on the training validation set across different PAF/CME calculation methods. For PAF, these comprise a sensorimotor component chosen manually after an independent component analysis, a component identified using an automated script after an independent component analysis, or a sensorimotor region of interest (ROI, mean of Cz, C3 and C4) in electrode space. We also compared different frequency windows for computing PAF (8-12Hz vs. 9-11Hz) and CME calculated using map area or map volume. (D) The performance of the locked logistic regression model (AUC [95% confidence intervals]) when applied to the test set, across different PAF/CME calculation methods.
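The odds ratios mentioned in the caption can be read as in the sketch below, which computes an odds ratio from a 2x2 table of counts. The orientation (odds of belonging to a category among high pain sensitive participants, divided by the same odds among low pain sensitive participants) is our reading of the caption, and the function and argument names are hypothetical.

```python
def odds_ratio(high_in, high_out, low_in, low_out):
    """Odds of category membership among high-pain participants divided by
    the same odds among low-pain participants, from 2x2 counts.

    high_in / low_in: counts inside the category (e.g. facilitators);
    high_out / low_out: counts outside it (e.g. depressors).
    """
    if 0 in (high_out, low_in, low_out):
        raise ValueError("odds ratio is undefined when one of these cells is empty")
    return (high_in / high_out) / (low_in / low_out)
```

A value below 1 would then indicate that high pain sensitive participants are less likely than low pain sensitive participants to fall in the category, for example to be facilitators.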
Results were reproducible across methodological choices
To determine whether our results were robust across different methodological choices, we repeated the analysis using PAF calculated using component level data (with the sensorimotor component chosen manually or using an automated script) vs. sensor level data (with a sensorimotor region of interest), using different frequency windows (8-12Hz vs. 9-11Hz) and using different CME calculation methods (map volume vs. map area). We found that, regardless of the choices, logistic regression was the best or equal-best performing model when applied to the validation set (Figure 3C), with AUCs varying from acceptable (0.77) to outstanding (1.00). When the locked models were applied to the test set, performance varied from acceptable (AUC=0.73) to excellent (AUC=0.88) (Figure 3D). Lastly, excellent performance was demonstrated when the data were analysed in two other ways (Supplementary Figures 10 and 11): when GMM pain labels were established using the whole 30 days rather than the first 7 days (training validation AUC=0.84[0.64-1], test AUC=0.89[0.79-0.99]), and when missing pain diary data were not imputed (training validation AUC=0.81[0.6-1], test AUC=0.89[0.79-0.99]).
Discussion
A full-scale analytical validation of the PAF and CME biomarker signature was conducted using a clinically relevant prolonged pain model. In an initial training set (n=100), we found that logistic regression was the optimal classifier based on its outstanding performance (AUC=100%), with slower PAF and CME depression predicting higher pain. When this model was applied to an independent test set, the AUC was excellent (88%). PAF/CME showed good to excellent test-retest reliability, and results were reproduced across a range of methodological parameters and with consideration of covariates. Overall, the combination of sample size, pain model validity, and biomarker accuracy, reproducibility and reliability suggests the PAF/CME biomarker signature has substantial potential for clinical translation.
Our results suggest that individuals who have slow PAF prior to an anticipated prolonged pain episode, and who show corticomotor depression during a prolonged pain episode, are more likely to experience higher pain. Model performance was higher when combining the two measures, suggesting that consideration of both ascending sensory and descending motor pain processing mechanisms provides more information regarding pain sensitivity. Overall, we believe this biomarker could be particularly useful in contexts such as predicting post-operative pain. For example, a recent study showed that individuals with slower PAF experienced more pain following a thoracotomy [25]. Given that higher acute pain post-surgery predicts the development of chronic pain [26], our findings suggest individuals with slow PAF/reduced CME could be more likely to transition to chronic pain. Indeed, individuals who showed lower CME during the acute stages of low back pain were more likely to develop chronic pain at 6-month follow-up [37]. These preliminary findings, along with our analytical validation study, suggest PAF and CME could be susceptibility biomarkers for the transition from acute to chronic pain.
There are several aspects of our study which stand out within the field. The first is sample size: with recent advancements in machine learning, it has become possible to conduct analytical validation of pain biomarkers. However, deep learning requires a large number of labelled samples to conduct rigorous training, validation and testing [8]. Unfortunately, many pain susceptibility biomarker studies have not been sufficiently sampled to adopt such approaches [9, 10], and the ones that have used machine learning did not reach sample sizes similar to that of the present study [1, 2].
Another strength of our findings is reproducibility. The majority of work has shown similar associations between higher pain and slower PAF [16, 22] and CME depression [21, 25] in models of upper limb pain. The present study replicated these results in a model of prolonged jaw pain, suggesting these associations hold across pain locations. It is important to note that some studies have not shown a negative relationship between PAF and pain sensitivity [38, 39] or a positive relationship between CME depression and pain sensitivity [33]. However, these studies were not sufficiently sampled to conduct analytical validation of the kind presented in this study. Nonetheless, the mixed findings could also arise from differences in methodological choices in the estimation of PAF e.g. frequency windows [39] and use of sensor vs. component space data [40] and estimation of CME e.g. map volume [21] vs. area [33]. For this reason, we repeated the main analysis using different methodological choices and found at least acceptable AUCs. In addition, we found that the inclusion of covariates such as pain catastrophizing and sex did not alter our results, further supporting the reproducibility of our results.
The PAF/CME measures demonstrated good-excellent reliability. Reliability is a highly desirable characteristic which assists in the widespread application of pain biomarkers [8]. We found in the present analysis, and previously [14], that participants exhibit stable PAF across days despite the presence of pain, and even when considering different methodological factors that may influence the reliability of PAF such as pre-processing pipeline, recording length and frequency window. Indeed, we found reliable PAF with a recording length as short as 2 minutes and minimal data pre-processing. We also showed that those who show CME depression on Day 2 are also likely to show CME depression on Day 5 (and vice versa for those who show CME facilitation). This finding was shown even when an automated method of determining MEP amplitude on each trial was applied. Thus, our work not only shows that PAF and CME can predict pain, but the relative ease with which reliable PAF/CME data can be obtained is promising for subsequent clinical translation.
Another strength of this study is the clinical relevance of our pain model, making clinical translation of the current findings highly feasible. While other pain biomarker studies have shown promising results, these studies were restricted to pain models utilizing transient painful stimuli lasting seconds to minutes [11-13]. The brief nature of the painful stimuli questions the external validity of these findings and limits generalizability to clinical populations. In contrast, the present study used a prolonged pain model lasting weeks. Several other studies have shown that injections of NGF to the neck, elbow or masseter muscles can mimic symptoms of clinical neck pain [41], chronic lateral epicondylalgia [29] and TMD [30] respectively. Thus, the observed relationships between PAF/CME and pain in the present study show promise in terms of clinical applicability.
Lastly, the PAF/CME biomarker demonstrated high performance. A previous study found that connectivity between medial prefrontal cortex and nucleus accumbens in 39 sub-acute low back pain patients (pain duration 6-12 weeks) could predict future pain persistence at ∼7, 29 and 54 weeks, with AUCs of 67-83% [1]. Another study on 24 sub-acute low back pain patients showed that white matter fractional anisotropy measures in the superior longitudinal fasciculus and internal capsule predicted pain persistence over the next year, with an AUC of 81% [2]. Though the present study did not directly assess the transition to chronic pain, our AUCs of 100% (validation set) and 88% (test set) appear comparatively high. We therefore encourage future clinical studies to determine whether PAF/CME can predict the transition from acute to chronic pain.
Conclusions
A novel biomarker signature comprised of PAF and CME accurately and reliably distinguishes high and low pain sensitive individuals during prolonged jaw pain with an excellent AUC of 88% in an independent test set. No other pain biomarker study has shown this combination of biomarker accuracy, reproducibility, reliability and pain model validity, suggesting high potential for clinical translation.
Data Availability
All data produced in the present study are available upon reasonable request to the authors.
Footnotes
* Co-first author
** Co-senior author
Data availability statement: The data supporting the findings of this study are available from the corresponding author upon reasonable request
Disclosures: This project was funded by the National Institutes of Health (R61 NS113269/NS/NINDS NIH HHS/United States). The authors have no conflicts to declare.