Early prognostication of overall survival for pediatric diffuse midline gliomas using MRI radiomics and machine learning: a two-center study
============================================================================================================================================

* Xinyang Liu
* Zhifan Jiang
* Holger R. Roth
* Syed Muhammad Anwar
* Erin R. Bonner
* Aria Mahtabfar
* Roger J. Packer
* Anahita Fathi Kazerooni
* Miriam Bornhorst
* Marius George Linguraru

## ABSTRACT

**Background** Diffuse midline gliomas (DMG) are aggressive pediatric brain tumors that are diagnosed and monitored through MRI. We developed an automatic pipeline to segment subregions of DMG and select radiomic features that predict patient overall survival (OS).

**Methods** We acquired diagnostic and post-radiation therapy (RT) multisequence MRI (T1, T1ce, T2, T2 FLAIR) and manual segmentations for 53 (internal cohort) and 16 (external cohort) DMG patients from two centers. We pretrained a deep learning model on a public adult brain tumor dataset, and finetuned it to automatically segment tumor core (TC) and whole tumor (WT) volumes. PyRadiomics and sequential feature selection were used for feature extraction and selection based on the segmented volumes. Two machine learning models were trained on our internal cohort to predict patient 1-year survival from diagnosis. One model used only diagnostic tumor features and the other used both diagnostic and post-RT features.

**Results** For segmentation, Dice score (mean [median]±SD) was 0.91 (0.94)±0.12 and 0.74 (0.83)±0.32 for TC, and 0.88 (0.91)±0.07 and 0.86 (0.89)±0.06 for WT for internal and external cohorts, respectively. For OS prediction, accuracy was 77% and 81% at time of diagnosis, and 85% and 78% post-RT for internal and external cohorts, respectively. Homogeneous WT intensity in baseline T2 FLAIR and larger post-RT TC/WT volume ratio indicate shorter OS.
**Conclusions** Machine learning analysis of MRI radiomics has potential to accurately and non-invasively predict which pediatric patients with DMG will survive less than one year from the time of diagnosis to provide patient stratification and guide therapy.

**KEY POINTS**

* Automatic machine learning approach accurately predicts DMG survival from MRI
* Homogeneous whole tumor intensity in baseline T2 FLAIR indicates worse prognosis
* Larger post-RT tumor core/whole tumor volume ratio indicates worse prognosis

**IMPORTANCE OF STUDY**

Studies of pediatric DMG prognostication have relied on manual tumor segmentation from MRI, which is impractical and variable in busy clinics. We present an automatic imaging tool based on machine learning to segment subregions of DMG and select radiomic features that predict overall survival. We trained and evaluated our tool on multisequence, two-center MRIs acquired at the time of diagnosis and post-radiation therapy. Our methods achieved 77–85% accuracy for DMG survival prediction. The data-driven study identified that homogeneous whole tumor intensity in baseline T2 FLAIR and larger post-therapy tumor core/whole tumor volume ratio indicate worse prognosis. Our tool can increase the utility of MRI for predicting clinical outcome, stratifying patients into risk groups for improved therapeutic management, monitoring therapeutic response with greater accuracy, and creating opportunities to adapt treatment. This automated tool has potential to be easily incorporated in multi-institutional clinical trials to provide consistent and repeatable tumor evaluation.
**KEYWORDS**

* diffuse midline glioma
* magnetic resonance imaging
* machine learning
* overall survival
* prognostication

## Introduction

Diffuse midline gliomas (DMG), including diffuse intrinsic pontine gliomas (DIPG), are aggressive central nervous system pediatric tumors located in the brainstem and thalamus.1 As one of the most devastating pediatric cancers, DMG represents about 10–15% of all pediatric tumors of the central nervous system, with an estimated 300 new cases diagnosed annually in the USA.2 Most DMGs occur between the ages of 5 and 10 years, with a peak at 7 years.3 There is no curative therapy for DMG, and radiation therapy (RT) is the standard treatment with only transitory benefits.4 Despite numerous clinical trials of new agents and novel therapeutic approaches over the last decades,5 disease outcomes remain dismal with a median overall survival (OS) of less than 1 year, a 2-year OS rate of less than 10%,6 and a 5-year OS rate of less than 1%.7

Magnetic resonance imaging (MRI) is the standard noninvasive test for DMG diagnosis and monitoring of tumor response to therapy. Although pediatric DMGs have a diverse imaging appearance,8 MRI features have been used to predict H3K27M mutation status9 and correlate with patient prognosis.10-15 The features utilized in these studies were either low-dimensional image features10,11,13-15 or based on texture analysis.12 The statistical analyses that most of these studies relied on tend to identify inconsistent and inconclusive imaging biomarkers across different studies and datasets. For example, a study of 357 pediatric DIPGs demonstrated that although many MRI features, such as tumor size, enhancement, and necrosis, were strongly associated with survival on univariable analysis, very few were significantly associated with survival on multivariable analysis.11 These findings suggest that relying only on statistical analysis of conventional MRI findings may not be sufficient to predict OS in DMGs.
Machine learning has shown great potential to predict survival or discriminate between certain groups in studies of other brain tumors such as glioblastoma multiforme (GBM) and pediatric low-grade gliomas.16-19 For DMG, machine learning-based regression models were proposed to correlate with patient prognosis based on extracted MRI radiomic features.20,21 These studies only focused on imaging data at diagnosis, and the tumors were segmented manually, which is generally believed to be time-consuming and to have high inter-operator variability. Other studies demonstrated that semiautomated DMG volume measurements are more accurate, prognostically relevant, and consistent than manual measurements.14,15 In addition to diagnostic scans, it is also important to consider longitudinal data at post-treatment timepoints.10

With new therapeutic strategies currently under investigation for DMG, including epigenetic therapy and immunotherapy,22 there is a great need for noninvasive prognostic imaging tools that can be universally used to accurately identify which patients are at risk for the most rapid deterioration, and thereby assist clinical trial eligibility and therapy planning. Such tools should be automatic, objective, and easy to use in multi-institutional clinical trials.

With the vast advancements in deep learning techniques, there has been tremendous success in automatic segmentation of brain tumors from MRI, including adult23,24 and pediatric brain tumors,25,26 and our previous work on segmenting DMG.27,28 These advancements have the potential to enable us to create a fully automatic, image-based radiomic analysis and DMG prognostic tool.

In this work, we developed a novel imaging tool to process and analyze DMG patients’ MRI data with the goal of predicting their 1-year OS.
One year is the median OS of our internal cohort, and it is also close to the median OS of 11 months reported in larger DIPG studies.11 Therefore, accurate prediction of a patient’s 1-year OS could have profound impact on the clinical management of DMG. The proposed tool is automatic, and it provides deep learning-based segmentation of subregions of DMG from MRI, radiomic feature extraction and selection based on the segmented volumes, and machine learning-based OS prediction. The proposed method was trained and validated on an internal cohort from Children’s National Hospital (CNH) to investigate the accuracy of OS prediction in 1) a baseline study using MRIs obtained at diagnosis, and 2) a post-RT study using MRIs obtained at both diagnosis and post-RT (i.e., after the first RT). The method was further tested on an external DMG dataset from Children’s Hospital of Philadelphia (CHOP) to assess the reproducibility of our findings.

## Materials and Methods

### Study Cohort

For this two-center retrospective study, institutional review board approval was obtained at both participating institutions (CNH IRB Protocols #1339 and #14310; the Children’s Brain Tumor Network (CBTN),29 IRB requirement waived). Our internal cohort includes 53 pediatric and adolescent patients (F=29, M=24) diagnosed with DMG at CNH between 2005 and 2022. The median patient age at diagnosis is 6.5 years with a range of 3.2–25.9 years. The median OS is 12 months with a range of 3.3–132 months from diagnosis (1 patient is still alive). The external cohort from CHOP includes 16 pediatric patients (F=9, M=7) diagnosed with DMG between 2005 and 2022, made available by CBTN. The median age at diagnosis is 9.4 years with a range of 3.8–18.2 years. The median OS is 9.6 months with a range of 1.3–27.1 months from diagnosis.

### MRI Data

Both institutions used scanners and imaging protocols that varied among patients and timepoints because of retrospective data collection.
For each patient, 4 MRI sequences at diagnosis and/or post-RT were collected, including T1-weighted (T1), contrast-enhanced T1 (T1ce), T2-weighted (T2), and T2-weighted-Fluid-Attenuated Inversion Recovery (T2 FLAIR). The MRIs were acquired on either 1.5T or 3T magnets, with 2D or 3D acquisition protocols, using scanners from GE Healthcare, Siemens AG, or Toshiba. T1 and T1ce MRIs included T1 SE, T1 FSE, T1 MPRAGE, or T1 SPGR. T2 MRI included T2 SE, T2 FSE, T2 FRFSE or T2 propeller. T2 FLAIR MRI included those with or without gadolinium (Gd) enhancement. The slice thickness range was 0.5–6 mm and matrix range was (256–512)×(256–512) pixels. All images were collected in the DICOM image format.

Manual segmentation of DMG volumes was used as the ground truth for training the deep learning segmentation model. It was performed under the supervision of two expert neurooncologists using ITK-SNAP.30 Inter-expert variability was resolved through consensus. Because necrosis/cyst is not consistently identifiable for DMG, two labels were created: tumor core (TC) and whole tumor (WT). TC included two components: the Gd-enhancing tumor appearing as enhancement on T1ce, and the necrotic/cystic core appearing as hypointense on T1ce. WT included TC and the peritumoral edematous/infiltrated tissue appearing as abnormal hyperintense signal on T2 FLAIR.

### Automatic DMG Segmentation

Despite the tremendous success of deep learning-based automatic segmentation for adult GBMs, the direct application of these methods on rare pediatric brain tumors remains challenging.31 While GBMs and DMGs share several clinical properties, they have distinctive characteristics as well, especially in their location in the brain and radiologic presentation. Our approach was to transfer knowledge learnt from GBM segmentation to DMG segmentation. The Brain Tumor Segmentation (BraTS) challenge is an ongoing annual event that has been held since 2012.
We obtained imaging data of 1,251 GBM patients that were publicly available from BraTS.32 For each patient, 4 MRI sequences (T1, T1ce, T2, and T2 FLAIR) and manual segmentations of TC and WT subregions of GBM were provided. The winning method of the BraTS 2020 challenge was based on nnU-Net,24 a popular and robust semantic deep-learning segmentation method. nnU-Net analyzes the training data and automatically configures a matching U-Net33-based segmentation pipeline.

Figure 1 shows the model architecture of our transfer learning-based approach using nnU-Net. It includes a pretraining phase of nnU-Net using the GBM dataset. Because nnU-Net automatically determines the segmentation pipeline based on the specific dataset, we changed this pretraining paradigm to first design the segmentation pipeline based on the DMG dataset, and then used the planned pipeline to perform pretraining on the GBM data. The pretrained network weights were then used as initialization to finetune the model using the DMG dataset. Preprocessing was performed in an automatic fashion and included N4 bias correction to correct for MRI inhomogeneities,34 rigid registration to the SRI-24 Atlas for spatial alignment,35 and skull stripping.36 The output of the segmentation model was the TC and WT volumes, which were used as input to the radiomic feature extraction step.

Figure 1. Model architecture of our DMG segmentation method, which employs nnU-Net-based pre-training and fine-tuning. The input to pre-training is the adult brain tumor dataset from the BraTS 2021 challenge. The input to fine-tuning is the preprocessed DMG dataset. Based on the DMG dataset, a specific segmentation pipeline is determined and used for pre-training.
After pre-training, the obtained weights are used as input for fine-tuning. The output of the model is multi-region segmentation masks.

### Experiments and Evaluation for Tumor Segmentation

Data from 45 CNH patients (with manual segmentation) were used for training and validation of the segmentation model. Scans at diagnosis and post-RT of the same patient were counted as separate MRI sets for the purpose of segmentation, each set containing four MRI sequences. This yielded a total of 82 sets from the 45 patients. Specifically, 41/82 sets were acquired at diagnosis, 34/82 sets were acquired within 1 month post-RT, and the remaining 7 sets were acquired 2–4 months post-RT. The 82 DMG sets were randomly divided into 5 folds, and 5-fold cross-validation was performed to obtain the TC and WT volumes. Data from the same patient were always kept in the same fold. Dice coefficient and volume similarity were used as evaluation metrics to compare the predicted and ground truth segmentations, where the volume similarity is calculated as the ratio between the smaller of the compared volumes and the average of the compared volumes.37 After 5-fold cross-validation, we trained a final model with all 82 sets and used it to predict TC and WT volumes for the remaining 8 internal patients and 16 external patients.

Many DMG cases have no TC volume or only a very small one. Thus, comparison between predicted and ground truth in small or absent TC volumes produces extreme metrics (e.g., Dice score of 0 or 1). To avoid bias toward small volumes, we cleaned predicted volumes by removing small (i.e., <130 mm³) disconnected regions. Moreover, let TC/WT denote the ratio between TC volume and WT volume. We did not evaluate segmentation performance if 0