Prediction of Pituitary Adenoma’s Volumetric Response to Gamma Knife Radiosurgery Using Machine Learning-Supported MRI Radiomics
Herwin Speckter, Marko Radulovic, Erwin Lazo, Giancarlo Hernandez, Jose Bido, Diones Rivera, Luis Suazo, Santiago Valenzuela, Peter Stoeter, Velicko Vranes

TL;DR
This study uses MRI radiomics and machine learning to predict how pituitary adenomas will respond in size to gamma knife radiosurgery, offering a more accurate approach than traditional methods.
Contribution
The study pioneers the use of radiomic MRI analysis to predict volumetric response of pituitary adenomas to gamma knife radiosurgery.
Findings
Radiomic models achieved AUC values up to 0.928 in predicting tumor volume response.
Radiomic models outperformed benchmark models using only clinicopathological parameters.
Multi-modality models combining MRI and clinicopathological data showed strong predictive performance.
Abstract
Background/Objectives: Gamma knife radiosurgery (GKRS) is widely performed as an adjuvant management of patients with residual or recurrent pituitary adenoma (PA). However, the variability in the tumor volume response to GKRS emphasizes the need for reliable predictors of treatment outcomes. The application of radiomics, an analytical approach for quantitative imaging, remains unexplored in predicting treatment responses for PAs. This study aimed to pioneer the use of radiomic MRI analysis to predict the volumetric response of PA to GKRS. Methods: This retrospective observational cohort study involved 81 patients who underwent GKRS for PA. Pre-treatment 3-Tesla MRI scans were used to extract radiomic features capturing the intensity, shape, and texture of the tumors. Radiomic signatures were generated using the least absolute shrinkage and selection operator (LASSO) for feature…
Figure 1
Figure 2
Affiliation: Instituto Tecnologico de Santo Domingo (INTEC), Dominican Republic
Taxonomy
Topics: Radiomics and Machine Learning in Medical Imaging · Pituitary Gland Disorders and Treatments · Glioma Diagnosis and Treatment
1. Introduction
Pituitary adenomas (PAs) are predominantly benign brain tumors, constituting 10–20% of all intracranial neoplasms, and account for most sellar and parasellar tumors. Gamma knife radiosurgery (GKRS) has emerged as an effective and minimally invasive treatment modality with high therapeutic efficacy for various intracranial pathologies, including nonfunctioning and functioning PAs. Stereotactic radiosurgery (SRS), including hypofractionated SRS delivered in up to five fractions, achieves tumor control rates ranging from 80% to over 90% in published reports [1,2,3]. Although GKRS achieves favorable outcomes, the individual treatment response varies substantially among patients, necessitating more reliable predictors of individual treatment outcomes [4,5,6].
A common approach in PA treatment is the trans-sphenoidal removal of the central part of the PA, frequently leaving the remnants within the cavernous sinus for GKRS treatment. The volume response, commonly assessed through post-treatment imaging, is an essential indicator of GKRS’s efficacy and can inform subsequent management decisions. Identifying reliable predictors of volume response would enable early intervention for poor and non-responders, potentially improving overall patient outcomes. If a reliable pre-treatment outcome prediction existed, surgical treatment could be avoided in selected PAs predicted to respond significantly to GKRS. Conversely, in cases with a less favorable prognosis for GKRS, where higher radiation doses may be required, upfront surgical removal of the central parts of the PA may be preferable to avoid radiation injury to the adjacent critical structures, particularly the optic nerves, chiasm, and tracts.
Radiomics focuses on extracting and analyzing the intensity, morphological, and texture features of medical images. These features capture complex tumor characteristics that are often invisible to the human eye, enabling the potential identification of imaging biomarkers for treatment response prediction. Radiomics has shown promise in various oncological contexts, allowing for non-invasive prediction of treatment response or prognosis of disease outcome [7,8].
We hypothesized that radiomics prediction signatures could outperform the benchmark signature relying only on routine clinicopathological (CP) features for predicting PA’s volumetric response.
The objective of this study was to develop several single-modality and multi-modality radiomic prediction signatures, both with and without CP parameters, to predict the volumetric response of PAs to GKRS, using pre-treatment MRI. Additionally, the study aimed to compare the predictive performance of these signatures in the test folds. Improving the prediction of treatment outcomes is highly clinically relevant because it enables personalized treatment planning, optimizes therapeutic strategies, and thereby enhances patient care.
2. Methods
This study adheres to the guidelines set out in the STROBE statement for cohort studies.
2.1. Ethics Approval Statement
The study received approval (CEI-391, 24 April 2019) from our Institutional Review Board and adheres to The Code of Ethics of the World Medical Association (Declaration of Helsinki), as published in the British Medical Journal (18 July 1964) and its 7th revised edition in 2013. The need for written informed consent was waived by the ethics committee due to the retrospective nature of the analysis.
2.2. Patients
Included were 81 patients between 11.4 and 83.2 years of age (mean: 45.6) with imaging-diagnosed PA treated at our gamma knife center. Of the 81 PAs, 52 were nonfunctional PAs, while 29 were functional PAs, including 5 growth hormone (GH)-secreting adenomas, 7 adrenocorticotropic hormone (ACTH)-secreting adenomas, and 17 prolactin (PRL)-secreting adenomas. Fifteen PAs were treated upfront, 45 PAs had one surgery before SRS, 16 PAs had two surgeries, and 5 PAs were operated on three times before SRS. MRI had been performed within four weeks before radiosurgery, with available follow-up data after an interval of six months or longer (range: 6.7–105.5 months, mean: 40.4; Table 1). Tumor volumes ranged between 0.18 and 40.33 cm³ (mean: 6.30).
2.3. Sample Size Calculation
The prospective sample size calculation was based on a pilot experiment with 30 patients and required 36 patients for alpha = 0.05, beta = 0.20, and AUC = 0.76 (Medcalc 14.8.1; MedCalc Software Ltd., Ostend, Belgium). These 30 patients were included in the final cohort. The actual AUCs obtained for the calculated scores integrating radiomics and CP features ranged between 0.759 and 0.928, with a final sample size of 81 patients in T1w, 81 in CE-T1w, 48 in T2w, and 41 in FLAIR.
2.4. Gamma Knife Treatment
The gamma knife technique was previously described [7]. On the day of the treatment, after placing a stereotactic G frame (Elekta AB), under sedation and local anesthesia, contrast-enhanced 3D computed tomography (CT) imaging was obtained. Pre-treatment MRI sequences, acquired less than four weeks before, were then coregistered to the stereotactic CT. Treatments were planned on a Leksell GammaPlan 10.1 workstation (Elekta Instrument AB, Stockholm, Sweden), carefully respecting the dose constraints, particularly of the optic apparatus.
The margin dose varied from 12 to 40 Gy (mean: 20.5 Gy), depending on the tumor’s size, location, and hormone status (Table 1). Forty-nine PAs were treated in a single session, with margin doses ranging from 12 to 35 Gy (mean: 17.8 Gy). According to our institutional protocol, lesions abutting the anterior optic pathway are treated using hypofractionated radiosurgery (HFSRS). Thirty-two PAs were treated with 5.0 to 9.0 Gy per fraction (mean: 6.1 Gy) over 3 to 5 days (mean: 4.03). The biologically effective dose (BED) is routinely used to compare doses of different dose–fraction regimens relying on the widely accepted linear quadratic model, with its known limitations for high doses [9]. HFSRS doses can be converted to single-fraction equivalent doses (SFED) to intuitively compare radiation effects with conventional physical doses [10]. Margin SFED varied from 11.1 to 35.0 Gy (mean: 16.2 Gy), applying an α/β ratio of 2.47 Gy for nonfunctional PAs or 4.91 Gy for functioning PAs [11,12].
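The dose conversion described above can be sketched in a few lines. This is a minimal illustration that assumes the SFED is the single-fraction dose with the same linear-quadratic BED as the fractionated scheme; the exact formula used in [10] may differ. The function names and the 5 × 6 Gy example regimen are hypothetical, while the α/β value of 2.47 Gy is the one quoted for nonfunctional PAs.

```python
import math


def bed(dose_per_fraction_gy: float, n_fractions: int, alpha_beta_gy: float) -> float:
    """Biologically effective dose under the linear-quadratic model."""
    d = dose_per_fraction_gy
    return n_fractions * d * (1.0 + d / alpha_beta_gy)


def sfed(dose_per_fraction_gy: float, n_fractions: int, alpha_beta_gy: float) -> float:
    """Single-fraction dose D with the same BED, i.e. D * (1 + D / ab) = BED.

    Assumption: plain LQ equivalence, with no high-dose correction.
    """
    ab = alpha_beta_gy
    target = bed(dose_per_fraction_gy, n_fractions, ab)
    # Positive root of D^2 + ab * D - ab * BED = 0.
    return (-ab + math.sqrt(ab * ab + 4.0 * ab * target)) / 2.0


# Hypothetical regimen: 5 fractions of 6 Gy, nonfunctional PA (alpha/beta = 2.47 Gy).
example_sfed = sfed(6.0, 5, 2.47)
print(f"SFED for 5 x 6 Gy: {example_sfed:.1f} Gy")
```

For a single-session treatment the conversion returns the physical dose unchanged, which is a quick sanity check on the formula.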
2.5. MRI
MRI was performed on a 3-Tesla scanner (Achieva; Philips, Eindhoven, Netherlands). Three-dimensional T1-weighted non-contrast (T1w), contrast-enhanced (CE-T1w), T2w, and FLAIR sequences were acquired with the following sequence parameters.
Three-dimensional T1w magnetization-prepared rapid acquisition (MPRAGE) sequence: gradient echo; TR/TE/TI, 6.8/3.2/900 ms; flip angle, 8°; measured voxel size, 0.6 × 0.6 × 1.0 mm, before and after intravenous injection of contrast medium.
T2w sequence: TR/TE, 3693.8/80 ms; 150 transversal slices; thickness, 1 mm; matrix, 512 × 512.
Fluid-attenuated inversion recovery (FLAIR) sequence: TR/TE/TI, 11,000/120/2800 ms; 90 transversal slices; thickness, 2 mm; matrix, 512 × 512.
2.6. Postprocessing
In all 81 patients, PA volumes were delineated and measured from CE-T1w images on the Leksell GammaPlan workstation. MRI sequences were coregistered to the stereotactic 3D contrast-enhanced CT acquired on the treatment day. Image sets were verified for high image quality. Sequences with artifacts were excluded.
2.7. Follow-Up
Imaging and clinical follow-up were performed every six months for the first two years after GKRS and annually thereafter.
2.8. Feature Extraction
The radiomics analysis was conducted using the open-source Pyradiomics plugin, integrated into 3D Slicer (version 5.6.1) [13]. The Pyradiomics parameter file informed the computation of all available image transformations and feature types, resulting in a total of 2156 features per MRI scan.
Feature extraction was performed on the original images (107 features) and on images transformed by wavelet, square, square root, logarithm, gradient, exponential, Laplacian of Gaussian (LoG), and Local Binary Pattern (LBP 2D) filters. For detailed descriptions of the extracted radiomics features, please see: https://pyradiomics.readthedocs.io/en/latest/features.html (accessed on 19 April 2025).
2.9. Model Selection
Features were pre-selected by discarding those without a significant correlation with the PA’s volume change outcome, based on Pearson’s correlation test (Statistica 12.0, StatSoft, Hamburg, Germany). Predictive models were constructed using the features selected by LASSO regression with integrated 10-fold cross-validation and the following classifiers: random forest, naïve Bayes, kNN, logistic regression, neural network, and SVM (Orange Data Mining, University of Ljubljana, Slovenia).
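The pre-selection step can be illustrated with a small, dependency-free sketch. The study filtered features by the significance of Pearson's correlation test in Statistica; the version below substitutes a simple threshold on |r| as a stand-in for that test, and it does not reproduce the subsequent LASSO or classifier stages. The feature names, toy values, and the `min_abs_r` threshold are all hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant feature carries no linear information
    return sxy / math.sqrt(sxx * syy)


def preselect(features, outcome, min_abs_r=0.3):
    """Keep features whose |r| with the outcome reaches the threshold.

    Assumption: an |r| cutoff replaces the significance test used in the study.
    """
    kept = {}
    for name, values in features.items():
        r = pearson_r(values, outcome)
        if abs(r) >= min_abs_r:
            kept[name] = r
    return kept


# Toy data: volume change per ln(time) for five hypothetical patients.
outcome = [-0.9, -0.5, -0.3, -0.1, 0.0]
features = {
    "texture_a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "shape_b": [3.0, 1.0, 3.0, 1.0, 3.0],
    "constant_c": [2.0, 2.0, 2.0, 2.0, 2.0],
}
selected = preselect(features, outcome)
print(sorted(selected))
```

Filtering on the outcome before cross-validation can leak information into the test folds unless the filter is refit inside each training fold, which is worth keeping in mind when adapting a sketch like this.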
2.10. Evaluation of Predictive Performance
The PA’s volume change per natural logarithm of time was chosen as the endpoint to account for the near-exponential decrease in tumor volume over time [14]. Beyond an initial 6-month phase, our data indicate that most PA volumes follow an exponential trend. To compare volume changes during long FU periods (FUP), tumor volume change per natural logarithm of time is more accurate than volume change per time. This was calculated as the difference in tumor volume before SRS and at the last follow-up, divided by the initial volume and the natural logarithm of the time since SRS to the last FU [14]:
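The formula itself is not reproduced in this text. Based only on the verbal description above, the endpoint can be sketched as follows. The symbols are assumed notation: V_0 is the tumor volume before SRS, V_FU the volume at the last follow-up, and t_FU the time from SRS to the last follow-up. The sign is chosen so that shrinkage is negative, in line with the negative cutpoints reported in the Results; the time unit and the expression as a percentage are assumptions.

```latex
\Delta V_{\ln t} \;=\; \frac{V_{\mathrm{FU}} - V_{0}}{V_{0}\,\ln\!\left(t_{\mathrm{FU}}\right)} \times 100\%
```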
Predictive performance was assessed using ROC analysis (IBM SPSS v28, Armonk, NY, USA; Orange Data Mining), with statistical analyses considering both continuous independent variables and continuous or categorized dependent outcomes.
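For readers unfamiliar with the metric, the AUC of a ROC analysis equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The sketch below computes it directly from that definition. It is not the SPSS or Orange implementation, and the labels and scores are invented for illustration.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))


# Hypothetical example: 1 = responder, 0 = non-responder.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.2]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```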
2.11. Validation
Validation involved bootstrapping for ROC analysis and split-sample cross-validation via LASSO. Besides LASSO cross-validation in the training folds (51 patients), models were validated on eight random test folds (30 patients).
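The split-sample scheme can be sketched as repeated random partitions of the cohort. The text does not state how the eight test folds were drawn (for example, whether they were stratified or allowed to overlap), so the version below assumes independent, unstratified random draws of 30 test patients from 81, leaving 51 for training each time. The function name and seed are hypothetical.

```python
import random


def random_splits(n_patients=81, n_test=30, n_repeats=8, seed=0):
    """Return (train_ids, test_ids) pairs from independent random partitions.

    Assumption: folds are drawn independently, so test sets may overlap.
    """
    rng = random.Random(seed)
    patient_ids = list(range(n_patients))
    splits = []
    for _ in range(n_repeats):
        shuffled = patient_ids[:]
        rng.shuffle(shuffled)
        test_ids = sorted(shuffled[:n_test])
        train_ids = sorted(shuffled[n_test:])
        splits.append((train_ids, test_ids))
    return splits


splits = random_splits()
for i, (train_ids, test_ids) in enumerate(splits, start=1):
    print(f"fold {i}: {len(train_ids)} training, {len(test_ids)} test patients")
```

Feature selection and model fitting would then be repeated on each training set and scored only on the matching test set.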
3. Results
3.1. Patients’ Characteristics
The treatment results are presented in Table 1. After a mean FUP of 40.4 months, the control rate was 98.8%. As there are no standardized radiographic criteria for assessing PAs’ treatment response, we used the RANO criteria [15] stated by Imber et al. for PA response characterization [16]. Partial response was achieved in 60 (74.1%) PAs, with stable disease found in 20 (24.7%) patients, while 1 (1.2%) progressive PA was observed at last follow-up. The volumetric outcome was notably better in functioning PAs, with a volume reduction of 2.21% per month, compared with 1.22% per month for nonfunctional PAs. This difference in volumetric response is probably attributable to the significantly higher doses used in treating functioning PAs, which are required to obtain hormone remission.
3.2. Experimental Design
The workflow of the study is shown in Figure 1. Models were developed for individual modalities (CP, T1w, CE-T1w, T2w, FLAIR) and combined modalities (T1w + CE-T1w, CP + T1w + CE-T1w), focusing on the T1w and CE-T1w MRI sequences available for all 81 patients. This facilitated multi-sequence model construction and validation across eight test folds. Tumor volume response prediction used R² from linear LASSO (Table 2) for continuous outcomes, and logit LASSO with six classifiers for categorized outcomes, using AUC and accuracy for evaluation (Table 3 and Table 4).
3.3. The Predictive Models for PA’s Response to Radiosurgery
The constructed predictive models were initially tested against the continuous, uncategorized tumor volume response outcomes (Table 2). Thereby, T2w showed the best association with the outcomes (R² = 0.665), followed by the most comprehensive models: CP + T1w + CE-T1w (R² = 0.584) and T1w + CE-T1w (R² = 0.502). The optimal performance of T2w was notable, but because of its availability in a smaller number of patients, this result was interpreted as preliminary. Our focus was thus directed towards CP + T1w + CE-T1w, identified as the second-best performer (Table 2).
The optimal cutpoints for the comprehensive model, combining CP and radiomics features (CP + T1w + CE-T1w), were identified by testing against six different cutpoints in the outcome (Figure 2). The cutpoint of a −0.25% tumor volumetric response per natural logarithm of time was identified as the most suitable for prognosis using the available CP and radiomics features (Figure 2).
After determining the optimal cutpoint for outcome categorization, the predictive performance of individual CP parameters was evaluated using the t-test for continuous and the Chi-square test for binary values (Table 3). Age, fraction number, accumulated dose, SFED, and hormone secretion status showed significant associations with the −0.25% outcome (Table 3). Models using only CP or CE-T1w features had the weakest link with the −0.25% outcome, while those with T1w features performed best. Notably, there was no significant difference in predictive performance among the T1w, T1w + CE-T1w, and CP + T1w + CE-T1w models, as confirmed by an independent sample t-test. To reduce the feature burden on LASSO, features were pre-selected on the basis of their Pearson correlation with the outcome cutpoint at −0.25%, enhancing the models’ prognostic performance. These features were refined through L1 LASSO selection, with the top eight used in classification by random forest, naïve Bayes, kNN, logistic regression, neural networks, and SVM classifiers (Table 4). Comparing the classifiers’ performance using the CP + T1w + CE-T1w model for the −0.25% outcome cutpoint revealed that logistic regression, the neural network, and SVM showed similar classification performance in the test folds, with random forest, naïve Bayes, and kNN performing less effectively (Table 4).
Table 5 details selected features and coefficients for the model incorporating CP, T1w, and CE-T1w features, highlighting the inclusion of only one CE-T1w feature, likely due to its inferior predictive performance (Table 5).
4. Discussion
This initial study demonstrates that radiomic signatures built on pre-GKRS MRI data can be used to predict tumors’ volumetric response to radiosurgery. The computed radiomic models outperformed the benchmark model, which included only clinicopathological features. The T1w and T2w models were the best predictive performers, while the dual-sequence T1w + CE-T1w and the merged CP + T1w + CE-T1w models did not provide a statistically significant improvement over the single-modality models, as judged by the t-test. All models, except for T2w and FLAIR (due to their low numbers), underwent additional evaluation for generalizability using cross-validation. The results demonstrated that the predictive performance in the training folds was largely retained in unseen test folds, with the remaining AUCs consistently exceeding 0.90.
Radiomics analysis treats MRI as minable data by extracting quantitative computational features. It gains importance as clinical imaging becomes increasingly widespread. To the best of our knowledge, no prior studies have employed texture or radiomics analyses on pituitary adenomas. Our group and others have performed texture analyses on meningiomas [17], vestibular schwannomas [18,19,20], and brain metastases [21,22]. In a previous study investigating the use of radiomics features to develop a treatment prognosis following SRS, our group applied radiomics to pre-radiosurgical MRI to predict the long-term outcomes of WHO Grade 1 meningiomas after SRS, achieving satisfactory predictive performance, as evidenced by an AUC reaching 0.88 [7]. Other groups used radiomics to predict the outcome and pseudoprogression of vestibular schwannoma treated with SRS [23,24]. Several studies explored radiomics to predict local control of brain metastases [25,26,27,28] or arteriovenous malformations [29,30,31] after SRS.
Of the four sequences investigated, we found that T2w provided the best predictive performance for volumetric PA change (Table 2). This result agrees with our previous findings applying texture analysis to predict the volumetric outcomes of SRS in benign meningiomas and vestibular schwannomas. For benign meningiomas, we found that the histogram parameter standard deviation of voxel intensities of T2w images correlated best with volumetric change after SRS [17]. Increased T2w intensity values have been reported to relate to a soft consistency of meningiomas, increased vascularity, cellular atypia, and angioblastic or melanocytic components, as well as cystic degeneration and ischemic necrosis [32,33,34]. For vestibular schwannomas, the kurtosis of T2w image intensity values predicted progression best, with a sensitivity and specificity of 71% and 78%, while the minimum of the T2w voxel intensity values correlated significantly with the final regression of tumor volume per month [18].
Several studies investigated the use of MRI to preoperatively assess tumor consistency in PAs, as tumor consistency is a critical factor in surgical planning. Most existing studies support the ability of T2w to predict intraoperative consistency, while T1w has not been shown to offer any predictive value [35]. Hypointensity on T2w likely correlates with firmer tumors, possibly attributable to their increased collagen content and vascularity. In comparison, softer tumors tend to be hyperintense on T2w, which may relate to higher water content and/or cystic components [36,37,38]. Although multiple preoperative PA consistency assessment methods have been studied, none demonstrated sufficient accuracy and reliability in clinical use [39]. More recently, radiomics and machine learning-based models have achieved high precision and good AUC values [39]. Wan et al. developed a radiomics model built on combined T1w/CE-T1w/T2w images, with 11 imaging features exhibiting statistically significant differentiation between soft and hard PAs, providing an excellent performance with an AUC of 0.90 [40].
In functional PAs, in addition to volumetric tumor control, hormone remission is a mandatory treatment objective. As more genes must be silenced to achieve hormone remission, substantially higher PA margin doses are required. Because of higher margin doses, we observed a more favorable volumetric response in functional PAs (mean volume change/month: −2.21%) compared with nonfunctioning PAs (mean volume change/month: −1.22%). A preliminary analysis revealed enhanced associations within subdivided PA cohorts, achieving an AUC close to 1. Due to the small sample size of functional PA treatments, we were unable to separately perform a reliable statistical analysis in the subset of functional PA tumors. However, our findings warrant further investigation.
The high dimensionality of radiomics analysis is both its strength and a widely criticized weakness. To address this issue, we pre-selected features by excluding those that were not significantly associated with the outcome, resulting in an extensive reduction in dimensionality. Additionally, our study showed that adding CP features to multi-sequence imaging data (CP + T1w + CE-T1w) did not improve predictive performance over single-sequence radiomics models.
5. Limitations
Although the patient group was highly homogeneous and vastly exceeded the sample size requirement, its size still posed a limitation. To enhance the clinical validity of the reported feature association with PAs’ volume response to GKRS, further studies in larger patient groups are warranted. The relatively short follow-up time and the retrospective design of the prognostic models were additional limitations. Moreover, predictive studies necessitate confirming the generalizability of the acquired findings through external validation, while internal prognostic validation in unseen test folds that were not part of the development cohort has already been carried out within this study. Therefore, further validation in external cohorts is needed to establish the prognostic clinical validity of the obtained predictive models. Additionally, despite the objective nature of the computational analysis technique, the workflow included residual subjectivity during the tumors’ VOI segmentation.
6. Conclusions
We demonstrated that a radiomics-based model using conventional MR imaging achieved excellent predictive classification and generalization performance, surpassing a model based on clinicopathological parameters. By achieving reliable predictions of the volumetric outcomes of SRS, radiomics might enable individualized treatment strategies, ultimately contributing to improved treatment outcomes for patients undergoing SRS for pituitary adenomas.
References
- 1. Trifiletti D.M., Dutta S.W., Lee C.C., Sheehan J.P. Pituitary Tumor Radiosurgery. Prog. Neurol. Surg. 2019;34:149–158. doi:10.1159/000493059. PMID: 31096230.
- 2. Kotecha R., Sahgal A., Rubens M., De Salles A., Fariselli L., Pollock B.E., Levivier M., Ma L., Paddick I., Regis J. Stereotactic radiosurgery for non-functioning pituitary adenomas: Meta-analysis and International Stereotactic Radiosurgery Society practice opinion. Neuro Oncol. 2020;22:318–332. doi:10.1093/neuonc/noz225. PMID: 31790121. PMCID: PMC7058447.
- 3. Lehrer E.J., Kowalchuk R.O., Trifiletti D.M., Sheehan J.P. The Role of Stereotactic Radiosurgery for Functioning and Nonfunctioning Pituitary Adenomas. Neurol. India. 2023;71(Suppl. S1):S133–S139. doi:10.4103/0028-3886.373631. PMID: 37026344.
- 4. Dayawansa S., Abbas S.O., Mantziaris G., Dumot C., Donahue J.H., Sheehan J.P. Volumetric Assessment of Nonfunctional Pituitary Adenoma Treated With Stereotactic Radiosurgery: An Assessment of Long-Term Response. Neurosurgery. 2023;93:1339–1345. doi:10.1227/neu.0000000000002594. PMID: 37437306.
- 5. Pomeraniec I.J., Xu Z., Lee C.C., Yang H.C., Chytka T., Liscak R., Martinez-Alvarez R., Martinez-Moreno N., Attuati L., Picozzi P. Dose to neuroanatomical structures surrounding pituitary adenomas and the effect of stereotactic radiosurgery on neuroendocrine function: An international multicenter study. J. Neurosurg. 2021;136:813–821. doi:10.3171/2021.3.JNS203812. PMID: 34560630.
- 6. Mantziaris G., Pikis S., Chytka T., Liščák R., Sheehan K., Sheehan D., Peker S., Samanci Y., Bindal S.K., Niranjan A. Adjuvant versus on-progression Gamma Knife radiosurgery for residual nonfunctioning pituitary adenomas: A matched-cohort analysis. J. Neurosurg. 2022;138:1662–1668. doi:10.3171/2022.10.JNS221873. PMID: 36401547.
- 7. Speckter H., Radulovic M., Trivodaliev K., Vranes V., Joaquin J., Hernandez W., Mota A., Bido J., Hernandez G., Rivera D. MRI radiomics in the prediction of the volumetric response in meningiomas after gamma knife radiosurgery. J. Neurooncol. 2022;159:281–291. doi:10.1007/s11060-022-04063-y. PMID: 35715668.
- 8. Djuričić G.J., Ahammer H., Rajković S., Kovač J.D., Milošević Z., Sopta J.P., Radulovic M. Directionally Sensitive Fractal Radiomics Compatible with Irregularly Shaped Magnetic Resonance Tumor Regions of Interest: Association with Osteosarcoma Chemoresistance. J. Magn. Reson. Imaging. 2023;57:248–258. doi:10.1002/jmri.28232. PMID: 35561019.
