Paraspinal Muscle Fat Infiltration as a Key Predictor of Symptomatic Intravertebral Vacuum Cleft: A Machine Learning Approach
Joonghyun Ahn, Jaewan Soh, Young-Hoon Kim, Jae Chul Lee, Jun-Seok Lee, Hyung-Youl Park, Jeong-Han Lee, June Lee, Youjin Shin

TL;DR
This study uses machine learning to show that fat infiltration in paraspinal muscles can predict a spinal condition called symptomatic intravertebral vacuum cleft.
Contribution
The study introduces muscle-related variables, particularly paraspinal fat infiltration, as key predictors in machine learning models for predicting SIVC.
Findings
Random Forest achieved 96.6% accuracy in predicting SIVC when muscle-related variables were included.
Multifidus and erector spinae fatty infiltration were top predictors of SIVC.
Adding muscle-related variables significantly improved model performance across all ML algorithms.
Abstract
Background/Objectives: Symptomatic intravertebral vacuum cleft (SIVC) is a complication of vertebral compression fractures (VCFs) that leads to persistent pain and deformity. Its prediction remains challenging due to multifactorial causes. Paraspinal muscle fat infiltration has been associated with spinal fracture outcomes but has not been extensively explored in SIVC prediction. Our aim was to develop machine learning (ML) models for predicting SIVC and to evaluate the role of muscle-related variables in improving predictive performance. Methods: Demographic, radiological, and muscle-related variables were collected. ML models—including Logistic Regression, Random Forest, XGBoost, and Multi-Layer Perceptron—were trained and tested under two input conditions: baseline variables (SETTING_1) and baseline plus muscle-related variables (SETTING_2). Model performance was evaluated using…
Funding: Institute of Clinical Medicine Research of Bucheon St. Mary’s Hospital, Research Fund, 2024
1. Introduction
The vertebral compression fracture (VCF) is a common spinal condition that has been increasing in prevalence in recent decades. If unresolved through conservative treatment due to various factors, it can lead to persistent back pain, kyphotic deformity, and neurological deterioration [1,2,3,4,5]. A hallmark of this condition is the intravertebral vacuum cleft (IVC), a distinctive radiological feature associated with the non-union of the VCF, which in some cases requires surgical intervention [6,7]. “Symptomatic” IVC (SIVC) represents a distinct clinical entity in which patients with VCFs develop recurrent pain after initial improvement, accompanied by characteristic radiographic findings of IVC.
Unlike typical VCF cases that show spontaneous improvement, SIVC patients experience persistent or recurring pain (VAS > 3) that correlates with specific radiological features. The early detection and accurate prediction of SIVC are clinically significant, yet these remain challenging due to the multifactorial nature of its pathogenesis [8,9]. SIVC has been reported to occur slightly more frequently in females, with other risk factors including renal disease, diabetes, chronic steroid use, osteoporosis, alcoholism, hypothyroidism, and radiation therapy [6,8,10,11,12,13,14]. Recent studies have also reported that fat infiltration of the paraspinal muscles may increase the risk of VCFs or influence clinical and radiological outcomes following spinal fractures [15,16,17]. Specifically, fatty infiltration in the multifidus (MF) and erector spinae (ES) muscles has been shown to reduce spinal support, increasing the risk of osteoporotic vertebral compression fractures and potentially leading to higher rates of non-union following fusion surgery [17]. The quantitative assessment of muscle status could therefore serve as an important biomarker for the early diagnosis and treatment planning of SIVC, with direct clinical implications for patient management and outcome improvement.
In recent decades, the rapid development of machine learning (ML) algorithms has significantly impacted orthopedic surgery. ML enables the high-accuracy analysis and prediction of complex patterns in medical imaging, electronic medical records (EMRs), and other clinical datasets [18]. Previous studies have demonstrated the utility of ML in predicting osteoporotic VCFs using magnetic resonance imaging (MRI) and demographic data alongside bone mineral density scores [19,20]. However, to our knowledge, no previous studies have employed ML approaches incorporating muscle-related variables to predict VCFs and SIVC.
Therefore, the primary objective of this study was to determine whether muscle-related variables were significant risk factors in the prediction of SIVC using ML by comparing demographics and variables measured from plain radiography with and without muscle-related variables. The secondary objective was to investigate which ML algorithms could best predict SIVC and to what extent muscle-related variables had significant effects.
2. Materials and Methods
2.1. Study Design and Patient Population
This study was a retrospective analysis of patients diagnosed with VCFs who were enrolled between March 2013 and February 2023 (Figure 1). This study was approved by the Institutional Review Board (IRB) of the institution. The inclusion criteria were as follows: patients diagnosed with a VCF at the thoracolumbar junction (TLJ; T10–L2) and patients diagnosed with SIVC at the TLJ. The exclusion criteria were as follows: patients with a pathological VCF (such as one caused by a tumor or infection), patients with VCFs at multiple locations, patients without MRI, and patients without adequate plain thoracolumbar anteroposterior and lateral radiographs.
2.2. Definition and Diagnostic Criteria of SIVC
SIVC was defined by the following criteria:
Clinical Criteria:
- Initial acute pain following a VCF.
- Period of initial improvement with conservative treatment.
- Development of recurrent pain (VAS > 3).
- Location-specific pain corresponding to the level of vacuum cleft.
Radiographic Criteria:
- Radiographic evidence of intravertebral vacuum cleft (IVC) on plain thoracolumbar radiographs.
- Absence of other significant pathology at the affected and adjacent levels, such as new fractures.
To address potential diagnostic subjectivity, two independent spine specialists (YHK and JA) confirmed the diagnosis of SIVC.
2.3. Data Collection
A clinical data warehouse (CDW) containing all the medical records from the institution was used to select and study patients. Demographic and radiological data were collected from all enrolled patients. The demographic data included sex, age, diabetes, hypertension, adrenal insufficiency, hyperthyroidism, hypothyroidism, and steroid use.
The radiological variables were the “angle” (local kyphotic angle) and the compression ratio of the VCF, both measured from the lateral view of the plain radiograph (PR) (Figure 2).
From the axial view of MRI, using ImageJ software (version 1.8.0, National Institutes of Health, Bethesda, MD (Maryland), USA), the cross-sectional areas (CSAs) of the VCF upper endplate, multifidus (MF), and erector spinae (ES) muscles were measured. Additionally, the percentage of fatty infiltration in MF (MFfi) and ES (ESfi) was quantified. For measuring fat infiltration, T2-weighted axial MRI images at the level of the vertebral fracture upper endplate were analyzed.
Regions of interest (ROIs) were manually drawn to outline the boundaries of the multifidus (MF) and erector spinae (ES) muscles bilaterally. The software’s threshold function was then applied to differentiate muscle tissue from fat based on signal intensity. The percentage of fatty infiltration was calculated as follows: (area of fat/total muscle area) × 100 (Figure 3). The relative multifidus (rMF) was defined as the MF/CSA of the endplate, and the relative erector spinae (rES) was defined as the ES/CSA of the endplate. All measurements were independently conducted by two trained observers (JA and YHK) who were blinded to the patient’s clinical information.
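The fat infiltration percentage and the relative muscle areas defined above are simple ratios. The following minimal sketch writes them out, assuming the fat area and total muscle area have already been exported from the thresholded ROI. The function names and the example measurements are hypothetical and are not taken from the study.

```python
def fat_infiltration_pct(fat_area: float, total_muscle_area: float) -> float:
    """Percentage of fatty infiltration: (area of fat / total muscle area) x 100."""
    if total_muscle_area <= 0:
        raise ValueError("total_muscle_area must be positive")
    return fat_area / total_muscle_area * 100.0


def relative_csa(muscle_csa: float, endplate_csa: float) -> float:
    """Relative muscle CSA (rMF or rES): muscle CSA divided by endplate CSA."""
    if endplate_csa <= 0:
        raise ValueError("endplate_csa must be positive")
    return muscle_csa / endplate_csa


# Hypothetical measurements in mm^2 (illustrative values, not study data).
mf_fat_area = 150.0
mf_total_area = 600.0
endplate_area = 1200.0

mffi = fat_infiltration_pct(mf_fat_area, mf_total_area)  # 25.0
rmf = relative_csa(mf_total_area, endplate_area)         # 0.5
print(f"MFfi = {mffi:.1f}%, rMF = {rmf:.2f}")
```

The same two functions would apply to the erector spinae measurements to obtain ESfi and rES.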
To evaluate inter-observer reliability, intraclass correlation coefficients (ICCs) were calculated for both MFfi and ESfi measurements on a random subset of 50 patients. The ICC for MFfi was 0.92 (95% CI: 0.89–0.95) and for ESfi was 0.89 (95% CI: 0.85–0.93), indicating excellent inter-observer reliability. For cases with measurement discrepancies exceeding 10%, a consensus was reached through joint reassessment. The final values used in the analysis were the average of the two observers’ measurements.
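The reliability check and the averaging rule described above can be illustrated with a short sketch. The text does not state which ICC form was used, so the two-way random-effects, absolute-agreement, single-rater form, ICC(2,1), is assumed here. The 10% discrepancy threshold is likewise assumed to mean an absolute difference in percentage points. All names and example values are hypothetical.

```python
def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    `ratings` is a list of rows, one per patient, each holding k observer values.
    """
    n = len(ratings)
    k = len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Sums of squares for patients (rows), observers (columns), and residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_err = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


def final_value(obs_a, obs_b, threshold=10.0):
    """Return the mean of two readings and a flag for joint reassessment.

    Assumption: `threshold` is an absolute difference in percentage points.
    """
    return (obs_a + obs_b) / 2.0, abs(obs_a - obs_b) > threshold


# Hypothetical MFfi readings (%) from two observers for five patients.
readings = [[30.0, 32.0], [41.0, 40.0], [22.0, 25.0], [55.0, 52.0], [35.0, 48.0]]
print(f"ICC(2,1) = {icc_2_1(readings):.3f}")
for a, b in readings:
    value, needs_consensus = final_value(a, b)
    print(f"{a} vs {b}: mean {value}, reassess jointly: {needs_consensus}")
```

In this toy data only the last pair differs by more than 10 points, so only that pair would be flagged for joint reassessment.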
To address the class imbalance in our dataset (654 VCF patients without IVC vs. 40 patients with SIVC), we applied the Synthetic Minority Over-Sampling Technique (SMOTE) to the training set. Specifically, the original training set contained 28 SIVC cases (approximately 5.8%), which was increased to 457 cases after the application of SMOTE to achieve balance with the 458 VCF patients. This allowed the models to better capture patterns associated with the minority class while maintaining generalizability.
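The core idea of SMOTE is to create each synthetic minority sample by interpolating between a real minority sample and one of its nearest minority-class neighbors. The sketch below is a simplified standard-library illustration of that idea. It is not the package used in the study, and the function name, the parameters, and the example feature vectors are hypothetical.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Basic SMOTE: interpolate between minority samples and their
    k nearest minority-class neighbors to create synthetic samples."""
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    if k < 1:
        raise ValueError("need at least two minority samples")
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # Rank the other minority samples by Euclidean distance to `base`.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: math.dist(base, p))
        neighbor = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbor
        synthetic.append([b + gap * (nb - b) for b, nb in zip(base, neighbor)])
    return synthetic


# Hypothetical minority-class feature vectors (MFfi %, ESfi %); not study data.
sivc_cases = [[35.0, 27.0], [40.0, 30.0], [38.0, 26.0], [45.0, 33.0]]
new_cases = smote(sivc_cases, n_synthetic=6, k=3)
print(len(sivc_cases) + len(new_cases), "minority samples after oversampling")
```

Going from the 28 SIVC cases to the 457 stated above would correspond to 429 synthetic samples. Because every synthetic point lies on a segment between two real cases, oversampling adds no new information and would normally be restricted to the training data.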
2.4. Machine Learning Models
To confirm the known risk factors for VCFs and SIVC, we conducted experiments using variables extracted from PR and EMR for SETTING_1 and variables extracted from PR, EMR, and MRI for SETTING_2. The patients were randomly divided into training (70%) and test (30%) datasets. The training set was used to develop the models. Supplementary 5-fold cross-validation was used to optimize the hyperparameters of each machine learning model in the training set, and the training data were augmented using the SMOTE package so that the models could learn effectively from the small number of SIVC cases. The test set was used to evaluate the performance of the learned models: Logistic Regression (LR) [21], Random Forest (RF) [22], extreme gradient boosting (XGBoost) [23], and Multi-Layer Perceptron (MLP) [24]. Training and testing processes were conducted in Python 3.9.12 using the PyCharm environment.
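The 70/30 split and the 5-fold cross-validation described above can be sketched with index bookkeeping alone. The text says only that patients were divided randomly, so the per-class (stratified) split used here is an assumption. The function names and the toy label vector are hypothetical.

```python
import random


def stratified_split(labels, test_fraction=0.3, seed=0):
    """Return (train_idx, test_idx), splitting each class separately
    so both parts keep roughly the same class ratio."""
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)


def k_fold(indices, k=5, seed=0):
    """Yield (fit_idx, val_idx) pairs for k-fold cross-validation."""
    idx = list(indices)
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        val = folds[i]
        fit = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield fit, val


# Toy labels: 10 positive (SIVC) and 90 negative cases; not the study cohort.
labels = [1] * 10 + [0] * 90
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), "training /", len(test_idx), "test")

for fold, (fit_idx, val_idx) in enumerate(k_fold(train_idx), start=1):
    # Hyperparameters would be tuned on fit_idx and scored on val_idx.
    print(f"fold {fold}: fit on {len(fit_idx)}, validate on {len(val_idx)}")
```

One design choice worth noting: when oversampling is combined with cross-validation, applying it only to the fitting part of each fold keeps synthetic points out of the validation part.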
2.5. Evaluation Metrics
For objective performance evaluation, the accuracy, specificity, sensitivity, precision, F1-score, and area under the receiver operating characteristic curve (AUROC) of each model were measured on the test set over ten runs and averaged.
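The metrics listed above all follow from the confusion matrix, except AUROC, which depends on the ranking of predicted scores. The minimal sketch below uses standard definitions and computes AUROC as the probability that a randomly chosen positive case scores higher than a randomly chosen negative case. The function names, labels, and scores are hypothetical.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, sensitivity, specificity, precision, and F1 for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    denom = precision + sensitivity
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": 2 * precision * sensitivity / denom if denom else 0.0,
    }


def auroc(y_true, scores):
    """AUROC via pairwise comparison of positive and negative scores (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = SIVC) and predicted probabilities; not study data.
y_true = [1, 1, 0, 0, 0, 0, 1, 0]
scores = [0.9, 0.4, 0.35, 0.1, 0.6, 0.2, 0.8, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

results = classification_metrics(y_true, y_pred)
results["auroc"] = auroc(y_true, scores)
for name, value in results.items():
    print(f"{name}: {value:.3f}")
```

When positive cases are rare, accuracy alone can look high even if sensitivity is low, which is one reason to report several metrics together.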
3. Results
3.1. Patient Characteristics
Of the 1694 patients, data from 694 patients who met the inclusion criteria and did not violate the exclusion criteria were included in this study; of these, 485 were used for training the ML models and 209 were used for testing. The analyses were summarized between SIVC and non-SIVC groups across multiple variables in Table 1. Notably, muscle-related variables (MFfi, ESfi, rMF, rES) and radiological measurements (angle, compression value, disc CSA) showed significant differences with p < 0.05.
3.2. Performances of Machine Learning Models
The LR, RF, XGBoost, and MLP models were evaluated on both the training and the test sets under SETTING_1 and SETTING_2.
In the training set of SETTING_1, the AUROC values were 0.911, 0.913, 0.853, and 0.863, respectively, with corresponding accuracies of 0.923, 0.940, 0.935, and 0.891 (Table 2). In the test set of SETTING_1, the AUROC values dropped to 0.699, 0.698, 0.643, and 0.708, and the accuracies were 0.757, 0.871, 0.856, and 0.760, respectively (Figure 4, Table 3).
In SETTING_2, all models demonstrated superior performance compared to SETTING_1. In the training set, the AUROCs were 0.963, 0.973, 0.967, and 0.923, and the accuracies were 0.973, 0.990, 0.993, and 0.972, respectively (Table 2). In the test set, the AUROCs were 0.947, 0.956, 0.951, and 0.904, and the accuracies were 0.951, 0.966, 0.962, and 0.961, respectively (Figure 5, Table 3).
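The difference between the two settings is easiest to see as the gap between training and test AUROC. The short sketch below performs that subtraction using only the AUROC values reported in the text above. The variable names are hypothetical.

```python
models = ["LR", "RF", "XGBoost", "MLP"]

# AUROC values as reported in the text for the training and test sets.
train_auroc = {
    "SETTING_1": [0.911, 0.913, 0.853, 0.863],
    "SETTING_2": [0.963, 0.973, 0.967, 0.923],
}
test_auroc = {
    "SETTING_1": [0.699, 0.698, 0.643, 0.708],
    "SETTING_2": [0.947, 0.956, 0.951, 0.904],
}

gaps = {}
for setting in train_auroc:
    gaps[setting] = {
        name: train_auroc[setting][i] - test_auroc[setting][i]
        for i, name in enumerate(models)
    }
    for name, gap in gaps[setting].items():
        print(f"{setting} {name}: train-test AUROC gap = {gap:.3f}")

# Model with the highest test AUROC in SETTING_2.
best_test_s2 = max(models, key=lambda m: test_auroc["SETTING_2"][models.index(m)])
print("Best test AUROC in SETTING_2:", best_test_s2)
```

With these inputs the gaps range from 0.155 to 0.215 in SETTING_1 and from 0.016 to 0.019 in SETTING_2.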
All ML models showed strong generalizability from training to test datasets in SETTING_2. Among them, the RF model achieved the highest AUROC on both the training set (0.973) and the test set (0.956), as well as the highest test set accuracy (0.966), while XGBoost achieved the highest training set accuracy (0.993). To evaluate model stability, we analyzed performance metrics across each fold of the 5-fold cross-validation process. Table 4 shows the AUC and accuracy of the Random Forest model in SETTING_2 (including muscle variables) for each fold.
3.3. Feature Importance
Figure 6 illustrates the feature importance ranking of the RF model in SETTING_1, where ‘Angle’ (local kyphotic angle), age, and ‘Solondo’ (steroid use) were identified as the top three predictors. In contrast, the RF model in SETTING_2, which outperformed all other models (Figure 7), highlighted MFfi, ESfi, and ‘Disc’ (endplate CSA) as the most influential predictors for SIVC, with feature importance values of 0.871, 0.702, and 0.575, respectively. Notably, feature importance values range from 0 to 1, indicating their relative contribution to the model’s predictive performance.
3.4. LIME Analysis
To enhance the interpretability of our model’s predictions, we applied the LIME algorithm to visualize the local decision boundaries of the Random Forest classifier for a representative SIVC case. Beyond identifying global feature importance, we conducted LIME analysis to gain insights into how specific combinations of features contribute to individual predictions and to examine the potential interrelationships between them in a localized context. As shown in Figure 8, the model relied most heavily on paraspinal muscle fat infiltration indicators, particularly MF% (>33.89%) and ES% (>25.96%), to predict SIVC. Additionally, other features such as low values of rMF and rES also positively contributed to the SIVC prediction. Conversely, the absence of hyperthyroidism and the presence of hypertension were among the features that marginally suppressed the prediction toward the SIVC class. These findings further support the global feature importance trends and suggest that muscle quality and vertebral biomechanical changes play a critical role in model prediction (Figure 8).
4. Discussion
In this study, we successfully predicted SIVC in patients with VCFs using ML models that incorporated muscle-related variables. The superior performance of SETTING_2 (incorporating muscle variables) over SETTING_1 (conventional variables only) suggests that muscle-related factors play a crucial role in SIVC development. The high feature importance of MFfi and ESfi (0.871 and 0.702, respectively) indicates that paravertebral muscle status may be more predictive of SIVC than traditional risk factors such as age and steroid use. This may also provide insights into the accurate diagnosis of SIVC in the future. To the best of our knowledge, this is the first study to explore SIVC using machine learning (ML) in conjunction with muscle-related variables. To date, SIVC has primarily been identified through radiological assessments. However, challenges in accurately distinguishing SIVC from VCFs can result in inadequate treatment. Given that SIVC requires distinct treatments, such as bone cement augmentation, and has unique clinical features and prognoses compared with VCFs, precise identification is essential [25].
The exploratory analysis in Table 1 revealed significant differences (p < 0.05) between the SIVC and non-SIVC groups in muscle-related variables (MFfi, ESfi, rMF, rES) and radiological measurements (angle, compression value, disc CSA). Based on this initial analysis, we formulated the following hypotheses: (1) The degree of fatty infiltration in the multifidus and erector spinae muscles (MFfi and ESfi) is likely a strong predictive factor for SIVC occurrence (Figure 9). (2) Changes in the vertebral endplate cross-sectional area (disc CSA) are associated with SIVC development. (3) While age appears to be a significant predictor, muscle-related variables may have more direct associations with SIVC.
In line with our hypothesis, both MFfi and ESfi were ranked among the top three predictors. Previous studies have reported that MF and ES play crucial roles in maintaining spinal balance [26,27]. This suggests that fat infiltration weakens the supportive capacity of the spine, potentially contributing to the development of SIVC. Consistent with this, previous studies have reported that fat infiltration in the ES and MF is associated with reduced spinal bone mineral density (BMD) [28,29], while fat infiltration in the MF has been linked to an increased risk of osteoporotic VCFs [15]. In addition, it is noteworthy that high fat infiltration is associated with a low union rate following lumbar interbody fusion procedures [30,31]. This suggests that fat infiltration in the posterior muscles significantly affects the stability of the spinal column. In this study, the role of fat infiltration in the paravertebral muscles in the occurrence of SIVC was confirmed using feature importance analysis.
The enlargement of the endplate cross-sectional area (CSA), potentially resulting from the formation of intravertebral clefts, reflects biomechanical changes within the vertebral structure. These changes may represent a compensatory response to redistribute the mechanical load or indicate progressive instability associated with advanced spinal degeneration [32]. This finding emphasizes the complex nature of endplate pathology and its important role in SIVC development. Furthermore, structural changes in the endplate CSA could serve as valuable biomarkers for identifying patients at risk of SIVC [33]. However, further research is required to establish whether this predictor has a causal role or simply correlates with the condition and to determine how it can be integrated into diagnostic or therapeutic approaches.
The use of steroids has been identified as a contributing risk factor for the development of SIVC [6]. In this study, SETTING_2 demonstrated lower feature importance values for steroid use compared to other factors, while SETTING_1 highlighted its significance as one of the top three predictors. However, high-quality studies specifically investigating the relationship between steroid use and SIVC are limited. Notably, the long-term improper use of steroids has been reported to increase the risk of new vertebral fractures, suggesting that inappropriate steroid use compromises vertebral stability [34,35].
Diabetes has been reported as a risk factor for SIVC and has been reported to be strongly associated with the frailty index, which implies a significant impact on overall health and resilience [12]. Several mechanisms have been demonstrated to explain this relationship. Diabetes often contributes to muscle loss (or sarcopenia) and weakness, both essential components of frailty [36]. Nutritional challenges associated with diabetes, including altered dietary patterns and nutrient absorption, can further intensify frailty [37]. Additionally, diabetes is commonly accompanied by comorbidities such as cardiovascular disease, obesity, and kidney disease, all of which significantly affect the frailty burden [38]. While some studies suggest that diabetes does not always have a direct association with SIVC, its long-term complications and systemic effects contribute to its potential role in diminishing physical and structural resilience [39,40]. This, in turn, could indirectly impact outcomes such as vertebral stability and the risk of SIVC.
Age has previously been reported as a significant risk factor for VCFs [41,42]. In this study, age ranked among the top predictors in SETTING_1 but was less important than other variables in SETTING_2. This suggests that muscle-related variables, which represent an individual’s health status, are more important than chronological age in the fracture-healing process. Age should not be overlooked in the occurrence of SIVC, however, because age, osteoporosis, and sarcopenia have been reported to be correlated [43]. Rather, its role should be interpreted in light of how risk factors were defined in previous studies [36].
This study provides data-driven insights and valuable medical perspectives. As computer and data science advance, their methods are increasingly being applied in the medical field. Such approaches can support the accurate identification and diagnosis of SIVC and can lead to the discovery of risk factors through data analysis.
Nevertheless, several limitations should be considered when interpreting our results. First, the retrospective nature of this study meant that standardized pain scores were not available throughout the follow-up period, potentially affecting our ability to fully characterize pain progression patterns. Second, while we carefully defined SIVC criteria, the distinction between SIVC-related pain and other sources of chronic pain relies partly on clinical judgment. Third, potential contributing factors such as the frailty index [44], BMI, and BMD were excluded due to missing data. Fourth, our focus on single-level fractures, while improving study homogeneity, limits generalizability to multilevel cases. Finally, the relatively small number of SIVC cases (n = 40) suggests the need for larger validation studies. Given these limitations, future studies should investigate the potential clinical application of muscle-related predictors in the early diagnosis and treatment of SIVC, further validating their role in improving patient outcomes.
5. Conclusions
This study identified paravertebral muscle variables as novel risk factors for SIVC and highlighted effective prediction methods. From a clinical perspective, the fatty infiltration of multifidus and erector spinae muscles represents a key biomarker for predicting SIVC occurrence, suggesting that the careful evaluation of muscle status should be integrated into routine MRI assessment. Machine learning models, particularly Random Forest, can effectively integrate complex clinical and radiological data to predict SIVC with high accuracy, offering potential as clinical decision support tools.
References
- 1. Watanabe K., Lenke L.G., Bridwell K.H., Kim Y.J., Koester L., Hensley M. Proximal junctional vertebral fracture in adults after spinal deformity surgery using pedicle screw constructs: Analysis of morphological features. Spine 2010, 35, 138–145. doi:10.1097/BRS.0b013e3181c8f35d. PMID: 20081508.
- 2. Park W.M., Choi D.K., Kim K., Kim Y.J., Kim Y.H. Biomechanical effects of fusion levels on the risk of proximal junctional failure and kyphosis in lumbar spinal fusion surgery. Clin. Biomech. 2015, 30, 1162–1169. doi:10.1016/j.clinbiomech.2015.08.009. PMID: 26320851.
- 3. Silverman S.L. The clinical consequences of vertebral compression fracture. Bone 1992, 13 (Suppl. 2), S27–S31. doi:10.1016/8756-3282(92)90193-z. PMID: 1627411.
- 4. Heggeness M.H. Spine fracture with neurological deficit in osteoporosis. Osteoporos. Int. 1993, 3, 215–221. doi:10.1007/BF01623679. PMID: 8338978.
- 5. Baba H., Maezawa Y., Kamitani K., Furusawa N., Imura S., Tomita K. Osteoporotic vertebral collapse with late neurological complications. Paraplegia 1995, 33, 281–289. doi:10.1038/sc.1995.64. PMID: 7630656.
- 6. Sarli M., Perez Manghi F.C., Gallo R., Zanchetta J.R. The vacuum cleft sign: An uncommon radiological sign. Osteoporos. Int. 2005, 16, 1210–1214. doi:10.1007/s00198-005-1833-4. PMID: 15731885.
- 7. Jang J.S., Kim D.Y., Lee S.H. Efficacy of percutaneous vertebroplasty in the treatment of intravertebral pseudarthrosis associated with noninfected avascular necrosis of the vertebral body. Spine 2003, 28, 1588–1592. PMID: 12865850.
- 8. Osterhouse M.D., Kettner N.W. Delayed posttraumatic vertebral collapse with intravertebral vacuum cleft. J. Manip. Physiol. Ther. 2002, 25, 270–275. doi:10.1067/mmt.2002.123164. PMID: 12021746.
