# Multi-Task Deep Learning Model for Automated Detection and Severity Grading of Lumbar Spinal Stenosis on MRI: Multi-Center External Validation

**Authors:** Phatcharapon Udomluck, Watcharaporn Cholamjiak, Jakkaphong Inpun, Waragunt Waratamrongpatai

PMC · DOI: 10.3390/diseases14010032 · Diseases · 2026-01-14

## TL;DR

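This paper evaluates pretrained deep feature extractors (VGG19, ConvNeXt-Tiny, DINOv2) paired with classical classifiers for automated grading of lumbar spinal stenosis severity on axial MRI. On external validation, VGG19 features with Logistic Regression performed best (accuracy 0.9556, F1-score 0.9558).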

## Contribution

The study shows that features from a pretrained VGG19, fed to classical machine learning classifiers, give robust LSS grading that generalizes to external MRI data from the University of Phayao Hospital.

## Key findings

- VGG19 features with Logistic Regression achieved the highest external-validation accuracy (0.9556) and F1-score (0.9558).
- DINOv2 features generalized less well, especially with LightGBM (accuracy 0.6222).
- Most classification errors occurred between adjacent severity grades.
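
The adjacent-grade finding can be checked directly from a confusion matrix. The following is a minimal sketch, not the authors' code: the function name `adjacent_error_share`, the four-grade layout, and every count in the example matrix are hypothetical and chosen only for illustration.

```python
def adjacent_error_share(confusion):
    """Return the share of misclassifications that land exactly one grade away.

    confusion[i][j] counts cases with true grade i predicted as grade j,
    with grades ordered by increasing severity.
    """
    errors = 0
    adjacent = 0
    for i, row in enumerate(confusion):
        for j, count in enumerate(row):
            if i == j:
                continue  # diagonal cells are correct predictions
            errors += count
            if abs(i - j) == 1:
                adjacent += count
    return adjacent / errors if errors else 0.0


# Hypothetical 4-grade confusion matrix (illustrative numbers, not from the paper).
example = [
    [20, 2, 1, 0],
    [1, 21, 1, 0],
    [0, 2, 19, 1],
    [0, 0, 1, 22],
]
share = adjacent_error_share(example)
print(f"Adjacent-grade share of errors: {share:.3f}")
```

A share close to 1.0 means the classifier rarely confuses grades that are far apart, which is the pattern the paper reports.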

## Abstract

Background/Objectives: Accurate and reproducible grading of lumbar spinal stenosis (LSS) is clinically critical for guiding treatment decisions and patient management, yet manual assessment remains challenging due to imaging variability and inter-observer subjectivity. To address these limitations, this study aimed to evaluate the generalizability of deep learning-based feature extraction methods (VGG19, ConvNeXt-Tiny, and DINOv2) combined with classical machine learning classifiers for automated multi-grade LSS assessment. Automated grading enables objective, reproducible, and scalable assessment of lumbar spinal stenosis severity, addressing key limitations of manual interpretation.

Methods: Axial MRI images were processed using pretrained VGG19, ConvNeXt-Tiny, and DINOv2 models to extract deep features. Logistic Regression, Support Vector Machine (SVM), and LightGBM were trained on internal datasets and externally validated using MRI data from the University of Phayao Hospital. Performance was assessed using accuracy, precision, recall, F1-score, confusion matrices, and multi-class ROC curves.

Results: VGG19-based features yielded the strongest external performance, with Logistic Regression achieving the highest accuracy (0.9556) and F1-score (0.9558). External validation further demonstrated excellent discrimination, with AUC values ranging from 0.994 to 1.000 across all severity grades. SVM (0.9333 accuracy) and LightGBM (0.9222 accuracy) also performed well. ConvNeXt-Tiny showed stable cross-model performance, while DINOv2 features exhibited reduced generalizability, especially with LightGBM (accuracy 0.6222). Most classification errors occurred between adjacent grades.

Conclusions: Deep convolutional features, particularly VGG19, combined with classical machine learning classifiers provide robust and generalizable LSS grading across external MRI data. Despite advances in modern architectures, CNN-based feature extraction remains highly effective for spinal imaging and represents a practical pathway for clinical decision support.
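
The evaluation metrics named in the abstract can be illustrated with a short, dependency-free sketch. It rests on assumptions the summary does not confirm: macro averaging for precision, recall, and F1, and a one-vs-rest reading of the per-grade AUC. The helper names `macro_metrics` and `one_vs_rest_auc` and all labels and scores are hypothetical.

```python
def macro_metrics(y_true, y_pred, labels):
    """Compute accuracy plus macro-averaged precision, recall, and F1."""
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precisions, recalls, f1s = [], [], []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(labels)
    return {
        "accuracy": accuracy,
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


def one_vs_rest_auc(y_true, scores, positive):
    """AUC for one grade against all others, via pairwise score comparison.

    Equals the probability that a random positive case scores higher than
    a random negative case, counting ties as one half.
    """
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical labels and scores for three grades (not data from the paper).
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 2]
grade2_scores = [0.1, 0.2, 0.3, 0.2, 0.9, 0.8]  # predicted probability of grade 2

report = macro_metrics(y_true, y_pred, labels=[0, 1, 2])
auc_grade2 = one_vs_rest_auc(y_true, grade2_scores, positive=2)
print(report, auc_grade2)
```

The pairwise AUC is quadratic in sample size, which is acceptable for small validation sets; a library routine would be the usual choice for larger ones.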

## Linked entities

- **Diseases:** lumbar spinal stenosis (MONDO:0005965)

## Full-text entities

- **Diseases:** LSS (MESH:C563613)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12839941/full.md

## Figures

17 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12839941/full.md

## References

50 references — full list in the complete paper: https://tomesphere.com/paper/PMC12839941/full.md

---
Source: https://tomesphere.com/paper/PMC12839941