# A Trustable Spine Abnormalities Classification System Using ResNet50 and VGG16 Supported by Explainable Artificial Intelligence

**Authors:** Muhammad Shahrul Zaim Ahmad, Nor Azlina Ab. Aziz, Heng Siong Lim, Anith Khairunnisa Ghazali, Mubashir Ahmad, Farshid Amirabdollahian, Afif Abdul Latiff, Kamarulzaman Ab. Aziz

PMC · DOI: 10.3390/biomimetics11030206 · Biomimetics · 2026-03-12

## TL;DR

This paper presents a deep learning system for classifying spine abnormalities using ResNet50 and VGG16, with explainable AI to improve trust in medical diagnostics.

## Contribution

The study introduces a trustable spine abnormality classification system that uses the explainable AI method Grad-CAM to check whether model decisions align with clinical diagnostic practice.

## Key findings

- Fine-tuned ResNet50 and VGG16 achieved high classification accuracies of 98.22% and 99.12%, respectively.
- ResNet50's Grad-CAM heatmaps showed better alignment with clinically relevant regions compared to VGG16.
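
To make concrete what a Grad-CAM heatmap is, here is a minimal sketch of the general Grad-CAM computation, not the authors' code. It assumes the last convolutional layer's feature maps and the gradients of the class score with respect to them have already been extracted from the network. The function name `grad_cam` and the plain nested-list inputs are illustrative assumptions.

```python
def grad_cam(activations, gradients):
    """Compute a normalized Grad-CAM heatmap (illustrative sketch).

    activations: list of K feature maps, each an H x W nested list.
    gradients:   same shape; d(class score) / d(activation).
    Both inputs are assumed to come from a framework hook; that
    extraction step is outside this sketch.
    """
    # Channel weight = global average of that channel's gradients.
    weights = []
    for grad_map in gradients:
        total = sum(sum(row) for row in grad_map)
        count = len(grad_map) * len(grad_map[0])
        weights.append(total / count)

    height, width = len(activations[0]), len(activations[0][0])
    cam = [[0.0] * width for _ in range(height)]

    # Weighted sum of the feature maps across channels.
    for act_map, weight in zip(activations, weights):
        for i in range(height):
            for j in range(width):
                cam[i][j] += weight * act_map[i][j]

    # ReLU keeps only regions that push the class score up.
    cam = [[max(value, 0.0) for value in row] for row in cam]

    # Scale to [0, 1] so heatmaps can be compared across images.
    peak = max(max(row) for row in cam)
    if peak > 0:
        cam = [[value / peak for value in row] for row in cam]
    return cam
```

In practice the resulting map is upsampled to the X-ray's resolution and overlaid on the image, so a reader can judge whether the highlighted regions match those a clinician would inspect.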

## Abstract

Deep learning has been applied in various fields and has proven to give good results for classification tasks. However, understanding of a deep learning model’s decisions is limited, so deep learning is commonly described as a black box. Applying deep learning to critical applications such as medical diagnosis therefore introduces trust issues. For a deep learning model to be trusted by medical practitioners, the way it reaches a decision must be seen to align with the diagnostic process those practitioners use. Explainable methods such as Grad-CAM can improve the explainability of deep learning models by providing a visual interpretation of the classification decision process. In this study, two deep learning models, VGG16 and ResNet50, are each trained using three methods to classify spinal abnormalities from X-ray images: training from randomly initialized weights, and two transfer learning methods, feature extraction and fine-tuning. The classification metrics are compared, and further analyses using Grad-CAM heatmaps are included. The models are also evaluated using stratified five-fold cross-validation. The results reveal some disparity between a model’s accuracy and its clinical relevance. The randomly initialized VGG16 obtained a classification accuracy of 93.79% but does not focus on clinically relevant regions. In contrast, the fine-tuned ResNet50 and VGG16 not only obtain high accuracies of 98.22% and 99.12%, respectively, but also produce heatmaps that focus on more relevant regions. A comparison of the two models shows that the heatmaps produced by the fine-tuned ResNet50 agree more closely with the clinical view than those of the fine-tuned VGG16. This study provides a useful reference for interpreting deep learning-based classification results using explainable methods, particularly Grad-CAM in spine abnormality analysis.
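
The stratified five-fold cross-validation mentioned in the abstract can be sketched as follows. This is a minimal standard-library illustration of the general technique, not the authors' pipeline. The function name `stratified_k_fold`, the seed, and the round-robin assignment are assumptions; a library routine such as scikit-learn's `StratifiedKFold` would normally be used instead.

```python
import random
from collections import defaultdict


def stratified_k_fold(labels, k=5, seed=0):
    """Return k (train_indices, test_indices) splits (illustrative sketch).

    Each fold's test set keeps roughly the same class proportions
    as the full label list.
    """
    rng = random.Random(seed)

    # Group sample indices by class label.
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    # Deal each class's shuffled indices round-robin into the folds,
    # so every fold receives a near-equal share of every class.
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for position, index in enumerate(indices):
            folds[position % k].append(index)

    # Each fold serves as the test set once; the rest form the training set.
    splits = []
    for i in range(k):
        test = sorted(folds[i])
        train = sorted(
            index
            for j, fold in enumerate(folds)
            if j != i
            for index in fold
        )
        splits.append((train, test))
    return splits
```

Stratification matters when class counts are unequal: without it, a fold could contain few or no examples of a rare class, and the per-fold accuracy would be misleading.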

## Full-text entities

- **Genes:** CALM3 (calmodulin 3) [NCBI Gene 808] {aka CALM, CAM1, CAM2, CAMB, CPVT6, CaM}
- **Diseases:** colon cancer (MESH:D015179), spinal abnormalities (MESH:D016472), lung (MESH:D008171), colon (MESH:D003108), Alzheimer's disease (MESH:D000544), XAI (MESH:C538243), tilted disk (MESH:D055959), osteoporosis (MESH:D010024), cancer (MESH:D009369), Scoliosis (MESH:D012600), lung cancer (MESH:D008175), Spondylolisthesis (MESH:D013168), hemorrhage (MESH:D006470), injury to (MESH:D014947), lumbar spine herniation (MESH:C535531), Spine Abnormalities (MESH:D016135), AI (MESH:C538142)
- **Chemicals:** VGG16 (-)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC13024192/full.md

## Figures

7 figures with captions in the complete paper: https://tomesphere.com/paper/PMC13024192/full.md

## References

47 references — full list in the complete paper: https://tomesphere.com/paper/PMC13024192/full.md

---
Source: https://tomesphere.com/paper/PMC13024192