# Patient-Level Classification of Rotator Cuff Tears on Shoulder MRI Using an Explainable Vision Transformer Framework

**Authors:** Murat Aşçı, Sergen Aşık, Ahmet Yazıcı, İrfan Okumuşer

PMC · DOI: 10.3390/jcm15030928 · Journal of Clinical Medicine · 2026-01-23

## TL;DR

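This paper introduces Pa-ViT, an explainable Vision Transformer framework that classifies rotator cuff tears on shoulder MRI at the patient level (Normal, Partial-Thickness, Full-Thickness). It reports 91% overall accuracy versus 87% for a VGG-16 baseline and uses Attention Rollout visualizations to show which anatomical regions drive the predictions.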

## Contribution

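The contribution is Pa-ViT (Patient-Aware Vision Transformer), which places a ViT-Base backbone inside a weakly supervised Multiple Instance Learning (MIL) setup so that slice-level features are aggregated into a single patient-level diagnosis of rotator cuff tears, with Attention Rollout visualizations for explainability.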

## Key findings

- The Pa-ViT model achieved 91% overall accuracy and a macro-averaged F1-score of 0.91.
- The model outperformed the standard VGG-16 baseline by 4 percentage points in accuracy (91% vs. 87%) and reached a ROC AUC of 0.903 on the challenging Partial-Thickness Tear class.
- Attention Rollout visualizations indicated that the model relies on genuine anatomical features, such as the supraspinatus footprint, rather than on artifacts.
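
The abstract attributes these visualizations to Attention Rollout. As a rough illustration of that general technique (not the authors' code), the NumPy sketch below averages attention over heads, adds the identity to account for residual connections, re-normalises the rows, and multiplies the per-layer matrices together. The function names, the toy tensor sizes, and the random attention maps are all assumptions made for this example; a real model would supply its own per-layer attention weights.

```python
import numpy as np

def attention_rollout(attentions):
    """Combine per-layer attention maps into one token-to-token relevance map.

    attentions: list with one array per transformer layer, each shaped
    (num_heads, num_tokens, num_tokens), with every row summing to 1.
    """
    num_tokens = attentions[0].shape[-1]
    rollout = np.eye(num_tokens)
    for layer_attention in attentions:
        fused = layer_attention.mean(axis=0)                # average over heads
        fused = fused + np.eye(num_tokens)                  # model the residual path
        fused = fused / fused.sum(axis=-1, keepdims=True)   # rows sum to 1 again
        rollout = fused @ rollout                           # propagate through this layer
    return rollout

def cls_patch_relevance(rollout, grid_size):
    """Relevance of each image patch to the [CLS] token, reshaped to a 2D grid."""
    return rollout[0, 1:].reshape(grid_size, grid_size)

# Toy input (hypothetical): 2 layers, 3 heads, 1 [CLS] token plus a 2x2 patch grid.
rng = np.random.default_rng(0)
raw = rng.random((2, 3, 5, 5))
toy_attentions = [layer / layer.sum(axis=-1, keepdims=True) for layer in raw]

rollout = attention_rollout(toy_attentions)
heatmap = cls_patch_relevance(rollout, grid_size=2)
```

In practice the resulting grid would be upsampled and overlaid on the MRI slice. Whether the heatmap highlights a region such as the supraspinatus footprint is then a matter of visual inspection, as the key finding describes.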

## Abstract

Background/Objectives: Diagnosing Rotator Cuff Tears (RCTs) via Magnetic Resonance Imaging (MRI) is clinically challenging due to complex 3D anatomy and significant interobserver variability. Traditional slice-centric Convolutional Neural Networks (CNNs) often fail to capture the necessary volumetric context for accurate grading. This study aims to develop and validate the Patient-Aware Vision Transformer (Pa-ViT), an explainable deep-learning framework designed for the automated, patient-level classification of RCTs (Normal, Partial-Thickness, and Full-Thickness).

Methods: A large-scale retrospective dataset comprising 2447 T2-weighted coronal shoulder MRI examinations was utilized. The proposed Pa-ViT framework employs a Vision Transformer (ViT-Base) backbone within a Weakly-Supervised Multiple Instance Learning (MIL) paradigm to aggregate slice-level semantic features into a unified patient diagnosis. The model was trained using a weighted cross-entropy loss to address class imbalance and was benchmarked against widely used CNN architectures and traditional machine-learning classifiers.

Results: The Pa-ViT model achieved a high overall accuracy of 91% and a macro-averaged F1-score of 0.91, significantly outperforming the standard VGG-16 baseline (87%). Notably, the model demonstrated superior discriminative power for the challenging Partial-Thickness Tear class (ROC AUC: 0.903). Furthermore, Attention Rollout visualizations confirmed the model’s reliance on genuine anatomical features, such as the supraspinatus footprint, rather than artifacts.

Conclusions: By effectively modeling long-range dependencies, the Pa-ViT framework provides a robust alternative to traditional CNNs. It offers a clinically viable, explainable decision support tool that enhances diagnostic sensitivity, particularly for subtle partial-thickness tears.
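
The two training ideas named in the Methods, aggregating slice-level features into one patient-level prediction and weighting the cross-entropy loss by class, can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions. The abstract does not say how Pa-ViT pools slices, so the attention-style pooling here is one common MIL choice rather than the authors' method. The inverse-frequency class weights, the function names, the random features, and the toy label counts are likewise hypothetical. The only external fact used is that ViT-Base has a 768-dimensional hidden size.

```python
import numpy as np

CLASSES = ["Normal", "Partial-Thickness", "Full-Thickness"]

def softmax(x, axis=-1):
    shifted = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)

def attention_mil_pool(slice_features, w_score):
    """Pool (num_slices, dim) slice features into one (dim,) patient embedding.

    Assumption: a learned scoring vector weights the slices; other MIL
    pooling rules (mean, max) would fit the same interface.
    """
    scores = slice_features @ w_score      # one score per slice
    weights = softmax(scores)              # slice weights sum to 1
    return weights @ slice_features, weights

def inverse_frequency_weights(labels, num_classes):
    """Hypothetical weighting scheme: rarer classes get larger weights.

    Assumes every class appears at least once in `labels`.
    """
    counts = np.bincount(labels, minlength=num_classes)
    return counts.sum() / (num_classes * counts)

def weighted_cross_entropy(logits, labels, class_weights):
    """Class-weighted cross-entropy, averaged by the sum of sample weights."""
    log_probs = np.log(softmax(logits))
    per_sample = -log_probs[np.arange(len(labels)), labels]
    w = class_weights[labels]
    return (w * per_sample).sum() / w.sum()

# Toy patient (hypothetical): 12 slices with 768-dimensional features.
rng = np.random.default_rng(0)
slice_features = rng.normal(size=(12, 768))
w_score = rng.normal(size=768) * 0.01
patient_embedding, slice_weights = attention_mil_pool(slice_features, w_score)

w_cls = rng.normal(size=(768, len(CLASSES))) * 0.01   # stand-in linear classifier
patient_logits = patient_embedding @ w_cls

# Toy imbalanced training labels: 4 Normal, 1 Partial-Thickness, 3 Full-Thickness.
train_labels = np.array([0, 0, 0, 0, 1, 2, 2, 2])
class_weights = inverse_frequency_weights(train_labels, len(CLASSES))
loss = weighted_cross_entropy(patient_logits[None, :], np.array([1]), class_weights)
```

Because only the patient-level label enters the loss, the slice weights are learned without any slice-level annotation, which is the sense in which such a setup is weakly supervised.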

## Full-text entities

- **Diseases:** Rotator Cuff Tears (MESH:D000070636)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12898537/full.md

## Figures

7 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12898537/full.md

## References

29 references — full list in the complete paper: https://tomesphere.com/paper/PMC12898537/full.md

---
Source: https://tomesphere.com/paper/PMC12898537