# Development of a multimodal vision transformer model for predicting traumatic versus degenerative rotator cuff tears on magnetic resonance imaging: A single‐centre retrospective study

**Authors:** Felix C. Oettl, Ali B. Malayeri, Pascal R. Furrer, Karl Wieser, Philipp Fürnstahl, Samy Bouaicha

PMC · DOI: 10.1002/ksa.70000 · 2025-08-13

## TL;DR

This study evaluates a multimodal vision transformer for distinguishing traumatic from degenerative rotator cuff tears on shoulder MRI, reporting an accuracy of about 75% in a single-centre cohort of 99 scans.

## Contribution

The study presents a novel application of a multimodal vision transformer, which combines shoulder MRI with demographic data, to differentiate traumatic from degenerative rotator cuff tears.

## Key findings

- The multimodal ViT model achieved 75% accuracy in differentiating traumatic and degenerative RCTs.
- Saliency maps did not consistently highlight the rotator cuff, suggesting other imaging features may be important.
- The model maintained consistent performance across different patient subsets, which the authors interpret as robust generalisation.

## Abstract

The differentiation between traumatic and degenerative rotator cuff tears (RCTs) remains a diagnostic challenge with significant implications for treatment planning. While magnetic resonance imaging (MRI) is standard practice, traditional radiological interpretation has shown limited reliability in distinguishing these etiologies. This study evaluates the potential of artificial intelligence (AI) models, specifically a multimodal vision transformer (ViT), to differentiate between traumatic and degenerative RCTs.

In this retrospective, single‐centre study, 99 shoulder MRIs were analysed from patients who underwent surgery at a specialised university shoulder unit between 2016 and 2019. The cohort was divided into training (n = 79) and validation (n = 20) sets. Inclusion in the traumatic group required a documented relevant trauma (excluding simple lifting injuries), a previously asymptomatic shoulder and an MRI within 3 months of the trauma. The degenerative group was similar in age and injured tendon, with patients presenting with at least 1 year of constant shoulder pain prior to imaging and no trauma history. The ViT was subsequently combined with demographic data to form the final multimodal ViT. Saliency maps were used as an explainability tool.
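The summary does not state how the image model and the demographic data were combined. As a minimal sketch of one common option (late fusion by concatenation, which is an assumption here and not necessarily the authors' method), an image embedding can be joined with a small demographic vector and passed to a logistic head. The function name, the embedding size of 768, the demographic features and the random weights are all hypothetical.

```python
import numpy as np


def fuse_and_classify(image_embedding, demographics, weights, bias):
    """Late fusion sketch: concatenate both inputs, then apply a logistic head.

    Hypothetical helper for illustration; not the architecture from the paper.
    """
    fused = np.concatenate([image_embedding, demographics])
    logit = float(fused @ weights + bias)
    probability = 1.0 / (1.0 + np.exp(-logit))
    return fused, float(probability)


rng = np.random.default_rng(0)

# Stand-in for a ViT image embedding (768 is an assumed size).
image_embedding = rng.normal(size=768)

# Assumed demographic features: age scaled to [0, 1] and a binary sex flag.
demographics = np.array([0.58, 1.0])

# Untrained random weights, used only so the sketch runs end to end.
weights = rng.normal(scale=0.01, size=image_embedding.size + demographics.size)

fused, prob_traumatic = fuse_and_classify(image_embedding, demographics, weights, 0.0)
print(fused.shape, round(prob_traumatic, 3))
```

In a trained model the weights would be learned jointly with, or on top of, the image encoder. The point of the sketch is only that the demographic values enter the classifier alongside the image features.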

The multimodal ViT model achieved an accuracy of 0.75 ± 0.08 with a recall of 0.8 ± 0.08, specificity of 0.71 ± 0.11 and an F1 score of 0.76 ± 0.1. The model maintained consistent performance across different patient subsets, demonstrating robust generalisation. Saliency maps did not show a consistent focus on the rotator cuff.
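For readers less familiar with these metrics, the sketch below shows how accuracy, recall, specificity and F1 follow from a binary confusion matrix. It assumes the traumatic class is treated as positive, which the summary does not state, and the counts are hypothetical values for a single 20-scan validation set, not the paper's data.

```python
def binary_metrics(tp, fp, tn, fn):
    """Compute standard binary classification metrics from confusion counts.

    Assumes every denominator is non-zero, which holds for the example below.
    """
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "recall": tp / (tp + fn),          # sensitivity: positives correctly found
        "specificity": tn / (tn + fp),     # negatives correctly rejected
        "precision": tp / (tp + fp),
        "f1": 2 * tp / (2 * tp + fp + fn), # harmonic mean of precision and recall
    }


# Hypothetical counts for one 20-scan validation set (NOT taken from the paper).
metrics = binary_metrics(tp=8, fp=3, tn=7, fn=2)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

The ± values reported above presumably summarise variation across repeated evaluations, though the summary does not say how they were obtained; a single confusion matrix such as this one yields only point estimates.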

AI shows potential in supporting the challenging differentiation between traumatic and degenerative RCTs on MRI. The achieved accuracy of 75% is noteworthy given that the two groups were similar, which made for a challenging diagnostic scenario. Saliency maps were used to support explainability; their lack of consistent focus on the rotator cuff tendons hints at underappreciated aspects in the differentiation.

Not applicable.

## Full-text entities

- **Diseases:** shoulder pain (MESH:D020069), rotator cuff tears (MESH:D000070636), injured tendon (MESH:D052256), trauma (MESH:D014947)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Figures

3 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12582230/full.md

---
Source: https://tomesphere.com/paper/PMC12582230