# Unsupervised Segmentation of Bolus and Residue in Videofluoroscopy Swallowing Studies

**Authors:** Farnaz Khodami, Mehdy Dousty, James L. Coyle, Ervin Sejdić

PMC · DOI: 10.3390/jimaging11100368 · Journal of Imaging · 2025-10-17

## TL;DR

This paper introduces an unsupervised convolutional autoencoder that segments bolus and residue in videofluoroscopic swallowing studies without pixel-level annotations, reaching 61% IoU for bolus segmentation and 52% for residue detection.

## Contribution

An unsupervised architecture that segments both bolus and residue, which the authors describe as the first successful machine learning-based residue segmentation in swallowing analysis with quantitative evaluation.

## Key findings

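- The model achieves an intersection over union (IoU) of 61% for bolus segmentation, comparable to state-of-the-art supervised methods, and 52% for residue detection.
- Despite training without pixel-wise labels, the method significantly outperforms top-performing supervised baselines in residue detection, as confirmed by statistical testing.
- Positional encoding, added to the input representation, counters the locality bias of convolutional architectures and lets the model capture global spatial context.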

## Abstract

Bolus tracking is a critical component of swallowing analysis, as the speed, course, and integrity of bolus movement from the mouth to the stomach, along with the presence of residue, serve as key indicators of potential abnormalities. Existing machine learning approaches for videofluoroscopic swallowing study (VFSS) analysis heavily rely on annotated data and often struggle to detect residue, which is visually subtle and underrepresented. This study proposes an unsupervised architecture to segment both bolus and residue, marking the first successful machine learning-based residue segmentation in swallowing analysis with quantitative evaluation. We introduce an unsupervised convolutional autoencoder that segments bolus and residue without requiring pixel-level annotations. To address the locality bias inherent in convolutional architectures, we incorporate positional encoding into the input representation, enabling the model to capture global spatial context. The proposed model was validated on a diverse set of VFSS images annotated by certified raters. Our method achieves an intersection over union (IoU) of 61% for bolus segmentation—comparable to state-of-the-art supervised methods—and 52% for residue detection. Despite not using pixel-wise labels for training, our model significantly outperforms top-performing supervised baselines in residue detection, as confirmed by statistical testing. These findings suggest that learning from negative space provides a robust and generalizable pathway for detecting clinically significant but sparsely represented features like residue.
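
The abstract states that positional encoding is incorporated into the input representation but does not give its exact form in this summary. One common way to do this, shown here purely as an assumption, is to stack normalized row and column coordinate maps onto the grayscale frame as extra channels (a CoordConv-style input). The function name `add_coordinate_channels` and the [-1, 1] coordinate range are hypothetical choices, not the authors' implementation:

```python
import numpy as np


def add_coordinate_channels(frame: np.ndarray) -> np.ndarray:
    """Turn an (H, W) grayscale frame into a (3, H, W) array.

    Channel 0 is the image intensity, channel 1 the normalized row
    coordinate, and channel 2 the normalized column coordinate.
    Assumption: coordinates are scaled linearly to [-1, 1].
    """
    height, width = frame.shape
    rows = np.linspace(-1.0, 1.0, height, dtype=np.float32)
    cols = np.linspace(-1.0, 1.0, width, dtype=np.float32)
    # indexing="ij" makes both grids (H, W): row_map varies down the
    # image, col_map varies across it.
    row_map, col_map = np.meshgrid(rows, cols, indexing="ij")
    return np.stack([frame.astype(np.float32), row_map, col_map], axis=0)


# A dummy 4x5 frame standing in for one VFSS image.
dummy_frame = np.zeros((4, 5), dtype=np.float32)
encoded = add_coordinate_channels(dummy_frame)  # shape (3, 4, 5)
```

With coordinates available at every pixel, a convolutional filter can condition its response on where in the frame it is looking, which is one way to offset the locality bias the abstract mentions.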

## Full-text entities

- **Diseases:** neurodegenerative conditions (MESH:D019636), aspiration pneumonia (MESH:D011015), head and neck cancers (MESH:D006258), structural abnormalities (MESH:C566527), injury to (MESH:D014947), dehydration (MESH:D003681), malnutrition (MESH:D044342), swallowing difficulties (MESH:D003680), neurological disorders (MESH:D009461), PE (MESH:C564021), neuromuscular disorders (MESH:D009468)
- **Chemicals:** barium (MESH:D001464)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12564947/full.md

## Figures

6 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12564947/full.md

## References

28 references — full list in the complete paper: https://tomesphere.com/paper/PMC12564947/full.md

---
Source: https://tomesphere.com/paper/PMC12564947