# Automated overview of complete endoscopies with unsupervised learned descriptors

**Authors:** O. Leon Barbed, Pablo Azagra, Juan Plo, Ana C. Murillo

PMC · DOI: 10.1007/s11548-025-03502-1 · International Journal of Computer Assisted Radiology and Surgery · 2025-08-27

## TL;DR

This paper introduces an automated method to identify and highlight relevant parts of endoscopy videos, helping clinicians review procedures more efficiently.

## Contribution

A novel unsupervised approach for endoscopy video analysis using learned embeddings and clustering to identify key visual patterns.

## Key findings

- The method effectively identifies surgery segments and visibility conditions in colonoscopy videos.
- Structured overviews are generated, separating informative from non-informative video parts.
- The approach is shown to be suitable as preprocessing for downstream tasks such as 3D reconstruction and video summarization.

## Abstract

We aim to automate the initial analysis of complete endoscopy videos, identifying the sparse relevant content. This makes long procedure recordings easier to understand, reduces clinicians’ review time, and supports downstream tasks such as video summarization, event detection, and 3D reconstruction.

Our approach extracts endoscopic video frame representations with a learned embedding model. These descriptors are clustered to find visual patterns in the procedure, identifying key scene types (surgery, clear visibility frames, etc.) and enabling segmentation into informative and non-informative video parts.
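The clustering step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the frame descriptors have already been computed by some embedding model, and it uses plain k-means with a deterministic farthest-first initialization as a stand-in, since this summary does not state which clustering algorithm the paper uses. The names `kmeans` and `squared_distance` are hypothetical.

```python
from typing import List, Sequence

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(descriptors: List[Vector], k: int, iterations: int = 20) -> List[int]:
    """Assign each frame descriptor to one of k clusters (Lloyd's algorithm).

    Assumption: k-means stands in for whatever clustering the paper uses.
    """
    # Farthest-first initialization: start from the first descriptor, then
    # repeatedly pick the descriptor farthest from all chosen centroids.
    centroids = [list(descriptors[0])]
    while len(centroids) < k:
        farthest = max(
            descriptors,
            key=lambda d: min(squared_distance(d, c) for c in centroids),
        )
        centroids.append(list(farthest))

    labels = [0] * len(descriptors)
    for _ in range(iterations):
        # Assignment step: each descriptor joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(d, centroids[c]))
            for d in descriptors
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [d for d, lab in zip(descriptors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Toy 2-D "descriptors": three frames of one scene type, three of another.
toy_descriptors = [
    [0.0, 0.1], [0.1, 0.0], [0.05, 0.05],
    [5.0, 5.1], [5.1, 5.0], [4.9, 5.0],
]
toy_labels = kmeans(toy_descriptors, k=2)
```

In practice the descriptors would be high-dimensional and the number of clusters would need to be chosen for the data; the toy values here only show that frames with similar descriptors end up sharing a label.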

Evaluation on complete colonoscopy videos shows good performance in identifying surgery segments and different visibility conditions. The method produces structured overviews that separate useful segments from irrelevant ones. We illustrate its suitability and benefits as preprocessing for other downstream tasks, such as 3D reconstruction or video summarization.
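One way to picture a structured overview is as a list of contiguous segments built from per-frame cluster labels. The sketch below is an assumption about how such an overview could be assembled, not the paper's procedure: `Segment`, `build_overview`, and the choice of which labels count as informative are all hypothetical.

```python
from itertools import groupby
from typing import List, NamedTuple, Set


class Segment(NamedTuple):
    start: int         # first frame index (inclusive)
    end: int           # last frame index (inclusive)
    label: int         # cluster label shared by all frames in the segment
    informative: bool  # whether this cluster is treated as useful content


def build_overview(labels: List[int], informative_labels: Set[int]) -> List[Segment]:
    """Merge runs of identical per-frame labels into contiguous segments."""
    segments = []
    frame = 0
    for label, run in groupby(labels):
        length = sum(1 for _ in run)
        segments.append(
            Segment(frame, frame + length - 1, label, label in informative_labels)
        )
        frame += length
    return segments


# Hypothetical labels: 0 = poor visibility, 1 = clear visibility, 2 = surgery.
frame_labels = [0, 0, 1, 1, 1, 0, 2]
overview = build_overview(frame_labels, informative_labels={1, 2})
useful = [(s.start, s.end) for s in overview if s.informative]
```

A real pipeline would likely smooth the labels over time before segmenting, so that a single mislabeled frame does not split a long segment.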

Our approach enables automated endoscopy overview generation, helping clinicians focus on relevant video content such as good-visibility sections and surgery actions. The presented work enables faster review of recordings by clinicians and effective video preprocessing for downstream tasks.

The online version contains supplementary material available at 10.1007/s11548-025-03502-1.

## Full-text entities

- **Diseases:** dysplasia (MESH:D015792), polyp (MESH:D011127), hemorrhage (MESH:D006470)
- **Chemicals:** water (MESH:D014867)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC13035558/full.md

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/PMC13035558/full.md

## References

2 references — full list in the complete paper: https://tomesphere.com/paper/PMC13035558/full.md

---
Source: https://tomesphere.com/paper/PMC13035558