# Improving image-retrieval performance of foundation models in gastrointestinal endoscopic images

**Authors:** Kangsan Kim, Junseok Park, Sang Hyun Kim, Youngbae Hwang

PMC · DOI: 10.3389/fmed.2025.1727884 · Frontiers in Medicine · 2025-12-18

## TL;DR

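This paper introduces a content-based image-retrieval framework for gastrointestinal endoscopy that pairs a general-purpose vision foundation model (DINOv2) with a domain-specific endoscopic model (GastroNet), reaching 97.71% Recall@1 and outperforming single-backbone baselines.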

## Contribution

A novel dual-backbone framework combining a general vision model and a domain-specific endoscopic model for improved image retrieval.

## Key findings

- The model achieves state-of-the-art performance with 97.71% Recall@1 and 99.14% Recall@5.
- The dual-backbone design captures complementary features, leading to better performance than single-model baselines.
- The framework is evaluated on 3,500 public endoscopic images (Kvasir and HyperKvasir) and validated on entirely unseen real-world and synthetic data.

## Abstract

The quality of gastrointestinal endoscopy is verified by documenting specific required images, but identifying these images from the numerous photographs captured during a procedure is tedious. Conventional deep-learning approaches that aim to automate this process are often limited by subjective assessments and poor interpretability.

We introduce a novel content-based image-retrieval framework that employs a dual-backbone architecture, integrating a general-purpose vision foundation model (DINOv2) and a domain-specific endoscopic model (GastroNet). The system is trained using parameter-efficient metric learning to generate discriminative embeddings for efficient similarity searches. The framework is evaluated on 3,500 public endoscopic images (from the Kvasir and HyperKvasir datasets) and validated on entirely unseen real-world and synthetic data.
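The retrieval step described above can be illustrated with a minimal sketch. The abstract does not say how the two backbones' features are combined, so the concatenate-and-renormalize fusion below is an assumption, as are the function names `fuse` and `retrieve` and the toy two-dimensional feature vectors standing in for DINOv2 and GastroNet outputs.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)

def fuse(general_feat, domain_feat):
    """Assumed fusion: normalize each backbone's features, concatenate, renormalize."""
    return l2_normalize(l2_normalize(general_feat) + l2_normalize(domain_feat))

def cosine(a, b):
    """Cosine similarity of two unit-length vectors (a plain dot product)."""
    return sum(x * y for x, y in zip(a, b))

def retrieve(query, gallery, k=5):
    """Return indices of the k gallery embeddings most similar to the query."""
    scored = [(cosine(query, emb), idx) for idx, emb in enumerate(gallery)]
    scored.sort(key=lambda t: (-t[0], t[1]))  # highest similarity first, ties by index
    return [idx for _, idx in scored[:k]]

# Toy gallery: each entry fuses a "general" and a "domain" feature vector.
gallery = [
    fuse([1.0, 0.0], [0.0, 1.0]),
    fuse([0.0, 1.0], [1.0, 0.0]),
    fuse([1.0, 0.1], [0.1, 1.0]),
]
query = fuse([1.0, 0.0], [0.0, 1.0])
top2 = retrieve(query, gallery, k=2)
```

Normalizing each backbone's output before concatenation keeps either model from dominating the similarity score purely because of feature scale, which is one plausible way to let complementary features contribute equally.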

Our model achieves state-of-the-art performance (97.71% Recall@1, 99.14% Recall@5, and 96.74% mean average precision), significantly outperforming single-backbone baseline models. Ablation studies confirm that this improvement is primarily due to the two backbones capturing complementary features.
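For readers unfamiliar with the reported metrics, the sketch below shows one common way to compute Recall@K and mean average precision for retrieval. It assumes the usual metric-learning convention (a query counts as a hit at K if any of its top-K results shares its label); the paper's exact definitions may differ, and the class labels are illustrative only.

```python
def recall_at_k(ranked_labels, query_labels, k):
    """Fraction of queries with at least one same-label item in the top k."""
    hits = sum(1 for ranked, q in zip(ranked_labels, query_labels) if q in ranked[:k])
    return hits / len(query_labels)

def average_precision(ranked, query_label):
    """Mean of precision values taken at each rank where a relevant item appears."""
    hits, total = 0, 0.0
    for rank, label in enumerate(ranked, start=1):
        if label == query_label:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0

def mean_average_precision(ranked_labels, query_labels):
    """Average of per-query average precision."""
    aps = [average_precision(r, q) for r, q in zip(ranked_labels, query_labels)]
    return sum(aps) / len(aps)

# Labels of retrieved images, best match first, for two hypothetical queries.
ranked_labels = [
    ["polyp", "cecum", "polyp"],
    ["cecum", "polyp", "polyp"],
]
query_labels = ["polyp", "polyp"]

r1 = recall_at_k(ranked_labels, query_labels, k=1)
r2 = recall_at_k(ranked_labels, query_labels, k=2)
m_ap = mean_average_precision(ranked_labels, query_labels)
```

In this toy case only the first query has a correct top-1 result, so Recall@1 is 0.5, while both queries have a correct result within the top 2, so Recall@2 is 1.0.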

These findings demonstrate that the proposed dual-backbone framework offers an accurate and automated tool for assessing the procedural quality of gastrointestinal endoscopy and may facilitate more reliable quality control in clinical practice.

## Full-text entities

- **Diseases:** polyps (MESH:D011127), ulcerative colitis (MESH:D003093), esophagitis (MESH:D004941), inflammation (MESH:D007249), gastrointestinal diseases (MESH:D005767)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12756140/full.md

## Figures

8 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12756140/full.md

## References

48 references — full list in the complete paper: https://tomesphere.com/paper/PMC12756140/full.md

---
Source: https://tomesphere.com/paper/PMC12756140