ACM Multimedia Grand Challenge on ENT Endoscopy Analysis

Trong-Thuan Nguyen; Viet-Tham Huynh; Thao Thi Phuong Dao; Ha Nguyen Thi; Tien To Vu Thuy; Uyen Hanh Tran; Tam V. Nguyen; Thanh Dinh Le; Minh-Triet Tran

arXiv:2508.04801 · cs.CV · August 8, 2025

TL;DR

This paper introduces ENTRep, a comprehensive benchmark for ENT endoscopy analysis that combines anatomical classification with retrieval tasks, supported by a bilingual dataset and standardized evaluation protocols.

Contribution

The paper presents the first public benchmark dataset and tasks for ENT endoscopy analysis, integrating classification and retrieval with bilingual annotations and expert labels.

Findings

1. Top-performing teams achieved significant accuracy in classification tasks.

2. The benchmark revealed challenges in fine-grained anatomical differentiation.

3. Bilingual descriptions improved retrieval performance.

Abstract

Automated analysis of endoscopic imagery is a critical yet underdeveloped component of ENT (ear, nose, and throat) care, hindered by variability in devices and operators, subtle and localized findings, and fine-grained distinctions such as laterality and vocal-fold state. In addition to classification, clinicians require reliable retrieval of similar cases, both visually and through concise textual descriptions. These capabilities are rarely supported by existing public benchmarks. To this end, we introduce ENTRep, the ACM Multimedia 2025 Grand Challenge on ENT endoscopy analysis, which integrates fine-grained anatomical classification with image-to-image and text-to-image retrieval under bilingual (Vietnamese and English) clinical supervision. Specifically, the dataset comprises expert-annotated images, labeled for anatomical region and normal or abnormal status, and accompanied by…
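
The retrieval tasks named in the abstract (image-to-image and text-to-image) can be pictured as ranking a gallery of image embeddings against a query embedding. The sketch below is illustrative only: the abstract excerpt does not state which models or metrics the challenge uses, so the cosine-similarity ranking, the Recall@k metric, and every name and vector here (`rank_gallery`, `recall_at_k`, the toy embeddings) are assumptions, not the organizers' protocol.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_gallery(query_vec, gallery):
    """Return gallery ids sorted by descending similarity to the query."""
    scored = [(cosine(query_vec, vec), gid) for gid, vec in gallery.items()]
    # Sort by similarity (highest first); break ties by id for determinism.
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [gid for _, gid in scored]


def recall_at_k(queries, gallery, relevant, k):
    """Fraction of queries whose relevant gallery item is in the top k."""
    hits = 0
    for qid, qvec in queries.items():
        if relevant[qid] in rank_gallery(qvec, gallery)[:k]:
            hits += 1
    return hits / len(queries)


# Hypothetical 3-d embeddings. In practice the gallery vectors would come from
# an image encoder and the query vectors from an image or text encoder that
# shares the same embedding space.
gallery = {
    "ear_left": [1.0, 0.0, 0.1],
    "nose_right": [0.0, 1.0, 0.1],
    "vocal_fold": [0.1, 0.1, 1.0],
}
queries = {
    "q_ear": [0.9, 0.1, 0.0],
    "q_throat": [0.0, 0.2, 0.9],
    "q_nose": [0.6, 0.5, 0.0],  # ambiguous query: closer to the ear image
}
relevant = {"q_ear": "ear_left", "q_throat": "vocal_fold", "q_nose": "nose_right"}

if __name__ == "__main__":
    for k in (1, 2):
        print(f"Recall@{k} = {recall_at_k(queries, gallery, relevant, k):.3f}")
```

In this toy setup the ambiguous query misses at k = 1 but is recovered at k = 2, which mirrors the kind of fine-grained confusion (for example, between adjacent anatomical regions) that the abstract identifies as a difficulty.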
