RARE disease detection from Capsule Endoscopic Videos based on Vision Transformers

X. Gao; C. Chien; G. Liu; A. Manullang

arXiv:2603.18045 · cs.CV · March 20, 2026

TL;DR

This paper fine-tunes Vision Transformers for multi-label classification of capsule endoscopic videos for rare disease detection, reporting initial results (overall mAP@0.5 of 0.0205) on a challenging medical imaging task.

Contribution

It introduces the use of Vision Transformers for multi-label classification in capsule endoscopy videos, a novel approach in this medical domain.

Findings

1. Overall mAP@0.5 is 0.0205.
2. Overall mAP@0.95 is 0.0196.
3. Demonstrates the feasibility of Transformer-based models in medical video analysis.
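The page does not say what the 0.5 and 0.95 in mAP@0.5 and mAP@0.95 refer to. One common reading, assumed here and not confirmed by the source, is a temporal IoU threshold: a predicted event interval for a label counts as a true positive only if it overlaps a not-yet-matched ground-truth interval of the same label with IoU at or above the threshold. The Python sketch below computes non-interpolated average precision per label under that assumption and averages it over labels. The function names, the inclusive frame intervals, the greedy matching, and the choice to skip labels without ground-truth events are all illustrative, not the competition's documented protocol.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: an event is an inclusive (start_frame, end_frame) interval.
Interval = Tuple[int, int]
Prediction = Tuple[Interval, float]  # (interval, confidence score)


def temporal_iou(a: Interval, b: Interval) -> float:
    """Intersection-over-union of two inclusive frame intervals."""
    inter = max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)
    union = (a[1] - a[0] + 1) + (b[1] - b[0] + 1) - inter
    return inter / union if union > 0 else 0.0


def average_precision(preds: List[Prediction], gts: List[Interval], iou_thr: float) -> float:
    """Non-interpolated AP for one label at a given IoU threshold."""
    if not gts:
        return 0.0
    matched = [False] * len(gts)
    true_positives = 0
    precision_sum = 0.0
    # Rank predictions from most to least confident.
    ranked = sorted(preds, key=lambda p: p[1], reverse=True)
    for rank, (interval, _score) in enumerate(ranked, start=1):
        # Greedily pick the unmatched ground-truth event with the highest IoU.
        best_iou, best_j = 0.0, -1
        for j, gt in enumerate(gts):
            if not matched[j]:
                iou = temporal_iou(interval, gt)
                if iou > best_iou:
                    best_iou, best_j = iou, j
        if best_j >= 0 and best_iou >= iou_thr:
            matched[best_j] = True
            true_positives += 1
            # Precision at the recall level reached by this true positive.
            precision_sum += true_positives / rank
    # Unmatched ground-truth events contribute zero precision.
    return precision_sum / len(gts)


def mean_average_precision(
    preds: Dict[str, List[Prediction]],
    gts: Dict[str, List[Interval]],
    iou_thr: float,
) -> float:
    """Mean per-label AP over labels that have at least one ground-truth event."""
    labels = [label for label, events in gts.items() if events]
    if not labels:
        return 0.0
    total = sum(average_precision(preds.get(label, []), gts[label], iou_thr) for label in labels)
    return total / len(labels)


if __name__ == "__main__":
    # Toy data, invented purely to exercise the functions.
    gts = {"polyp": [(10, 19), (100, 109)], "ulcer": [(0, 9)]}
    preds = {"polyp": [((10, 19), 0.9), ((50, 59), 0.8)], "ulcer": [((5, 14), 0.7)]}
    for thr in (0.5, 0.95):
        print(f"mAP@{thr}: {mean_average_precision(preds, gts, thr):.4f}")
```

Evaluating the same predictions at `iou_thr=0.5` and `iou_thr=0.95` mirrors the two reported figures; the actual competition metric may define matching, interpolation, or label averaging differently.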

Abstract

This work corresponds to the Gastro Competition on multi-label classification of capsule endoscopic videos (CEV). Transformer-based deep learning networks are fine-tuned for this task. The base model, available online, is Google's Vision Transformer (ViT) with 16 × 16 patches (patch16) at a 224 × 224 input resolution. In total, 17 labels are classified: mouth, esophagus, stomach, small intestine, colon, z-line, pylorus, ileocecal valve, active bleeding, angiectasia, blood, erosion, erythema, hematin, lymphangioectasis, polyp, and ulcer. On the test dataset of three videos, the overall mAP@0.5 is 0.0205, whereas the overall mAP@0.95 is 0.0196.
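The abstract does not describe the classification head, loss, or decision rule. A standard multi-label setup, assumed here rather than taken from the paper, gives each of the 17 labels its own logit, passes each logit through a sigmoid independently, and trains with binary cross-entropy, so a single frame can carry an anatomical label such as "stomach" and a finding such as "erosion" at the same time. The stdlib-only Python sketch below shows that per-frame decoding step and loss; the 0.5 threshold and all function names are hypothetical. It also computes the patch count for a 224 × 224 input, assuming 16 × 16 patches as in ViT-B/16: 224 / 16 = 14 patches per side, 196 in total, before the ViT prepends its classification token.

```python
import math
from typing import List, Sequence

# The 17 labels listed in the abstract, in the order given there.
LABELS: List[str] = [
    "mouth", "esophagus", "stomach", "small intestine", "colon",
    "z-line", "pylorus", "ileocecal valve",
    "active bleeding", "angiectasia", "blood", "erosion", "erythema",
    "hematin", "lymphangioectasis", "polyp", "ulcer",
]


def num_patches(image_size: int = 224, patch_size: int = 16) -> int:
    """Number of non-overlapping patches a ViT cuts a square image into."""
    per_side = image_size // patch_size
    return per_side * per_side


def sigmoid(z: float) -> float:
    """Numerically stable logistic function (avoids overflow for large |z|)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def decode_frame(logits: Sequence[float], threshold: float = 0.5) -> List[str]:
    """Score each label independently and keep every label at or above the threshold."""
    if len(logits) != len(LABELS):
        raise ValueError(f"expected {len(LABELS)} logits, got {len(logits)}")
    return [name for name, z in zip(LABELS, logits) if sigmoid(z) >= threshold]


def bce_with_logits(logits: Sequence[float], targets: Sequence[float]) -> float:
    """Mean binary cross-entropy over labels, computed stably from raw logits."""
    total = 0.0
    for z, y in zip(logits, targets):
        # Equivalent to -[y*log(sigmoid(z)) + (1-y)*log(1-sigmoid(z))].
        total += max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
    return total / len(logits)


if __name__ == "__main__":
    # Toy logits for one frame: two labels are confidently positive.
    logits = [-2.0] * len(LABELS)
    logits[LABELS.index("stomach")] = 3.0
    logits[LABELS.index("erosion")] = 1.0
    print(num_patches())          # 196
    print(decode_frame(logits))   # ['stomach', 'erosion']
```

Using independent sigmoids instead of a softmax is what makes the task multi-label: the label scores do not compete, so any number of the 17 labels can fire on the same frame.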

Taxonomy

Topics: Gastrointestinal Bleeding Diagnosis and Treatment · Colorectal Cancer Screening and Detection · Bariatric Surgery and Outcomes