RARE disease detection from Capsule Endoscopic Videos based on Vision Transformers
X. Gao, C. Chien, G. Liu, A. Manullang

TL;DR
This paper applies fine-tuned Vision Transformers to multi-label classification of capsule endoscopic videos for rare disease detection. It reports initial results (an overall mAP of about 0.02) on a challenging medical imaging task.
Contribution
It introduces the use of Vision Transformers for multi-label classification in capsule endoscopy videos, a novel approach in this medical domain.
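Multi-label classification differs from ordinary multi-class classification because a single frame can carry several labels at once, for example an anatomical region and a finding. The paper does not state its output head or training loss. The sketch below assumes the usual choice for this setting: one independent sigmoid per label, a fixed 0.5 decision threshold, and binary cross-entropy. The function names and logits are hypothetical.

```python
import math
from typing import List, Sequence


def sigmoid(x: float) -> float:
    """Numerically stable logistic function: maps a logit to a probability."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def predict_labels(labels: Sequence[str], logits: Sequence[float],
                   threshold: float = 0.5) -> List[str]:
    """Multi-label decision: every label is switched on or off independently.

    Unlike a softmax head (exactly one class per frame), several labels can
    be active at the same time.
    """
    return [name for name, z in zip(labels, logits) if sigmoid(z) >= threshold]


def bce_with_logits(logits: Sequence[float], targets: Sequence[int]) -> float:
    """Mean binary cross-entropy over labels, computed stably from raw logits."""
    total = 0.0
    for z, y in zip(logits, targets):
        # Algebraically equal to -[y*log(sigmoid(z)) + (1-y)*log(1-sigmoid(z))].
        total += max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
    return total / len(logits)


# Usage: one frame with three hypothetical logits.
labels = ["small intestine", "angiectasia", "polyp"]
logits = [2.0, 1.0, -3.0]
print(predict_labels(labels, logits))  # ['small intestine', 'angiectasia']
print(bce_with_logits(logits, [1, 1, 0]))
```

Per-label thresholds could be tuned on validation data instead of the fixed 0.5 used here. That matters most for rare findings, whose predicted probabilities tend to stay low.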
Findings
Overall mAP @0.5 is 0.0205
Overall mAP @0.95 is 0.0196
Demonstrates feasibility of Transformer-based models in medical video analysis
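The findings do not define what the 0.5 and 0.95 thresholds refer to. In video event detection they commonly denote the temporal IoU that a predicted segment must reach to count as matching a ground-truth segment. The sketch below assumes that reading. It also makes three further assumptions of its own:

- Events are runs of consecutive above-threshold frames.
- Each event is scored by its peak probability.
- "Overall" mAP is the unweighted mean of per-label AP.

All names are hypothetical, and this is not the competition's official scorer.

```python
from typing import Dict, List, Sequence, Tuple

Segment = Tuple[int, int]       # inclusive (start_frame, end_frame)
Event = Tuple[Segment, float]   # (segment, confidence score)


def frames_to_events(probs: Sequence[float], threshold: float = 0.5) -> List[Event]:
    """Group consecutive above-threshold frames into events scored by their peak."""
    events: List[Event] = []
    start = None
    for i, p in enumerate(probs):
        if p >= threshold and start is None:
            start = i
        elif p < threshold and start is not None:
            events.append(((start, i - 1), max(probs[start:i])))
            start = None
    if start is not None:
        events.append(((start, len(probs) - 1), max(probs[start:])))
    return events


def temporal_iou(a: Segment, b: Segment) -> float:
    """Intersection over union of two inclusive frame intervals."""
    inter = max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)
    union = (a[1] - a[0] + 1) + (b[1] - b[0] + 1) - inter
    return inter / union


def average_precision(preds: Sequence[Event], gts: Sequence[Segment],
                      iou_thr: float) -> float:
    """Non-interpolated AP for one label at a temporal IoU threshold."""
    if not gts:
        raise ValueError("AP is undefined without ground-truth events")
    ranked = sorted(preds, key=lambda e: e[1], reverse=True)
    matched = [False] * len(gts)
    hits, ap = 0, 0.0
    for rank, (seg, _score) in enumerate(ranked, start=1):
        # Greedily match to the unmatched ground truth with the highest IoU >= iou_thr.
        best_j, best_iou = -1, iou_thr
        for j, gt in enumerate(gts):
            iou = temporal_iou(seg, gt)
            if not matched[j] and iou >= best_iou:
                best_j, best_iou = j, iou
        if best_j >= 0:  # true positive: add precision at this recall step
            matched[best_j] = True
            hits += 1
            ap += hits / rank
    return ap / len(gts)


def mean_average_precision(per_label: Dict[str, Tuple[List[Event], List[Segment]]],
                           iou_thr: float) -> float:
    """Unweighted mean of per-label AP, skipping labels with no ground truth."""
    aps = [average_precision(p, g, iou_thr) for p, g in per_label.values() if g]
    return sum(aps) / len(aps) if aps else 0.0


# Usage: toy predictions and ground truth for two labels.
per_label = {
    "polyp": ([((0, 8), 0.9)], [(0, 9)]),    # IoU 0.9 with the ground truth
    "ulcer": ([((50, 60), 0.8)], [(3, 4)]),  # no overlap: a false positive
}
print(mean_average_precision(per_label, 0.5))   # 0.5
print(mean_average_precision(per_label, 0.95))  # 0.0
```

The polyp prediction overlaps its ground truth with IoU 0.9. It therefore counts as a hit at 0.5 but a miss at 0.95, which illustrates how a stricter threshold lowers the score.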
Abstract
This work corresponds to the Gastro Competition for multi-label classification of capsule endoscopic videos (CEV). Transformer-based deep learning networks are fine-tuned for this task. The base model is Google's publicly available Vision Transformer (ViT) with 16 × 16 patches (patch16) at 224 × 224 resolution. In total, 17 labels are classified: mouth, esophagus, stomach, small intestine, colon, z-line, pylorus, ileocecal valve, active bleeding, angiectasia, blood, erosion, erythema, hematin, lymphangioectasis, polyp, and ulcer. On the test dataset of three videos, the overall mAP@0.5 is 0.0205, whereas the overall mAP@0.95 is 0.0196.
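Assume the base model follows the standard ViT-B/16 configuration, with 16 × 16 pixel patches and one class token. Under that assumption, each 224 × 224 frame becomes a sequence of (224 / 16)² + 1 = 197 tokens. The sketch below works through that arithmetic. It also shows one way the 17 labels from the abstract could be encoded as multi-hot training targets. The function names and the label ordering are our assumptions, not details given in the abstract.

```python
from typing import Iterable, List

# The 17 labels listed in the abstract, in the order given there.
LABELS: List[str] = [
    "mouth", "esophagus", "stomach", "small intestine", "colon",
    "z-line", "pylorus", "ileocecal valve", "active bleeding", "angiectasia",
    "blood", "erosion", "erythema", "hematin", "lymphangioectasis",
    "polyp", "ulcer",
]


def num_vit_tokens(image_size: int = 224, patch_size: int = 16) -> int:
    """Sequence length of a ViT on a square image: patch tokens plus one class token."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    per_side = image_size // patch_size  # 224 // 16 = 14 patches per side
    return per_side * per_side + 1       # 14 * 14 + 1 = 197 tokens


def multi_hot(active: Iterable[str]) -> List[int]:
    """Encode the labels present in one frame as a 17-dim 0/1 target vector."""
    present = set(active)
    unknown = present - set(LABELS)
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    return [1 if name in present else 0 for name in LABELS]


print(num_vit_tokens())  # 197
print(multi_hot(["small intestine", "angiectasia"]))
```

A frame can set several entries of the vector at once. That is what lets a single sigmoid-per-label head learn anatomical regions and findings jointly.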
Taxonomy
Topics: Gastrointestinal Bleeding Diagnosis and Treatment · Colorectal Cancer Screening and Detection · Bariatric Surgery and Outcomes
