Multi-Class Abnormality Classification Task in Video Capsule Endoscopy
Dev Rishi Verma, Vibhor Saxena, Dhruv Sharma, Arpan Gupta

TL;DR
This paper applies deep learning models, from CNNs to transformer architectures, to classify multiple gastrointestinal disorders in video capsule endoscopy, reaching a validation balanced accuracy of 0.8592 and 7th place in the Capsule Vision Challenge 2024.
Contribution
It introduces the application of multiscale and attention-based transformer models for multiclass anomaly classification in VCE, surpassing previous methods in accuracy.
Findings
Best balanced accuracy on the validation set was 0.8592, with a mean AUC of 0.9932.
Placed 7th in the Capsule Vision Challenge 2024 with a test-set AUC of 0.7314.
Transformer-based models (ViT, MViT, DaViT) improved classification performance over the CNN and ResNet baselines, with DaViT performing best.
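The two metrics reported above can be illustrated with a small sketch. The summary does not say how the mean AUC was averaged, so this example assumes an unweighted one-vs-rest mean over classes; the function names (`balanced_accuracy`, `binary_auc`, `mean_ovr_auc`) are hypothetical and are not taken from the authors' code. Balanced accuracy is computed as the mean of per-class recall, which keeps rare abnormality classes from being swamped by common ones.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so every class carries equal weight."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    return sum(correct[c] / total[c] for c in total) / len(total)


def binary_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def mean_ovr_auc(y_true, y_score, n_classes):
    """Unweighted mean of one-vs-rest AUCs (an assumed averaging scheme).

    y_score[i][c] is the model's score for sample i belonging to class c.
    """
    aucs = []
    for c in range(n_classes):
        labels = [1 if t == c else 0 for t in y_true]
        scores = [row[c] for row in y_score]
        aucs.append(binary_auc(labels, scores))
    return sum(aucs) / n_classes
```

Calling `balanced_accuracy(y_true, y_pred)` with integer class labels and `mean_ovr_auc(y_true, y_score, n_classes)` with per-class scores gives the two numbers for a validation split. The pairwise AUC loop is quadratic in the number of samples, so it is meant for illustration rather than large evaluation sets.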
Abstract
In this work for the Capsule Vision Challenge 2024, we addressed the challenge of multiclass anomaly classification in video capsule endoscopy (VCE) [1] with a variety of deep learning models, ranging from custom CNNs to advanced transformer architectures. The goal is to correctly classify diverse gastrointestinal disorders, which is critical for increasing diagnostic efficiency in clinical settings. We started with a baseline CNN model and improved performance with ResNet [2] for better feature extraction, followed by Vision Transformer (ViT) [3] to capture global dependencies. We further improved the results by using Multiscale Vision Transformer (MViT) [4] for hierarchical feature extraction, while Dual Attention Vision Transformer (DaViT) [5] delivered the best results by combining spatial and channel attention methods. Our best balanced accuracy on the validation set [6] was 0.8592 and…
Taxonomy
Topics: Gastrointestinal Bleeding Diagnosis and Treatment
Methods: Attention Is All You Need · Sparse Evolutionary Training · Linear Layer · Label Smoothing · Byte Pair Encoding · Multi-Head Attention · Softmax · Adam · Dropout · Absolute Position Encodings
