CapsoNet: A CNN-Transformer Ensemble for Multi-Class Abnormality Detection in Video Capsule Endoscopy

Arnav Samal; Ranya Batsyas

arXiv:2410.18879·cs.CV·August 7, 2025

TL;DR

CapsoNet is a deep learning ensemble of CNNs and transformers for multi-class abnormality classification in video capsule endoscopy (VCE) frames. It achieved 86.34% balanced accuracy and a mean AUC-ROC of 0.9908 on the official validation set of the Capsule Vision 2024 Challenge.

Contribution

The paper introduces CapsoNet, a CNN-transformer ensemble designed for VCE abnormality classification. To handle class imbalance, training combines focal loss, weighted random sampling, and extensive data augmentation, and every model in the ensemble is fully fine-tuned.
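As a rough illustration of why focal loss helps with class imbalance, the sketch below computes it for a single example in plain Python. The `gamma` and `alpha` values are assumptions chosen for illustration; the source does not state which settings were used, and this is not the authors' implementation.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def focal_loss(logits, target, gamma=2.0, alpha=None):
    """Focal loss for one example: -alpha_t * (1 - p_t)^gamma * log(p_t).

    gamma and alpha are illustrative assumptions, not values from the paper.
    """
    p_t = softmax(logits)[target]
    weight = 1.0 if alpha is None else alpha[target]
    return -weight * (1.0 - p_t) ** gamma * math.log(p_t)

# A confidently correct prediction versus a confidently wrong one.
easy = focal_loss([4.0, 0.0, 0.0], target=0)
hard = focal_loss([0.0, 4.0, 0.0], target=0)

# With gamma = 0 the modulating factor is 1 and the loss is plain cross-entropy.
ce_easy = focal_loss([4.0, 0.0, 0.0], target=0, gamma=0.0)
```

The `(1 - p_t)^gamma` factor shrinks the loss of well-classified examples, so frames from abundant, easy classes contribute less to the gradient than hard or rare ones.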

Findings

01. Achieved 86.34% balanced accuracy on the official validation set.

02. Secured 5th place in the Capsule Vision 2024 Challenge.

03. Demonstrated the effectiveness of ensembling and data augmentation.
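For readers unfamiliar with the reported metric, here is a minimal sketch of balanced accuracy (the mean of per-class recall) applied to an ensemble prediction. Averaging class probabilities ("soft voting") is an assumption made for illustration; the source does not say how CapsoNet combines its member models, and the toy models and labels below are hypothetical.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall over the classes present in y_true."""
    classes = sorted(set(y_true))
    recalls = []
    for c in classes:
        idx = [i for i, t in enumerate(y_true) if t == c]
        correct = sum(1 for i in idx if y_pred[i] == c)
        recalls.append(correct / len(idx))
    return sum(recalls) / len(recalls)

def soft_vote(prob_sets):
    """Average class probabilities across models, then take the argmax per sample."""
    n_models = len(prob_sets)
    preds = []
    for sample_probs in zip(*prob_sets):
        mean = [sum(col) / n_models for col in zip(*sample_probs)]
        preds.append(max(range(len(mean)), key=mean.__getitem__))
    return preds

# Toy data: two hypothetical models, four samples, two classes.
model_a = [[0.9, 0.1], [0.6, 0.4], [0.4, 0.6], [0.7, 0.3]]
model_b = [[0.8, 0.2], [0.3, 0.7], [0.2, 0.8], [0.2, 0.8]]
y_true = [0, 0, 0, 1]

y_pred = soft_vote([model_a, model_b])
score = balanced_accuracy(y_true, y_pred)
```

Because each class contributes equally to the mean regardless of how many samples it has, balanced accuracy is a more informative headline number than plain accuracy on an imbalanced dataset.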

Abstract

We present CapsoNet, a deep learning framework developed for the Capsule Vision 2024 Challenge, designed to perform multi-class abnormality classification in video capsule endoscopy (VCE) frames. CapsoNet leverages an ensemble of convolutional neural networks (CNNs) and transformer-based architectures to capture both local and global visual features. The model was trained and evaluated on a dataset of over 50,000 annotated frames spanning ten abnormality classes, sourced from three public and one private dataset. To address the challenge of class imbalance, we employed focal loss, weighted random sampling, and extensive data augmentation strategies. All models were fully fine-tuned to maximize performance within the ensemble. CapsoNet achieved a balanced accuracy of 86.34 percent and a mean AUC-ROC of 0.9908 on the official validation set, securing Team Seq2Cure 5th place in the…
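The weighted random sampling mentioned in the abstract can be sketched with the standard library alone. The inverse-class-frequency weighting and the toy label counts below are assumptions for illustration; the source does not specify the weighting scheme. In a PyTorch pipeline, `torch.utils.data.WeightedRandomSampler` plays a comparable role.

```python
import random
from collections import Counter

def inverse_frequency_weights(labels):
    """One weight per sample, equal to 1 / (number of samples in its class)."""
    counts = Counter(labels)
    return [1.0 / counts[y] for y in labels]

def sample_epoch(labels, num_samples, seed=0):
    """Draw sample indices with replacement, proportional to the weights."""
    rng = random.Random(seed)
    weights = inverse_frequency_weights(labels)
    return rng.choices(range(len(labels)), weights=weights, k=num_samples)

# Hypothetical imbalanced label set: 900 / 90 / 10 samples per class.
labels = [0] * 900 + [1] * 90 + [2] * 10
indices = sample_epoch(labels, num_samples=3000)

# Class counts among the drawn indices; each class is expected near 1000.
drawn = Counter(labels[i] for i in indices)
```

With these weights every class carries the same total sampling mass, so rare classes are drawn about as often as common ones. The trade-off is that rare-class frames are repeated many times per epoch, which is one reason such sampling is usually paired with data augmentation.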

Code & Models

Repositories

arnavs04/capsule-vision-2024
PyTorch · Official

Taxonomy

Topics: Gastrointestinal Bleeding Diagnosis and Treatment