A Robust Pipeline for Classification and Detection of Bleeding Frames in Wireless Capsule Endoscopy using Swin Transformer and RT-DETR
Sasidhar Alavala, Anil Kumar Vadde, Aparnamala Kancheti, Subrahmanyam Gorthi

TL;DR
This paper introduces a robust pipeline that pairs a Swin Transformer classifier with an RT-DETR detector, supported by advanced image preprocessing, to improve the classification and detection of bleeding frames in Wireless Capsule Endoscopy (WCE), achieving state-of-the-art accuracy.
Contribution
The novel integration of Swin Transformer and RT-DETR with tailored preprocessing steps significantly enhances bleeding frame detection in WCE images.
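According to the abstract, the Swin Transformer performs the initial classification of bleeding frames and RT-DETR handles the further detection. The control flow below is a minimal sketch of that two-stage design. It assumes the detector runs only on frames the classifier flags as bleeding. `run_pipeline`, `classify`, `detect`, `FrameResult` and both 0.5 thresholds are hypothetical stand-ins, not the authors' code.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


@dataclass
class FrameResult:
    is_bleeding: bool
    bleeding_prob: float
    boxes: List[Box] = field(default_factory=list)


def run_pipeline(frames: Sequence[object],
                 classify: Callable[[object], float],
                 detect: Callable[[object], List[Tuple[Box, float]]],
                 cls_threshold: float = 0.5,   # hypothetical cut-off
                 det_threshold: float = 0.5    # hypothetical cut-off
                 ) -> List[FrameResult]:
    """Hypothetical two-stage flow: classify each frame, then detect only if bleeding."""
    results = []
    for frame in frames:
        # Stage 1 (Swin Transformer in the paper): frame-level bleeding probability.
        p = classify(frame)
        if p < cls_threshold:
            results.append(FrameResult(False, p))
            continue
        # Stage 2 (RT-DETR in the paper): localize bleeding in flagged frames,
        # keeping boxes whose confidence clears the detection threshold.
        boxes = [box for box, score in detect(frame) if score >= det_threshold]
        results.append(FrameResult(True, p, boxes))
    return results
```

Gating the detector this way means non-bleeding frames never reach the second stage, so the classifier's errors bound what the detector can recover. In a real system, `classify` and `detect` would wrap the trained models.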
Findings
Achieved 98.5% classification accuracy on the validation set.
Raised AP50 to 66.7%, surpassing previous models.
On the test set, attained 87.0% accuracy and an 89.0% F1 score.
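AP50 is average precision computed with an intersection-over-union (IoU) threshold of 0.5. A predicted box counts as a true positive only if it overlaps a ground-truth box with IoU of at least 0.5. F1 is the harmonic mean of precision and recall. The helpers below sketch these two building blocks. They are illustrative, not the challenge's evaluation code. The full AP computation, which ranks detections by confidence and integrates the precision-recall curve, is deliberately omitted.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) with x2 > x1, y2 > y1


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_match_at_50(pred: Box, gt: Box) -> bool:
    """AP50 matching criterion: the prediction matches the ground truth at IoU >= 0.5."""
    return iou(pred, gt) >= 0.5


def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall from confusion counts."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

For example, two 2×2 boxes offset by one pixel in each direction overlap in a 1×1 square. Their IoU is 1/7, so they would not match under the AP50 criterion.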
Abstract
In this paper, we present our approach to the Auto WCEBleedGen Challenge V2 2024. Our solution combines the Swin Transformer for the initial classification of bleeding frames and RT-DETR for further detection of bleeding in Wireless Capsule Endoscopy (WCE), enhanced by a series of image preprocessing steps. These steps include converting images to Lab colour space, applying Contrast Limited Adaptive Histogram Equalization (CLAHE) for better contrast, and using Gaussian blur to suppress artefacts. The Swin Transformer utilizes a tiered architecture with shifted windows to efficiently manage self-attention calculations, focusing on local windows while enabling cross-window interactions. RT-DETR features an efficient hybrid encoder for fast processing of multi-scale features and an uncertainty-minimal query selection for enhanced accuracy. The class activation maps by Ablation-CAM are…
Taxonomy
Topics: Gastrointestinal Bleeding Diagnosis and Treatment
Methods: You Only Look Once · Residual Connection · Softmax · Layer Normalization · Stochastic Depth · Byte Pair Encoding · Label Smoothing · Adam · Swin Transformer · Attention Is All You Need
