TL;DR
EndoCaver is a lightweight transformer model that jointly performs deblurring and segmentation of endoscopic images, improving accuracy under challenging conditions while being suitable for on-device clinical use.
Contribution
The paper introduces a joint deblurring-segmentation transformer architecture that reduces computational complexity and parameter count while reporting state-of-the-art performance on endoscopic image data.
Findings
Achieves 0.922 Dice on clean Kvasir-SEG data and 0.889 under severe image degradation.
Reduces model parameters by 90% compared to previous methods.
Surpasses state-of-the-art methods on Kvasir-SEG while using less computation.
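For readers unfamiliar with the metric behind the reported scores, the Dice coefficient of a predicted mask A against a ground-truth mask B is 2|A ∩ B| / (|A| + |B|). The following is a minimal sketch in plain Python. The function name `dice_coefficient`, the nested-list mask format, and the choice to return 1.0 for two empty masks are illustrative assumptions, not taken from the paper's evaluation code.

```python
def dice_coefficient(pred_mask, true_mask):
    """Dice = 2|A ∩ B| / (|A| + |B|) for two binary masks (2D lists of 0/1)."""
    # Flatten both masks into lists of booleans.
    pred = [bool(v) for row in pred_mask for v in row]
    true = [bool(v) for row in true_mask for v in row]
    if len(pred) != len(true):
        raise ValueError("masks must contain the same number of pixels")

    # Count pixels marked as foreground in both masks.
    intersection = sum(p and t for p, t in zip(pred, true))
    total = sum(pred) + sum(true)

    # Assumption: two empty masks count as a perfect match.
    if total == 0:
        return 1.0
    return 2.0 * intersection / total


# Hypothetical 2x3 masks: 2 overlapping pixels, 3 foreground pixels in each.
predicted = [[1, 1, 0], [0, 1, 0]]
reference = [[1, 0, 0], [0, 1, 1]]
score = dice_coefficient(predicted, reference)  # 2 * 2 / (3 + 3)
```

A score of 1.0 means the masks agree exactly and 0.0 means they share no foreground pixels, so the reported drop from 0.922 to 0.889 reflects a modest loss of overlap under degradation.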
Abstract
Endoscopic image analysis is vital for colorectal cancer screening, yet real-world conditions often suffer from lens fogging, motion blur, and specular highlights, which severely compromise automated polyp detection. We propose EndoCaver, a lightweight transformer with a unidirectional-guided dual-decoder architecture, enabling joint multi-task capability for image deblurring and segmentation while significantly reducing computational complexity and model parameters. Specifically, it integrates a Global Attention Module (GAM) for cross-scale aggregation, a Deblurring-Segmentation Aligner (DSA) to transfer restoration cues, and a cosine-based scheduler (LoCoS) for stable multi-task optimisation. Experiments on the Kvasir-SEG dataset show that EndoCaver achieves 0.922 Dice on clean data and 0.889 under severe image degradation, surpassing state-of-the-art methods while reducing model…
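The abstract does not describe how LoCoS works internally, so the following is only a generic illustration of how a cosine schedule can balance two task losses during training. It is not the paper's scheduler. The names `cosine_weight` and `combined_loss`, the start and end weights, and the choice to shift emphasis from deblurring to segmentation over time are all assumptions made for this sketch.

```python
import math


def cosine_weight(step, total_steps, w_start=1.0, w_end=0.0):
    """Interpolate smoothly from w_start to w_end along a half cosine."""
    # Clamp training progress to the range [0, 1].
    progress = min(max(step / total_steps, 0.0), 1.0)
    return w_end + 0.5 * (w_start - w_end) * (1.0 + math.cos(math.pi * progress))


def combined_loss(deblur_loss, seg_loss, step, total_steps):
    """Blend two task losses; the direction of the shift is an assumption."""
    w = cosine_weight(step, total_steps)
    return w * deblur_loss + (1.0 - w) * seg_loss


# Hypothetical loss values at the start, midpoint, and end of a 100-step run.
schedule = [cosine_weight(s, 100) for s in (0, 50, 100)]
blended = combined_loss(deblur_loss=0.8, seg_loss=0.4, step=50, total_steps=100)
```

The half-cosine shape changes the weighting slowly at the start and end of training and fastest in the middle, which is one common way to avoid abrupt shifts in a multi-task objective.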
Taxonomy
Topics: Advanced Image Processing Techniques · Generative Adversarial Networks and Image Synthesis · Colorectal Cancer Screening and Detection
