Unlocking Generalization in Polyp Segmentation with DINO Self-Attention "keys"
Carla Monteiro, Valentina Corbetta, Regina Beets-Tan, Luís F. Teixeira, Wilson Silva

TL;DR
This paper introduces a novel polyp segmentation framework leveraging DINO self-attention key features, significantly improving generalization and performance in data-scarce and challenging clinical scenarios.
Contribution
The approach uniquely utilizes DINO self-attention keys with a simple decoder, achieving state-of-the-art results without task-specific architectures.
Findings
Outperforms existing models such as nnU-Net and UM-Net
Enhances generalization in data-scarce settings
Provides a benchmark of DINO architecture evolution
Abstract
Automatic polyp segmentation is crucial for improving the clinical identification of colorectal cancer (CRC). While Deep Learning (DL) techniques have been extensively researched for this problem, current methods frequently struggle with generalization, particularly in data-constrained or challenging settings. Moreover, many existing polyp segmentation methods rely on complex, task-specific architectures. To address these limitations, we present a framework that leverages the intrinsic robustness of DINO self-attention "key" features for robust segmentation. Unlike traditional methods that extract tokens from the deepest layers of the Vision Transformer (ViT), our approach leverages the key features of the self-attention module with a simple convolutional decoder to predict polyp masks, resulting in enhanced performance and better generalizability. We validate our approach using a…
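The idea of reading "key" features out of a self-attention block and passing them to a small decoder can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes a fused query/key/value projection laid out as (3, heads, head_dim), which is the layout used by common ViT codebases. It uses random weights in place of a pretrained DINO backbone, assumes the input holds patch tokens only (class token already dropped), and uses a single 1x1 convolution as a stand-in for the paper's convolutional decoder. The names extract_keys and decode_mask are hypothetical.

```python
import numpy as np


def extract_keys(tokens, w_qkv, b_qkv, num_heads):
    """Return the attention "key" features for each patch token.

    tokens: (N, D) patch tokens entering one attention block
            (assumption: the class token has already been removed).
    w_qkv:  (D, 3*D) fused projection, assumed laid out as (3, heads, head_dim).
    """
    n, d = tokens.shape
    head_dim = d // num_heads
    qkv = tokens @ w_qkv + b_qkv                    # (N, 3*D)
    qkv = qkv.reshape(n, 3, num_heads, head_dim)    # split into q, k, v per head
    keys = qkv[:, 1]                                # index 1 selects the keys
    return keys.reshape(n, d)                       # concatenate heads again


def decode_mask(keys, grid_hw, w_dec, b_dec):
    """Map key features to a per-patch foreground probability.

    A 1x1 convolution is a per-location linear layer, so it is written
    here as a matrix product (assumption: stand-in for a real decoder).
    """
    h, w = grid_hw
    feat = keys.reshape(h, w, -1)                   # (H, W, D) spatial grid
    logits = feat @ w_dec + b_dec                   # (H, W, 1)
    return 1.0 / (1.0 + np.exp(-logits[..., 0]))    # sigmoid -> (H, W)


# Toy example with random weights: a 4x4 patch grid, 8 channels, 2 heads.
rng = np.random.default_rng(0)
grid_hw, dim, heads = (4, 4), 8, 2
tokens = rng.normal(size=(grid_hw[0] * grid_hw[1], dim))
w_qkv = rng.normal(size=(dim, 3 * dim))
b_qkv = np.zeros(3 * dim)
w_dec = rng.normal(size=(dim, 1))
b_dec = np.zeros(1)

keys = extract_keys(tokens, w_qkv, b_qkv, heads)
mask = decode_mask(keys, grid_hw, w_dec, b_dec)
print(keys.shape, mask.shape)
```

In a real pipeline the keys would presumably come from a pretrained DINO ViT, for example by hooking the fused projection of a chosen block, and the low-resolution mask would be upsampled to the input image size. Neither step is shown here.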
Taxonomy
Topics: Colorectal Cancer Screening and Detection · Advanced Neural Network Applications · AI in cancer detection
