Polyp-PVT: Polyp Segmentation with Pyramid Vision Transformers
Bo Dong, Wenhai Wang, Deng-Ping Fan, Jinpeng Li, Huazhu Fu, Ling Shao

TL;DR
Polyp-PVT introduces a transformer-based model with specialized modules for improved polyp segmentation, outperforming CNN-based methods in robustness and accuracy across multiple datasets.
Contribution
The paper adopts a pyramid vision transformer (PVT) encoder and adds three modules (CFM, CIM, and SAM) to fuse cross-level features for polyp segmentation, addressing limitations of CNN-based approaches.
Findings
Outperforms existing methods on five datasets
Robust to appearance changes, small objects, and rotation
Effectively fuses cross-level features for better segmentation
Abstract
Most polyp segmentation methods use CNNs as their backbone, leading to two key issues when exchanging information between the encoder and decoder: 1) taking into account the differences in contribution between different-level features and 2) designing an effective mechanism for fusing these features. Unlike existing CNN-based methods, we adopt a transformer encoder, which learns more powerful and robust representations. In addition, considering the image acquisition influence and elusive properties of polyps, we introduce three standard modules, including a cascaded fusion module (CFM), a camouflage identification module (CIM), and a similarity aggregation module (SAM). Among these, the CFM is used to collect the semantic and location information of polyps from high-level features; the CIM is applied to capture polyp information disguised in low-level features, and the SAM extends the…
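To make the idea of fusing features from different encoder levels concrete, here is a minimal sketch in plain Python. It is not the paper's CFM, CIM, or SAM: the function names (upsample_nearest, fuse_levels), the factor-of-2 spacing between levels, and the element-wise product used as the fusion rule are all assumptions chosen for illustration. The sketch only shows the general pattern of bringing a coarse, high-level map up to the resolution of a finer one and combining them.

```python
from typing import List

Grid = List[List[float]]  # a single-channel 2D feature map


def upsample_nearest(feature: Grid, factor: int) -> Grid:
    """Nearest-neighbour upsampling: repeat each value `factor` times per axis."""
    out: Grid = []
    for row in feature:
        wide = [value for value in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out


def fuse_levels(levels: List[Grid]) -> Grid:
    """Fuse maps ordered coarse -> fine; each level has twice the previous resolution.

    Hypothetical fusion rule (an assumption, not the paper's design):
    upsample the running result by 2, then multiply it element-wise
    with the next finer map, so the coarse map gates the finer one.
    """
    fused = levels[0]
    for finer in levels[1:]:
        up = upsample_nearest(fused, 2)
        fused = [
            [a * b for a, b in zip(row_up, row_fine)]
            for row_up, row_fine in zip(up, finer)
        ]
    return fused


# Toy example: a 1x1, a 2x2, and a 4x4 map.
coarse = [[1.0]]
mid = [[0.5, 1.0], [1.0, 0.0]]
fine = [[1.0] * 4 for _ in range(4)]
fused = fuse_levels([coarse, mid, fine])
```

In this toy run the zero in the 2x2 map suppresses the whole lower-right 2x2 block of the 4x4 output, which is the gating behaviour the product rule was chosen to show. A real model would operate on multi-channel tensors with learned convolutions rather than fixed products.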
Taxonomy
Topics: Advanced Neural Network Applications · Vehicle License Plate Recognition
