Polyp-PVT: Polyp Segmentation with Pyramid Vision Transformers

Bo Dong; Wenhai Wang; Deng-Ping Fan; Jinpeng Li; Huazhu Fu; Ling Shao

arXiv:2108.06932 · eess.IV · February 20, 2024 · 170 cites

TL;DR

Polyp-PVT introduces a transformer-based model with specialized modules for improved polyp segmentation, outperforming CNN-based methods in robustness and accuracy across multiple datasets.

Contribution

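The paper replaces the usual CNN backbone with a transformer encoder and adds three modules, a cascaded fusion module (CFM), a camouflage identification module (CIM), and a similarity aggregation module (SAM), to fuse features from different encoder levels for polyp segmentation.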

Findings

01 Outperforms existing methods on five datasets
02 Robust to appearance changes, small objects, and rotation
03 Effectively fuses cross-level features for better segmentation

Abstract

Most polyp segmentation methods use CNNs as their backbone, leading to two key issues when exchanging information between the encoder and decoder: 1) taking into account the differences in contribution between different-level features and 2) designing an effective mechanism for fusing these features. Unlike existing CNN-based methods, we adopt a transformer encoder, which learns more powerful and robust representations. In addition, considering the image acquisition influence and elusive properties of polyps, we introduce three standard modules, including a cascaded fusion module (CFM), a camouflage identification module (CIM), and a similarity aggregation module (SAM). Among these, the CFM is used to collect the semantic and location information of polyps from high-level features; the CIM is applied to capture polyp information disguised in low-level features, and the SAM extends the…
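
The split between low-level and high-level features described in the abstract can be illustrated with a small, hedged sketch. It only computes feature-map sizes and routes them to the two modules whose roles are stated above. The stride values, the 352x352 input size, and the choice to send only the first level to the CIM are assumptions of this sketch, not details given in the abstract, and the function names are hypothetical. The SAM is not modeled here.

```python
from typing import Dict, List, Tuple

# Assumed, not stated in the abstract: a four-stage pyramid encoder whose
# stages downsample the input by 4, 8, 16 and 32.
STRIDES: Tuple[int, ...] = (4, 8, 16, 32)


def pyramid_shapes(height: int, width: int) -> List[Tuple[int, int]]:
    """Return the spatial size of each encoder stage's feature map."""
    return [(height // s, width // s) for s in STRIDES]


def route_levels(
    shapes: List[Tuple[int, int]],
) -> Dict[str, List[Tuple[int, int]]]:
    """Split the pyramid into low-level and high-level groups.

    The lowest level goes to the CIM (low-level detail) and the remaining
    levels go to the CFM (semantic and location information). The exact
    split point is an assumption of this sketch.
    """
    return {"CIM": shapes[:1], "CFM": shapes[1:]}


# Hypothetical input resolution, chosen only for illustration.
shapes = pyramid_shapes(352, 352)
routing = route_levels(shapes)
print(routing)
```

The point of the sketch is the routing rather than the numbers: the high-resolution map carries fine detail for the CIM, while the coarser maps carry the semantic and location cues that the CFM aggregates.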

Taxonomy

Topics: Advanced Neural Network Applications · Vehicle License Plate Recognition