CFPFormer: Feature-pyramid like Transformer Decoder for Segmentation and Detection

Hongyi Cai; Mohammad Mahdinur Rahman; Wenzhen Dong; Jingyu Wu

arXiv:2404.15451 · cs.CV · April 8, 2025 · 1 citation

Open Access

TL;DR

CFPFormer introduces a novel transformer-based decoder with feature pyramids and Gaussian Attention, improving medical image segmentation by capturing long-range dependencies efficiently.

Contribution

The paper proposes CFPFormer, a new decoder architecture that enhances feature extraction in segmentation models while reducing complexity through Gaussian Attention.

Findings

01. Achieved 92.02% Dice Score on medical datasets.
02. Outperformed more complex ViT and Swin Transformer baselines.
03. Demonstrated effectiveness of the proposed decoder architecture.
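
For readers unfamiliar with the metric in the first finding, the Dice Score measures overlap between a predicted mask and a ground-truth mask as 2|A ∩ B| / (|A| + |B|). The sketch below is a minimal, generic implementation on flat binary masks; the function name `dice_score`, the toy masks, and the empty-mask convention are illustrative assumptions and are not taken from the paper's evaluation code.

```python
def dice_score(pred, target):
    """Dice coefficient between two binary masks given as flat 0/1 sequences."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    # Pixels that are foreground in both masks.
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    if total == 0:
        # Both masks empty: treated as perfect agreement (a convention, assumed here).
        return 1.0
    return 2.0 * intersection / total


# Toy example (hypothetical masks, not data from the paper).
pred = [1, 1, 0, 0, 1, 0]
target = [1, 0, 0, 0, 1, 1]
print(f"{100 * dice_score(pred, target):.2f}%")  # prints "66.67%"
```

A reported figure such as 92.02% is typically this quantity averaged over classes or cases and expressed as a percentage; the exact averaging used in the paper is not stated on this page.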

Abstract

Feature pyramids have been widely adopted in convolutional neural networks and transformers for medical image segmentation tasks. However, existing models generally focus on the encoder-side Transformer for feature extraction. We further explore the potential of improving the feature decoder with a well-designed architecture. We propose the Cross Feature Pyramid Transformer decoder (CFPFormer), a novel decoder block that integrates feature pyramids and transformers. Even though transformer-like architectures deliver outstanding segmentation performance, concerns about redundancy and training cost remain. Specifically, by leveraging patch embedding and cross-layer feature concatenation mechanisms, CFPFormer enhances feature extraction capabilities, while our Gaussian Attention mitigates the complexity issue. Benefiting from Transformer structure and U-shaped…
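
The abstract names Gaussian Attention but, as excerpted here, does not define it. The sketch below shows one plausible reading only: ordinary scaled dot-product attention whose logits are penalized by a Gaussian function of token distance, so that nearby tokens dominate. The function name `gaussian_decay_attention`, the 1-D token layout, and the `sigma` parameter are all assumptions made for illustration and should not be taken as the paper's formulation.

```python
import math


def gaussian_decay_attention(q, k, v, sigma=1.0):
    """Single-head attention over a 1-D token sequence with a Gaussian distance bias.

    q, k, v: lists of float vectors, one per token (q and k share a dimension).
    sigma:   assumed width of the Gaussian decay; small values keep attention local.
    """
    n, d = len(q), len(q[0])
    out = []
    for i in range(n):
        # Scaled dot-product logit plus the log of a Gaussian weight on |i - j|.
        logits = [
            sum(a * b for a, b in zip(q[i], k[j])) / math.sqrt(d)
            - (i - j) ** 2 / (2.0 * sigma ** 2)
            for j in range(n)
        ]
        # Numerically stable softmax over the biased logits.
        m = max(logits)
        weights = [math.exp(x - m) for x in logits]
        z = sum(weights)
        weights = [w / z for w in weights]
        # Weighted sum of value vectors.
        out.append(
            [sum(w * v[j][c] for j, w in enumerate(weights)) for c in range(len(v[0]))]
        )
    return out


# Hypothetical toy inputs: identical queries and keys, so only the decay matters.
q = [[0.0, 0.0] for _ in range(3)]
k = [[0.0, 0.0] for _ in range(3)]
v = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
local = gaussian_decay_attention(q, k, v, sigma=0.1)    # each token attends to itself
spread = gaussian_decay_attention(q, k, v, sigma=1e6)   # close to uniform attention
```

Note that this dense version still computes all n × n weights, so it illustrates only the decay itself. Any complexity reduction claimed in the abstract would come from how the paper structures or truncates the computation, which this page does not describe.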

Taxonomy

Topics: Image Processing Techniques and Applications · CCD and CMOS Imaging Sensors

Methods: Attention Is All You Need · Position-Wise Feed-Forward Layer · Byte Pair Encoding · Absolute Position Encodings · Dropout · Dense Connections · Label Smoothing · Residual Connection · Softmax · Adam