CFPFormer: Feature-pyramid like Transformer Decoder for Segmentation and Detection
Hongyi Cai, Mohammad Mahdinur Rahman, Wenzhen Dong, Jingyu Wu

TL;DR
CFPFormer introduces a novel transformer-based decoder with feature pyramids and Gaussian Attention, improving medical image segmentation by capturing long-range dependencies efficiently.
Contribution
The paper proposes CFPFormer, a new decoder architecture that enhances feature extraction in segmentation models while reducing complexity through Gaussian Attention.
Findings
Achieved 92.02% Dice Score on medical datasets.
Outperformed more complex ViT and Swin Transformer baselines.
Demonstrated effectiveness of the proposed decoder architecture.
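For readers unfamiliar with the metric behind the 92.02% figure, the Dice score measures the overlap between a predicted mask and a ground-truth mask as 2|A ∩ B| / (|A| + |B|). The following is a minimal sketch of the standard binary definition, not the authors' evaluation code. The function name `dice_score` and the convention of returning 1.0 when both masks are empty are assumptions made for illustration.

```python
def dice_score(pred, target):
    """Dice coefficient between two flat binary masks (sequences of 0/1).

    Illustrative helper only; the name and the empty-mask convention
    are assumptions, not taken from the paper.
    """
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of elements")

    # |A ∩ B|: positions where both masks are 1
    intersection = sum(p * t for p, t in zip(pred, target))
    # |A| + |B|: total foreground count across both masks
    total = sum(pred) + sum(target)

    if total == 0:
        # Both masks are empty; treated here as perfect agreement (assumed convention)
        return 1.0
    return 2.0 * intersection / total


# Toy example: 3 predicted foreground pixels, 3 true ones, 2 in common
pred = [1, 1, 0, 0, 1]
target = [1, 0, 0, 1, 1]
score = dice_score(pred, target)  # 2 * 2 / (3 + 3)
```

A score of 1.0 means the masks coincide exactly and 0.0 means they share no foreground pixels. Reported results are typically averaged over classes and test images.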
Abstract
Feature pyramids have been widely adopted in convolutional neural networks and transformers for medical image segmentation. However, existing models generally focus on the encoder-side Transformer for feature extraction. We further explore the potential of improving the feature decoder with a well-designed architecture. We propose the Cross Feature Pyramid Transformer decoder (CFPFormer), a novel decoder block that integrates feature pyramids and transformers. Even though transformer-like architectures deliver outstanding performance in segmentation, concerns about redundancy and training cost remain. Specifically, by leveraging patch embedding and cross-layer feature concatenation mechanisms, CFPFormer enhances feature extraction capabilities, while the complexity issue is mitigated by our Gaussian Attention. Benefiting from Transformer structure and U-shaped…
Taxonomy
Topics: Image Processing Techniques and Applications · CCD and CMOS Imaging Sensors
Methods: Attention Is All You Need · Position-Wise Feed-Forward Layer · Byte Pair Encoding · Absolute Position Encodings · Dropout · Dense Connections · Label Smoothing · Residual Connection · Softmax · Adam
