FEFormer: Frequency-enhanced Vision Transformer for Generic Knowledge Extraction and Adaptive Feature Fusion in Volumetric Medical Image Segmentation
Jin Yang, Xiaobing Yu, Peijie Qiu

TL;DR
FEFormer is a novel frequency-enhanced vision transformer designed for volumetric medical image segmentation, explicitly modeling frequency information to improve local detail capture and global context understanding.
Contribution
It introduces four modules that incorporate frequency-domain processing into the transformer architecture to improve volumetric medical image segmentation.
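The summary does not say how FEFormer's modules move features into the frequency domain. The sketch below is a generic illustration and not the authors' method. It uses a 3D FFT with a hypothetical radial cutoff (`cutoff`, assumed here) to split a volume into two parts. The smooth low-frequency part corresponds to global context. The high-frequency residual carries edges and fine local detail.

```python
import numpy as np


def frequency_split(volume: np.ndarray, cutoff: float = 0.25):
    """Split a 3D volume into low- and high-frequency components.

    Illustrative sketch only. `cutoff` is a hypothetical radius, as a
    fraction of the Nyquist frequency (0.5 cycles/sample), below which
    frequencies are treated as "low".
    """
    # Forward 3D FFT of the whole volume.
    spectrum = np.fft.fftn(volume)
    # Per-axis frequency coordinates in cycles/sample, range [-0.5, 0.5).
    freqs = np.meshgrid(*[np.fft.fftfreq(n) for n in volume.shape], indexing="ij")
    # Radial distance of each frequency bin from the DC component.
    radius = np.sqrt(sum(f ** 2 for f in freqs))
    # Ideal low-pass mask: keep bins inside the cutoff radius.
    low_mask = radius <= cutoff * 0.5
    # Back to the spatial domain; the symmetric mask keeps the result real.
    low = np.real(np.fft.ifftn(spectrum * low_mask))
    # The residual holds everything the low-pass filter removed.
    high = volume - low
    return low, high


# Usage: decompose a random 16x16x16 volume.
vol = np.random.default_rng(0).standard_normal((16, 16, 16))
low, high = frequency_split(vol)
```

By construction, the two components sum back to the input. A learned frequency module would usually reweight spectral bins with trainable parameters instead of applying a fixed ideal mask.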
Findings
Achieved superior segmentation accuracy on four medical datasets.
Demonstrated high computational efficiency compared to existing methods.
Effectively captures both local details and global context through frequency modeling.
Abstract
Accurate segmentation of organs and lesions in medical images is essential for clinical applications including diagnosis, prognosis, and treatment planning. While Vision Transformers (ViTs) have shown impressive segmentation performance, they face key challenges in module and architecture design. Specifically, self-attention struggles to capture fine-grained local features critical for understanding detailed anatomical structures, standard MLP modules lack explicit mechanisms to preserve spatial information, conventional encoder-decoder architectures rely on naive feature fusion strategies that cannot handle large semantic discrepancies, and existing designs lack explicit mechanisms to propagate low-level information from encoder to decoder. To address these limitations, we propose a Frequency-enhanced Vision Transformer (FEFormer) for robust and efficient volumetric medical image…
