TL;DR
HiFormer is a hybrid model that combines a CNN and a transformer through multi-scale feature representations and a novel fusion module, improving both the accuracy and the efficiency of medical image segmentation.
Contribution
The paper presents a new hybrid model that integrates CNN and transformer features for dense medical image segmentation tasks.
Findings
Outperforms existing CNN, transformer, and hybrid methods in accuracy
Reduces computational complexity compared to other models
Achieves superior qualitative segmentation results
Abstract
Convolutional neural networks (CNNs) have been the consensus choice for medical image segmentation tasks. However, they are limited in modeling long-range dependencies and spatial correlations due to the nature of the convolution operation. Although transformers were first developed to address this issue, they fail to capture low-level features. Yet both local and global features have been shown to be crucial for dense prediction, such as segmentation in challenging contexts. In this paper, we propose HiFormer, a novel method that efficiently bridges a CNN and a transformer for medical image segmentation. Specifically, we design two multi-scale feature representations using the seminal Swin Transformer module and a CNN-based encoder. To secure a fine fusion of global and local features obtained from the two aforementioned representations, we propose a Double-Level…
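The abstract's point about long-range dependencies can be made concrete with the standard receptive-field recurrence for stacked convolutions. The sketch below is illustrative only: the function name `receptive_field` and the layer stacks are assumptions chosen for the example and are not taken from HiFormer's encoder. It shows that a few convolutional layers see only a small window of the input, whereas a self-attention layer relates every position to every other position in a single step.

```python
def receptive_field(layers):
    """Receptive field, in input pixels along one axis, of stacked conv layers.

    `layers` is a list of (kernel_size, stride) pairs, first layer first.
    Dilation is assumed to be 1 for every layer.
    """
    rf = 1    # a single input pixel sees only itself
    jump = 1  # spacing, in input pixels, between adjacent outputs of the current layer
    for kernel_size, stride in layers:
        rf += (kernel_size - 1) * jump
        jump *= stride
    return rf


# Hypothetical layer stacks for illustration, not HiFormer's configuration.
three_convs = [(3, 1)] * 3        # three 3x3 convolutions, stride 1
with_downsampling = [(3, 2)] * 3  # three 3x3 convolutions, stride 2

print(receptive_field(three_convs))        # 7
print(receptive_field(with_downsampling))  # 15
```

Under these assumptions, each output of the three-layer stride-1 stack depends on only 7 input pixels per axis, which is why purely convolutional encoders need depth or downsampling to reach global context.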
Taxonomy
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Absolute Position Encodings · Dropout · Byte Pair Encoding · Adam · Residual Connection · Label Smoothing · Position-Wise Feed-Forward Layer
