HiFormer: Hierarchical Multi-scale Representations Using Transformers for Medical Image Segmentation

Moein Heidari; Amirhossein Kazerouni; Milad Soltany; Reza Azad; Ehsan Khodapanah Aghdam; Julien Cohen-Adad; Dorit Merhof

arXiv:2207.08518·cs.CV·January 10, 2023

TL;DR

HiFormer introduces a hybrid approach combining CNNs and transformers with multi-scale feature representations and a novel fusion module, significantly improving medical image segmentation accuracy and efficiency.

Contribution

It presents a new hybrid model that effectively integrates CNN and transformer features for dense medical image segmentation tasks.

Findings

01 Outperforms existing CNN, transformer, and hybrid methods in accuracy

02 Reduces computational complexity compared to other models

03 Achieves superior qualitative segmentation results

Abstract

Convolutional neural networks (CNNs) have been the consensus for medical image segmentation tasks. However, they suffer from the limitation in modeling long-range dependencies and spatial correlations due to the nature of convolution operation. Although transformers were first developed to address this issue, they fail to capture low-level features. In contrast, it is demonstrated that both local and global features are crucial for dense prediction, such as segmenting in challenging contexts. In this paper, we propose HiFormer, a novel method that efficiently bridges a CNN and a transformer for medical image segmentation. Specifically, we design two multi-scale feature representations using the seminal Swin Transformer module and a CNN-based encoder. To secure a fine fusion of global and local features obtained from the two aforementioned representations, we propose a Double-Level…
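The abstract's point about convolution's limited reach can be made concrete with a receptive-field calculation. The sketch below is only an illustration: the function name `receptive_field` and the two layer stacks are hypothetical and are not taken from the paper or from HiFormer's encoder.

```python
def receptive_field(layers):
    """Receptive field (in input pixels, along one axis) of a stack of
    convolution or pooling layers given as (kernel_size, stride) pairs."""
    rf = 1      # a single input pixel sees only itself
    jump = 1    # spacing, in input pixels, between adjacent outputs
    for kernel_size, stride in layers:
        # Each layer widens the view by (kernel_size - 1) steps of the
        # current spacing, then the stride multiplies that spacing.
        rf += (kernel_size - 1) * jump
        jump *= stride
    return rf


# Hypothetical stacks, chosen only for illustration.
five_3x3 = [(3, 1)] * 5   # five 3x3 convolutions, stride 1
strided = [(3, 2)] * 5    # five 3x3 convolutions, stride 2

print(receptive_field(five_3x3))  # 11
print(receptive_field(strided))   # 63
```

Under these assumptions, each output of the stride-1 stack depends on an 11-pixel-wide window of the input, so relating distant regions requires many layers or aggressive downsampling. A self-attention layer, by contrast, lets every token attend to every other token in a single step, which is the motivation for pairing the two kinds of encoder.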

Code & Models

Repositories

amirhossein-kz/hiformer (PyTorch, official)

Taxonomy

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Absolute Position Encodings · Dropout · Byte Pair Encoding · Adam · Residual Connection · Label Smoothing · Position-Wise Feed-Forward Layer