HartleyMHA: Self-Attention in Frequency Domain for Resolution-Robust and Parameter-Efficient 3D Image Segmentation

Ken C. L. Wong; Hongzhi Wang; Tanveer Syeda-Mahmood

arXiv:2310.04466·eess.IV·October 10, 2023

TL;DR

HartleyMHA is a frequency-domain self-attention model inspired by Fourier neural operators. It is robust to training image resolution in 3D image segmentation and uses less than 1% of the parameters of comparable models.

Contribution

The paper proposes HartleyMHA, a frequency-domain self-attention mechanism built on the Fourier neural operator (FNO) and the Hartley transform, which improves resolution robustness and parameter efficiency in 3D segmentation.
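For readers unfamiliar with the Hartley transform, here is a minimal NumPy sketch of the discrete version computed from an FFT. It relies only on the standard identity that the Hartley transform of a real signal equals the real part minus the imaginary part of its Fourier transform. The function names `hartley` and `inverse_hartley` are hypothetical and do not come from the paper's repository, and the sketch does not show how HartleyMHA uses the transform inside the network.

```python
import numpy as np


def hartley(x):
    """Discrete Hartley transform over all axes of a real array.

    Uses the identity H = Re(F) - Im(F), where F is the DFT with the
    exp(-2*pi*i*n*k/N) kernel that np.fft uses. The result is real-valued.
    """
    f = np.fft.fftn(x)
    return f.real - f.imag


def inverse_hartley(h):
    """Inverse transform: the DHT is its own inverse up to a 1/N factor."""
    return hartley(h) / h.size


# Illustrative 3D volume (random data, not a medical image).
rng = np.random.default_rng(0)
vol = rng.standard_normal((8, 8, 8))

h = hartley(vol)
recon = inverse_hartley(h)

print(np.isrealobj(h))          # the spectrum has no imaginary part
print(np.allclose(recon, vol))  # the round trip recovers the input
```

Unlike the Fourier transform, the Hartley transform of a real-valued image is itself real-valued, so frequency-domain features can be stored and processed without complex arithmetic.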

Findings

01 Outperforms other models in resolution robustness on the BraTS'19 dataset.

02 Uses less than 1% of the parameters of comparable models.

03 Achieves efficient high-order feature integration in 3D segmentation.

Abstract

With the introduction of Transformers, different attention-based models have been proposed for image segmentation with promising results. Although self-attention allows capturing of long-range dependencies, it suffers from a quadratic complexity in the image size especially in 3D. To avoid the out-of-memory error during training, input size reduction is usually required for 3D segmentation, but the accuracy can be suboptimal when the trained models are applied on the original image size. To address this limitation, inspired by the Fourier neural operator (FNO), we introduce the HartleyMHA model which is robust to training image resolution with efficient self-attention. FNO is a deep learning framework for learning mappings between functions in partial differential equations, which has the appealing properties of zero-shot super-resolution and global receptive field. We modify the FNO by…
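To make the quadratic-complexity point concrete, the following back-of-the-envelope sketch estimates the size of a single dense attention matrix for a 3D volume. It assumes one token per voxel, one attention head, and float32 entries; these are illustrative assumptions, not settings from the paper, and the helper name `attention_matrix_gib` is hypothetical.

```python
def attention_matrix_gib(shape, bytes_per_entry=4):
    """Return (token count, size in GiB) of one dense N x N attention matrix.

    Assumes one token per voxel and float32 entries (4 bytes) by default.
    """
    n_tokens = 1
    for side in shape:
        n_tokens *= side
    size_gib = n_tokens ** 2 * bytes_per_entry / 2 ** 30
    return n_tokens, size_gib


for shape in [(64, 64, 64), (128, 128, 128)]:
    n_tokens, size_gib = attention_matrix_gib(shape)
    print(shape, n_tokens, size_gib)
# (64, 64, 64) 262144 256.0
# (128, 128, 128) 2097152 16384.0
```

Doubling each side of the volume multiplies the token count by 8 and the attention matrix by 64. This is why voxel-level self-attention on full-size 3D images quickly runs out of memory, and why inputs are usually reduced in size for training.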

Code & Models

Repositories

ibm/multimodal-3d-image-segmentation (TensorFlow · Official)
