HNOSeg-XS: Extremely Small Hartley Neural Operator for Efficient and Resolution-Robust 3D Image Segmentation

Ken C. L. Wong, Hongzhi Wang, and Tanveer Syeda-Mahmood

arXiv:2507.08205 · cs.CV · July 14, 2025

TL;DR

HNOSeg-XS introduces a resolution-robust, efficient neural operator for 3D medical image segmentation, outperforming CNNs and transformers in speed, memory, and parameter efficiency across multiple datasets.

Contribution

The paper presents HNOSeg-XS, a novel neural operator model that achieves resolution robustness and efficiency by reformulating segmentation in the frequency domain using Hartley transforms.
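For readers unfamiliar with the Hartley transform, the sketch below shows one common way to compute a discrete Hartley transform (DHT) from an FFT, using the identity DHT = Re(FFT) - Im(FFT). It illustrates the transform itself and is not taken from the HNOSeg-XS code; the helper names `dht` and `idht` and the random 8×8×8 test volume are assumptions made for this example. Unlike the Fourier transform, the DHT of a real input is real, and the transform is its own inverse up to a scale factor.

```python
import numpy as np

def dht(x):
    """N-dimensional discrete Hartley transform, computed from the FFT.

    Uses the identity DHT(x) = Re(FFT(x)) - Im(FFT(x)) for real input x.
    """
    f = np.fft.fftn(x)
    return f.real - f.imag

def idht(h):
    """Inverse DHT: apply the DHT again and divide by the number of samples."""
    return dht(h) / h.size

# Hypothetical 3D test volume (random values, illustration only).
rng = np.random.default_rng(0)
volume = rng.standard_normal((8, 8, 8))

spectrum = dht(volume)      # real-valued frequency-domain representation
restored = idht(spectrum)   # round trip back to the spatial domain
roundtrip_error = float(np.max(np.abs(restored - volume)))
```

Because the spectrum stays real, a model operating in the Hartley domain can work with real-valued arrays throughout, which is one plausible reason to prefer it over a complex-valued Fourier representation.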

Findings

01. Outperforms CNNs and transformers in inference speed and memory usage.

02. Uses fewer than 35,000 parameters, demonstrating high efficiency.

03. Achieves superior resolution robustness across multiple datasets.
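To give some intuition for why a frequency-domain operator can be robust to resolution, here is a minimal 1D sketch. It uses a generic Fourier-style spectral layer (NumPy's `rfft`), not the Hartley-based architecture of HNOSeg-XS, and the function names, the weights, and the test signal are all hypothetical. The point is only that weights attached to a fixed set of low-frequency modes can be applied unchanged to the same signal sampled on a coarser or finer grid.

```python
import numpy as np

def spectral_layer(signal, weights):
    """Scale the lowest len(weights) Fourier modes and discard the rest."""
    n = signal.shape[0]
    modes = weights.shape[0]
    # norm="forward" divides by n, so coefficients do not depend on grid size.
    coeffs = np.fft.rfft(signal, norm="forward")
    kept = np.zeros_like(coeffs)
    kept[:modes] = coeffs[:modes] * weights
    return np.fft.irfft(kept, n=n, norm="forward")

def sample(n):
    """Sample a band-limited test signal on n evenly spaced points in [0, 1)."""
    x = np.arange(n) / n
    return np.sin(2 * np.pi * x) + 0.5 * np.cos(2 * np.pi * 3 * x)

# Hypothetical weights for 4 modes (illustration only, not learned).
weights = np.array([1.0, 0.5, -2.0, 0.25])

coarse = spectral_layer(sample(32), weights)   # low-resolution grid
fine = spectral_layer(sample(64), weights)     # high-resolution grid

# The fine output, read at every second point, should match the coarse output.
max_gap = float(np.max(np.abs(fine[::2] - coarse)))
```

In this toy setting the two outputs agree to floating-point precision because the test signal is band-limited below the kept modes; real images are not, so agreement across resolutions would only be approximate.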

Abstract

In medical image segmentation, convolutional neural networks (CNNs) and transformers are dominant. For CNNs, given the local receptive fields of convolutional layers, long-range spatial correlations are captured through consecutive convolutions and pooling. However, as the computational cost and memory footprint can be prohibitively large, 3D models can only afford fewer layers than 2D models with reduced receptive fields and abstract levels. For transformers, although long-range correlations can be captured by multi-head attention, its quadratic complexity with respect to input size is computationally demanding. Therefore, either model may require input size reduction to allow more filters and layers for better segmentation. Nevertheless, given their discrete nature, models trained with patch-wise training or image downsampling may produce suboptimal results when applied on higher…

Code & Models

Repositories

ibm/multimodal-3d-image-segmentation (tf, Official)