Token-UNet: A New Case for Transformers Integration in Efficient and Interpretable 3D UNets for Brain Imaging Segmentation

Louis Fabrice Tshimanga; Andrea Zanola; Federico Del Pup; Manfredo Atzori

arXiv:2602.20008 · cs.CV · February 24, 2026

Open Access

TL;DR

Token-UNet introduces a hybrid approach combining convolutional layers with token-based transformers to enable efficient, interpretable 3D brain imaging segmentation suitable for limited hardware environments.

Contribution

The paper proposes Token-UNet, integrating TokenLearner modules into UNet architectures to reduce computational costs while maintaining or improving segmentation performance.

Findings

01 Reduced memory use, computation time, and parameter count compared to SwinUNETR.

02 Achieved a higher average Dice score (87.21%) than SwinUNETR (86.75%).

03 Produced interpretable attention maps highlighting task-relevant features.
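
For readers unfamiliar with the metric in finding 02, the following is a minimal sketch of the standard Dice coefficient, 2|A∩B| / (|A| + |B|), for two binary masks. The listing does not say how the paper averages Dice across classes or cases, so the function name `dice_score`, the flat-list mask representation, and the empty-mask convention are illustrative assumptions, not the authors' evaluation code.

```python
def dice_score(pred, target):
    """Dice coefficient between two binary masks given as flat sequences of 0/1.

    Hypothetical helper for illustration; not taken from the paper.
    """
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of voxels")
    # Voxels labelled foreground in both masks.
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    if total == 0:
        # Assumed convention: two empty masks count as perfect agreement.
        return 1.0
    return 2.0 * intersection / total


# One of the two predicted foreground voxels matches the single target voxel.
print(dice_score([1, 1, 0, 0], [1, 0, 0, 0]))
```

A reported figure such as 87.21% would correspond to a value of 0.8721 on this scale, averaged in whatever way the paper specifies.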

Abstract

We present Token-UNet, adopting the TokenLearner and TokenFuser modules to encase Transformers into UNets. While Transformers have enabled global interactions among input elements in medical imaging, current computational challenges hinder their deployment on common hardware. Models like (Swin)UNETR adapt the UNet architecture by incorporating (Swin)Transformer encoders, which process tokens that each represent small subvolumes ($8^{3}$ voxels) of the input. The Transformer attention mechanism scales quadratically with the number of tokens, which is tied to the cubic scaling of 3D input resolution. This work reconsiders the role of convolution and attention, introducing Token-UNets, a family of 3D segmentation models that can operate in constrained computational environments and time frames. To mitigate computational demands, our approach maintains the convolutional encoder of…
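
The scaling argument in the abstract can be made concrete with a little arithmetic. The sketch below assumes a cubic input volume split into non-overlapping patches of 8 voxels per side (the $8^{3}$ figure from the abstract) and counts pairwise interactions for full, global self-attention. The function names and the example volume sizes are hypothetical, and windowed schemes such as Swin restrict attention to local windows, so the pair counts here are an illustration of the unrestricted case rather than a measurement of any model in the paper.

```python
def token_count(side_voxels: int, patch_side: int = 8) -> int:
    """Tokens obtained by tiling a cubic volume with non-overlapping cubic patches."""
    if side_voxels % patch_side != 0:
        raise ValueError("side_voxels must be divisible by patch_side")
    per_axis = side_voxels // patch_side
    return per_axis ** 3  # cubic growth with input resolution


def attention_pairs(num_tokens: int) -> int:
    """Entries in one full self-attention map (every token attends to every token)."""
    return num_tokens ** 2  # quadratic growth with token count


# Example volume sides are assumptions chosen for illustration.
for side in (64, 128, 256):
    n = token_count(side)
    print(f"{side}^3 voxels -> {n} tokens -> {attention_pairs(n)} attention pairs")
```

Under these assumptions, doubling the side length multiplies the token count by 8 and the number of attention pairs by 64, which is the cost that reducing the token count before the Transformer is meant to avoid.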

Taxonomy

Topics: Advanced Neural Network Applications · Medical Image Segmentation Techniques · Domain Adaptation and Few-Shot Learning