Token-UNet: A New Case for Transformers Integration in Efficient and Interpretable 3D UNets for Brain Imaging Segmentation
Louis Fabrice Tshimanga, Andrea Zanola, Federico Del Pup, Manfredo Atzori

TL;DR
Token-UNet introduces a hybrid approach combining convolutional layers with token-based transformers to enable efficient, interpretable 3D brain imaging segmentation suitable for limited hardware environments.
Contribution
The paper proposes Token-UNet, integrating TokenLearner modules into UNet architectures to reduce computational costs while maintaining or improving segmentation performance.
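As a rough illustration of why learned token pooling reduces cost, the sketch below condenses N feature vectors into S tokens using S attention maps. It is a simplified stand-in in the spirit of TokenLearner, not the authors' implementation: the linear scoring, the softmax normalization, the function name `pool_tokens`, and the sizes N, C, S are all assumptions made for the example.

```python
import math
import random


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def pool_tokens(features, score_weights):
    """Reduce N feature vectors to S tokens with S attention maps.

    features:      N lists of length C (e.g. flattened encoder outputs)
    score_weights: S lists of length C (hypothetical learned projection)
    """
    channels = len(features[0])
    tokens = []
    for w in score_weights:
        # One scalar score per input position, normalized over all positions.
        scores = [sum(wi * fi for wi, fi in zip(w, f)) for f in features]
        attn = softmax(scores)
        # Each token is an attention-weighted average of the input features.
        token = [sum(a * f[c] for a, f in zip(attn, features))
                 for c in range(channels)]
        tokens.append(token)
    return tokens


random.seed(0)
N, C, S = 512, 4, 8  # illustrative sizes, not taken from the paper
features = [[random.gauss(0, 1) for _ in range(C)] for _ in range(N)]
score_weights = [[random.gauss(0, 1) for _ in range(C)] for _ in range(S)]

tokens = pool_tokens(features, score_weights)
print(len(tokens), len(tokens[0]))  # 8 4
```

A Transformer applied after such a pooling step attends over S tokens instead of N positions. The attention maps themselves can be inspected, which is one plausible route to the interpretability the summary mentions.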
Findings
Reduced memory footprint, computation time, and parameter count compared to SwinUNETR.
Achieved a higher average Dice score (87.21%) than SwinUNETR (86.75%).
Produced interpretable attention maps highlighting task-relevant features.
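For readers unfamiliar with the metric reported above, here is a minimal sketch of the Dice score for a single binary mask pair, using the standard definition 2|A∩B| / (|A| + |B|). The flat-list representation, the function name `dice_score`, and the convention of returning 1.0 when both masks are empty are assumptions for illustration. The paper's averaging across classes and cases is not reproduced here.

```python
def dice_score(pred, target):
    """Dice overlap between two flat binary masks of equal length."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    if total == 0:
        # Assumed convention: two empty masks count as a perfect match.
        return 1.0
    return 2.0 * intersection / total


# Toy masks: both mark three voxels, and two of those voxels coincide.
pred = [1, 1, 0, 0, 1, 0]
target = [1, 0, 0, 0, 1, 1]
score = dice_score(pred, target)
print(round(score, 4))  # 0.6667
```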
Abstract
We present Token-UNet, adopting the TokenLearner and TokenFuser modules to encase Transformers into UNets. While Transformers have enabled global interactions among input elements in medical imaging, current computational challenges hinder their deployment on common hardware. Models like (Swin)UNETR adapt the UNet architecture by incorporating (Swin)Transformer encoders, which process tokens that each represent small subvolumes ( voxels) of the input. The Transformer attention mechanism scales quadratically with the number of tokens, which is tied to the cubic scaling of 3D input resolution. This work reconsiders the role of convolution and attention, introducing Token-UNets, a family of 3D segmentation models that can operate in constrained computational environments and time frames. To mitigate computational demands, our approach maintains the convolutional encoder of…
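The scaling argument in the abstract can be made concrete with a little arithmetic. The sketch below counts tokens for a cubic volume split into cubic patches, then counts the pairwise interactions that plain global self-attention would compute. The patch side of 16 voxels and the volume sizes are hypothetical values chosen for the example, not figures from the paper.

```python
def attention_cost(side_voxels, patch_side):
    """Return (token count, pairwise interactions) for a cubic volume.

    Assumes non-overlapping cubic patches and plain global self-attention.
    """
    tokens_per_axis = side_voxels // patch_side
    num_tokens = tokens_per_axis ** 3   # cubic in resolution
    pairwise = num_tokens ** 2          # quadratic in token count
    return num_tokens, pairwise


results = {}
for side in (64, 128):
    results[side] = attention_cost(side, patch_side=16)
    print(side, results[side])
# 64 (64, 4096)
# 128 (512, 262144)
```

Under these assumptions, doubling the side length multiplies the token count by 8 and the number of pairwise interactions by 64, which is why reducing the number of tokens before attention matters on limited hardware.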
Taxonomy
Topics: Advanced Neural Network Applications · Medical Image Segmentation Techniques · Domain Adaptation and Few-Shot Learning
