TCSAFormer: Efficient Vision Transformer with Token Compression and Sparse Attention for Medical Image Segmentation
Zunhui Xia, Hongxing Li, Libin Lan

TL;DR
TCSAFormer is an efficient vision transformer for medical image segmentation that uses token compression and sparse attention to reduce complexity and improve local feature capture, outperforming state-of-the-art methods.
Contribution
It introduces a novel Compressed Attention module and a Dual-Branch Feed-Forward Network to enhance efficiency and local feature modeling in medical image segmentation.
Findings
Achieves superior segmentation accuracy on multiple datasets.
Reduces computational complexity compared to traditional transformers.
Maintains a favorable efficiency-accuracy trade-off.
Abstract
In recent years, transformer-based methods have achieved remarkable progress in medical image segmentation due to their superior ability to capture long-range dependencies. However, these methods typically suffer from two major limitations. First, their computational complexity scales quadratically with the length of the input sequence. Second, the feed-forward network (FFN) modules in vanilla Transformers typically rely on fully connected layers, which limits the model's ability to capture the local contextual information and multiscale features that are critical for precise semantic segmentation. To address these issues, we propose an efficient medical image segmentation network named TCSAFormer. TCSAFormer adopts two key ideas. First, it incorporates a Compressed Attention (CA) module, which combines token compression and pixel-level sparse attention to dynamically focus on the most relevant…
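To make the quadratic-cost argument concrete, the sketch below counts query-key score entries for full self-attention and for a simplified compressed, sparse variant. It is a back-of-the-envelope illustration only, not the paper's Compressed Attention module: the function names, the `compression_factor` and `top_k` parameters, and the 56 x 56 feature-map size are all assumptions chosen for the example.

```python
def full_attention_pairs(num_tokens: int) -> int:
    """Score entries computed by full self-attention: one per query-key pair."""
    return num_tokens * num_tokens


def compressed_sparse_pairs(num_tokens: int, compression_factor: int, top_k: int) -> int:
    """Score entries under a hypothetical scheme (an assumption, not the paper's):
    tokens are first reduced by `compression_factor`, then each remaining
    query attends to at most `top_k` keys."""
    kept = max(1, num_tokens // compression_factor)
    return kept * min(top_k, kept)


# Hypothetical 56 x 56 feature map flattened into a token sequence.
tokens = 56 * 56

full = full_attention_pairs(tokens)
sparse = compressed_sparse_pairs(tokens, compression_factor=4, top_k=16)

print(f"tokens: {tokens}")
print(f"full attention pairs: {full}")
print(f"compressed + sparse pairs: {sparse}")
```

Doubling the token count quadruples the full-attention count, whereas the compressed, sparse count grows only linearly with the number of kept tokens once `top_k` is fixed.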
