TCSAFormer: Efficient Vision Transformer with Token Compression and Sparse Attention for Medical Image Segmentation

Zunhui Xia; Hongxing Li; Libin Lan

arXiv:2508.04058 · cs.CV · August 7, 2025

TL;DR

TCSAFormer is an efficient vision transformer for medical image segmentation. It uses token compression and sparse attention to reduce computational complexity, adds a dual-branch feed-forward network to better capture local features, and outperforms state-of-the-art methods.

Contribution

It introduces two components: a Compressed Attention (CA) module that improves efficiency, and a Dual-Branch Feed-Forward Network that strengthens local feature modeling for medical image segmentation.
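The abstract below attributes the vanilla FFN's weakness to its fully connected layers. As a rough illustration only, here is a NumPy sketch of one hypothetical way a feed-forward block can gain local, multiscale context: two depthwise convolution branches with different kernel sizes between the usual pointwise projections. The function names, the 3x3 and 5x5 kernel sizes, fusion by summation, and the GELU activation are all assumptions for illustration, not the authors' Dual-Branch FFN.

```python
import numpy as np


def gelu(x: np.ndarray) -> np.ndarray:
    # tanh approximation of the GELU activation
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))


def depthwise_conv2d(x: np.ndarray, kernels: np.ndarray) -> np.ndarray:
    """Per-channel 2-D convolution (cross-correlation) with zero 'same' padding.

    x: (H, W, C) feature map; kernels: (k, k, C) with odd k, one kernel per channel.
    """
    k = kernels.shape[0]
    p = k // 2
    h, w, _ = x.shape
    xp = np.pad(x, ((p, p), (p, p), (0, 0)))
    out = np.zeros_like(x)
    for i in range(k):
        for j in range(k):
            # shift the padded map and weight each channel by its own kernel tap
            out += xp[i:i + h, j:j + w, :] * kernels[i, j, :]
    return out


def dual_branch_ffn(x, w_in, k_small, k_large, w_out):
    """Hypothetical dual-branch FFN: pointwise expand, two local branches, fuse, project."""
    hidden = x @ w_in                           # (H, W, C) -> (H, W, C_hidden)
    small = depthwise_conv2d(hidden, k_small)   # smaller receptive field (e.g. 3x3)
    large = depthwise_conv2d(hidden, k_large)   # larger receptive field (e.g. 5x5)
    return gelu(small + large) @ w_out          # fuse by summation, project back to C


rng = np.random.default_rng(0)
c, c_hidden = 16, 64
x = rng.standard_normal((8, 8, c))
y = dual_branch_ffn(
    x,
    rng.standard_normal((c, c_hidden)) * 0.1,
    rng.standard_normal((3, 3, c_hidden)) * 0.1,
    rng.standard_normal((5, 5, c_hidden)) * 0.1,
    rng.standard_normal((c_hidden, c)) * 0.1,
)
print(y.shape)  # (8, 8, 16)
```

Unlike a purely fully connected FFN, which mixes channels at each position independently, each depthwise branch here mixes information from a spatial neighborhood, and using two kernel sizes gives two receptive-field scales.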

Findings

1. Achieves superior segmentation accuracy on multiple datasets.
2. Reduces computational complexity compared to traditional transformers.
3. Maintains a favorable efficiency-accuracy trade-off.
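The complexity point is easiest to see with numbers. In vanilla self-attention every token scores every other token, so the number of query-key score entries grows with the square of the token count N. The sketch below counts those entries. It assumes one token per spatial position of a square feature map; the side lengths, the pooling ratio `r`, and `compressed_attention_scores` are illustrative assumptions that model a generic pooled-key variant, not TCSAFormer's exact cost.

```python
def full_attention_scores(n_tokens: int) -> int:
    """Query-key score entries in vanilla self-attention: one per (query, key) pair."""
    return n_tokens * n_tokens


def compressed_attention_scores(n_tokens: int, r: int) -> int:
    """Score entries when every query sees only keys pooled by a hypothetical ratio r."""
    return n_tokens * (n_tokens // r)


for side in (28, 56, 112):      # hypothetical square feature-map side lengths
    n = side * side             # one token per spatial position
    print(f"{side}x{side}: N={n:,}  full={full_attention_scores(n):,}  "
          f"pooled(r=16)={compressed_attention_scores(n, 16):,}")
```

Doubling the side length quadruples N and multiplies the full score count by 16. Pooling the keys by `r` divides the count by `r` (exactly, when `r` divides N), which is why compressing tokens before attention pays off most at high resolution.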

Abstract

In recent years, transformer-based methods have achieved remarkable progress in medical image segmentation due to their superior ability to capture long-range dependencies. However, these methods typically suffer from two major limitations. First, their computational complexity scales quadratically with the length of the input sequence. Second, the feed-forward network (FFN) modules in vanilla Transformers typically rely on fully connected layers, which limits the model's ability to capture local contextual information and multiscale features critical for precise semantic segmentation. To address these issues, we propose an efficient medical image segmentation network, named TCSAFormer. The proposed TCSAFormer adopts two key ideas. First, it incorporates a Compressed Attention (CA) module, which combines token compression and pixel-level sparse attention to dynamically focus on the most relevant…
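The abstract is cut off before it explains how CA works, so the following is only a generic NumPy sketch of the two ideas it names, under explicit assumptions. Keys and values are compressed by averaging groups of `r` consecutive tokens (a stand-in for whatever compression the paper actually uses), and sparsity comes from letting each query keep only its `top_k` highest-scoring compressed tokens. `pool_tokens`, `topk_compressed_attention`, `r`, and `top_k` are hypothetical names; this is not the authors' CA module.

```python
import numpy as np


def pool_tokens(x: np.ndarray, r: int) -> np.ndarray:
    """Compress (N, d) tokens to (N // r, d) by averaging groups of r consecutive tokens."""
    n, d = x.shape
    if n % r != 0:
        raise ValueError("token count must be divisible by the pooling ratio r")
    return x.reshape(n // r, r, d).mean(axis=1)


def topk_compressed_attention(q, k, v, r: int, top_k: int):
    """Each query attends only to its top_k highest-scoring compressed keys."""
    kc, vc = pool_tokens(k, r), pool_tokens(v, r)      # compressed keys / values
    scores = q @ kc.T / np.sqrt(q.shape[-1])            # (N, N // r) instead of (N, N)
    n_c = scores.shape[-1]
    if not 1 <= top_k <= n_c:
        raise ValueError("top_k must be between 1 and the number of compressed tokens")
    # Column indices of the top_k largest scores in each row (unordered within the set).
    keep = np.argpartition(scores, n_c - top_k, axis=-1)[:, n_c - top_k:]
    mask = np.full_like(scores, -np.inf)
    np.put_along_axis(mask, keep, 0.0, axis=-1)
    scores = scores + mask                              # dropped keys get -inf
    scores -= scores.max(axis=-1, keepdims=True)        # stabilize the softmax
    weights = np.exp(scores)                            # exp(-inf) == 0 for dropped keys
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ vc, weights


rng = np.random.default_rng(0)
n, d = 64, 32                                          # e.g. an 8x8 feature map, flattened
q, k, v = (rng.standard_normal((n, d)) for _ in range(3))
out, weights = topk_compressed_attention(q, k, v, r=4, top_k=4)
print(out.shape, weights.shape)  # (64, 32) (64, 16)
```

With `r = 1` and `top_k = N` this reduces to ordinary softmax attention. Note that this sketch still computes every score against the compressed keys before masking, so its savings come from the compression step; a practical sparse kernel would avoid computing the discarded scores, and an image model would more naturally pool 2-D windows than flattened 1-D groups.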
