MedFormer: Hierarchical Medical Vision Transformer with Content-Aware Dual Sparse Selection Attention
Zunhui Xia, Hongxing Li, Libin Lan

TL;DR
MedFormer is a versatile hierarchical vision transformer designed for medical image recognition, combining a pyramid structure with a novel content-aware dual sparse attention mechanism to improve efficiency and robustness across various tasks.
Contribution
The paper introduces MedFormer, a general-purpose medical vision transformer that pairs a pyramid backbone with a dual sparse selection attention mechanism, aiming to lower computational cost without sacrificing performance.
Findings
The authors report that MedFormer outperforms existing models in both accuracy and efficiency
It is reported to be effective across multiple medical imaging tasks
It reduces computational load while maintaining high performance
Abstract
Medical image recognition serves as a key way to aid in clinical diagnosis, enabling more accurate and timely identification of diseases and abnormalities. Vision transformer-based approaches have proven effective in handling various medical recognition tasks. However, these methods encounter two primary challenges. First, they are often task-specific and architecture-tailored, limiting their general applicability. Second, they usually either adopt full attention to model long-range dependencies, resulting in high computational costs, or rely on handcrafted sparse attention, potentially leading to suboptimal performance. To tackle these issues, we present MedFormer, an efficient medical vision transformer with two key ideas. First, it employs a pyramid scaling structure as a versatile backbone for various medical image recognition tasks, including image classification and dense…
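The abstract contrasts full attention, handcrafted sparse attention, and content-aware selection, but it is cut off before describing the paper's own mechanism. As a generic illustration only, and not MedFormer's dual sparse selection attention, the sketch below shows what content-aware top-k selection could look like for a single query: the keys to attend to are chosen by their scores rather than by a fixed pattern. The function `attend`, its `top_k` parameter, and the toy data are all hypothetical.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attend(query, keys, values, top_k=None):
    """Attention output for one query (hypothetical helper).

    top_k=None  -> full attention over every key.
    top_k=k     -> content-aware sparse attention: keep only the k keys
                   with the highest scores for this query.
    Returns (output vector, sorted indices of the keys that were kept).
    """
    scale = math.sqrt(len(query))
    scores = [dot(query, k) / scale for k in keys]
    if top_k is None:
        kept = list(range(len(keys)))
    else:
        # Selection depends on the content (the scores), not on position.
        ranked = sorted(range(len(keys)), key=lambda i: scores[i], reverse=True)
        kept = ranked[:top_k]
    weights = softmax([scores[i] for i in kept])
    dim = len(values[0])
    out = [sum(w * values[i][d] for w, i in zip(weights, kept)) for d in range(dim)]
    return out, sorted(kept)

# Toy data (assumed for illustration): four 2-D keys with 1-D values.
keys = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [-1.0, 0.0]]
values = [[1.0], [2.0], [3.0], [4.0]]
query = [1.0, 0.0]

full_out, full_kept = attend(query, keys, values)
sparse_out, sparse_kept = attend(query, keys, values, top_k=2)
print(full_kept, sparse_kept)
```

Note that this naive version still scores every key before selecting, so it only illustrates the selection step. An efficient design would have to narrow the candidate set before computing fine-grained scores, and how MedFormer does that is not recoverable from the truncated abstract.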
