Compute-Efficient Medical Image Classification with Softmax-Free Transformers and Sequence Normalization
Firas Khader, Omar S. M. El Nahhas, Tianyu Han, Gustav, M\"uller-Franzes, Sven Nebelung, Jakob Nikolas Kather, Daniel Truhn

TL;DR
This paper introduces a simple, softmax-free Transformer approach with sequence normalization that reduces computational complexity from quadratic to linear, enabling efficient processing of high-resolution medical images without sacrificing accuracy.
Contribution
It proposes a novel softmax-free attention mechanism with sequence normalization, significantly improving efficiency for long medical image sequences.
Findings
Achieves linear complexity in memory and computation
Maintains comparable accuracy to traditional Transformers
Effective across diverse medical imaging datasets
Abstract
The Transformer model has been pivotal in advancing fields such as natural language processing, speech recognition, and computer vision. However, a critical limitation of this model is its quadratic computational and memory complexity relative to the sequence length, which constrains its application to longer sequences. This is especially crucial in medical imaging where high-resolution images can reach gigapixel scale. Efforts to address this issue have predominantely focused on complex techniques, such as decomposing the softmax operation integral to the Transformer's architecture. This paper addresses this quadratic computational complexity of Transformer models and introduces a remarkably simple and effective method that circumvents this issue by eliminating the softmax function from the attention mechanism and adopting a sequence normalization technique for the key, query, and…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsBrain Tumor Detection and Classification
MethodsAttention Is All You Need · Softmax · Layer Normalization · Linear Layer · Byte Pair Encoding · Label Smoothing · Adam · Residual Connection · Position-Wise Feed-Forward Layer · Multi-Head Attention
