Static Key Attention in Vision
Zizhao Hu, Xiaolin Zhou, Mohammad Rostami

TL;DR
This paper investigates replacing dynamic keys with static keys in vision transformer attention mechanisms, finding that static keys can perform as well or better, simplifying the process without loss of accuracy.
Contribution
It introduces static key attention as a simpler alternative to dynamic keys in vision transformers, demonstrating comparable or superior performance.
Findings
Static key attention matches or exceeds dynamic key performance.
Static keys simplify the attention mechanism without performance loss.
Integrating static key attention into a MetaFormer backbone improves the performance of hierarchical hybrid architectures.
Abstract
The success of vision transformers is widely attributed to the expressive power of their dynamically parameterized multi-head self-attention mechanism. We examine the impact of substituting the dynamic parameterized key with a static key within the standard attention mechanism in Vision Transformers. Our findings reveal that static key attention mechanisms can match or even exceed the performance of standard self-attention. Integrating static key attention modules into a Metaformer backbone, we find that it serves as a better intermediate stage in hierarchical hybrid architectures, balancing the strengths of depth-wise convolution and self-attention. Experiments on several vision tasks underscore the effectiveness of the static key mechanism, indicating that the typical two-step dynamic parameterization in attention can be streamlined to a single step without impacting performance under…
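The difference between the two mechanisms can be sketched in a few lines of NumPy. This is a minimal single-head illustration and not the authors' implementation. It assumes the static key is a learned matrix with one row per token position that does not depend on the input, while queries and values are still projected from the input. The names `static_k`, `w_q`, `w_k`, `w_v` and all shapes are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def dynamic_key_attention(x, w_q, w_k, w_v):
    # Standard self-attention: queries, keys, and values all depend on x.
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    weights = softmax(q @ k.T / np.sqrt(q.shape[-1]))
    return weights @ v, weights

def static_key_attention(x, w_q, static_k, w_v):
    # Assumed formulation: the key matrix is a free parameter (static_k),
    # so only the query and value projections depend on the input x.
    q, v = x @ w_q, x @ w_v
    weights = softmax(q @ static_k.T / np.sqrt(q.shape[-1]))
    return weights @ v, weights

rng = np.random.default_rng(0)
n_tokens, d_model = 16, 32  # hypothetical sizes

x = rng.normal(size=(n_tokens, d_model))
w_q = rng.normal(size=(d_model, d_model)) / np.sqrt(d_model)
w_k = rng.normal(size=(d_model, d_model)) / np.sqrt(d_model)
w_v = rng.normal(size=(d_model, d_model)) / np.sqrt(d_model)
static_k = rng.normal(size=(n_tokens, d_model))  # would be learned in training

out_dynamic, attn_dynamic = dynamic_key_attention(x, w_q, w_k, w_v)
out_static, attn_static = static_key_attention(x, w_q, static_k, w_v)
```

Under this assumption, the attention weights for a given token depend only on that token's query, since the keys no longer change with the input. This is one way to read the abstract's remark that the two-step dynamic parameterization can be reduced to a single step.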
Taxonomy
Topics: Infrared Target Detection Methodologies · Ocular and Laser Science Research
Methods: Softmax · Attention Is All You Need · MetaFormer · Convolution
