Static Key Attention in Vision

Zizhao Hu; Xiaolin Zhou; Mohammad Rostami

arXiv:2412.07049·cs.CV·December 11, 2024

TL;DR

This paper investigates replacing the dynamically parameterized keys in vision transformer attention with static keys. It finds that static key attention can match or exceed standard self-attention while simplifying the mechanism.

Contribution

The paper introduces static key attention as a simpler alternative to dynamic keys in vision transformers and shows that it performs comparably to, or better than, standard self-attention.

Findings

1. Static key attention matches or exceeds the performance of standard (dynamic key) self-attention.

2. Static keys simplify the attention mechanism without a loss in performance.

3. Integrated into a Metaformer backbone, static key attention serves as a better intermediate stage in hierarchical hybrid architectures.

Abstract

The success of vision transformers is widely attributed to the expressive power of their dynamically parameterized multi-head self-attention mechanism. We examine the impact of substituting the dynamic parameterized key with a static key within the standard attention mechanism in Vision Transformers. Our findings reveal that static key attention mechanisms can match or even exceed the performance of standard self-attention. Integrating static key attention modules into a Metaformer backbone, we find that it serves as a better intermediate stage in hierarchical hybrid architectures, balancing the strengths of depth-wise convolution and self-attention. Experiments on several vision tasks underscore the effectiveness of the static key mechanism, indicating that the typical two-step dynamic parameterization in attention can be streamlined to a single step without impacting performance under…
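
The substitution described in the abstract can be illustrated with a minimal single-head sketch in NumPy. This is one reading of the idea, not the authors' implementation: it assumes the static key is a learned parameter matrix with one row per token position that does not depend on the input, while queries and values are still projected from the input. The names `k_static`, `w_q`, `w_k`, `w_v` and all shapes are hypothetical, and random values stand in for learned weights.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def dynamic_key_attention(x, w_q, w_k, w_v):
    # Standard self-attention: queries AND keys are computed from the input.
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ v

def static_key_attention(x, w_q, k_static, w_v):
    # Assumed static-key variant: the key matrix is a fixed parameter,
    # so the attention scores depend on the input only through the queries.
    q, v = x @ w_q, x @ w_v
    scores = q @ k_static.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ v

rng = np.random.default_rng(0)
n_tokens, d_model, d_head = 16, 32, 8          # hypothetical sizes

x = rng.normal(size=(n_tokens, d_model))       # token embeddings for one image
w_q = rng.normal(size=(d_model, d_head))
w_k = rng.normal(size=(d_model, d_head))
w_v = rng.normal(size=(d_model, d_head))
k_static = rng.normal(size=(n_tokens, d_head)) # stands in for a learned static key

out_dynamic = dynamic_key_attention(x, w_q, w_k, w_v)
out_static = static_key_attention(x, w_q, k_static, w_v)
print(out_dynamic.shape, out_static.shape)
```

In this sketch the only difference between the two functions is where the key comes from: `x @ w_k` in the standard case, a stored matrix in the static case. Both return an output of the same shape, so under these assumptions one module can be swapped for the other inside a transformer block.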

Taxonomy

Topics: Infrared Target Detection Methodologies · Ocular and Laser Science Research

Methods: Softmax · Attention Is All You Need · MetaFormer · Convolution