Bi-Orthogonal Factor Decomposition for Vision Transformers
Fenil R. Doshi, Thomas Fel, Talia Konkle, George Alvarez

TL;DR
This paper introduces Bi-orthogonal Factor Decomposition (BFD), a new analytical framework that disentangles positional and content information in Vision Transformers, revealing how attention mechanisms mediate token interactions.
Contribution
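BFD is a two-stage method for analyzing attention in Vision Transformers: an ANOVA-based decomposition separates token activations into orthogonal positional and content factors, and an SVD of the query-key interaction matrix shows how those factors mediate communication between tokens.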
Findings
Attention operates primarily through content interactions.
Heads specialize into different interaction modes.
DINOv2 emphasizes holistic shape processing through positional and semantic integration.
Abstract
Self-attention is the central computational primitive of Vision Transformers, yet we lack a principled understanding of what information attention mechanisms exchange between tokens. Attention maps describe where weight mass concentrates; they do not reveal whether queries and keys trade position, content, or both. We introduce Bi-orthogonal Factor Decomposition (BFD), a two-stage analytical framework: first, an ANOVA-based decomposition statistically disentangles token activations into orthogonal positional and content factors; second, SVD of the query-key interaction matrix QK^T exposes bi-orthogonal modes that reveal how these factors mediate communication. After validating proper isolation of position and content, we apply BFD to state-of-the-art vision models and uncover three phenomena. (i) Attention operates primarily through content. Content-content interactions dominate…
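The two stages described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the array shapes, the random stand-in activations and weights, and the helper names `anova_split` and `qk_modes` are assumptions made for illustration. It also assumes a simple one-way split in which the positional factor is the per-position mean across images and the content factor is whatever remains, and it takes the SVD of the weight product W_Q W_K^T as a stand-in for the query-key interaction matrix.

```python
import numpy as np


def anova_split(x):
    """Split activations x[image, position, dim] into mean, positional, content parts.

    Hypothetical helper: a one-way ANOVA-style split over position.
    """
    grand = x.mean(axis=(0, 1), keepdims=True)   # (1, 1, d) grand mean
    pos = x.mean(axis=0, keepdims=True) - grand  # (1, P, d) position effect
    content = x - grand - pos                    # (N, P, d) residual, image-specific
    return grand, pos, content


def qk_modes(w_q, w_k):
    """SVD of the bilinear query-key form B = W_Q W_K^T (d x d).

    Hypothetical helper: columns of u and rows of vt are orthonormal
    query-side and key-side directions, paired by singular value.
    """
    b = w_q @ w_k.T
    u, s, vt = np.linalg.svd(b)
    return b, u, s, vt


# Random stand-ins for real ViT activations and head weights (assumption).
rng = np.random.default_rng(0)
n_images, n_pos, d, d_head = 8, 16, 32, 8
x = rng.normal(size=(n_images, n_pos, d))
w_q = rng.normal(size=(d, d_head))
w_k = rng.normal(size=(d, d_head))

grand, pos, content = anova_split(x)
b, u, s, vt = qk_modes(w_q, w_k)

# Pre-softmax logits for image 0, split by which factor sits on each side.
parts = {
    "mean": np.broadcast_to(grand[0], (n_pos, d)),
    "position": pos[0],
    "content": content[0],
}
terms = {
    (q_name, k_name): q_part @ b @ k_part.T
    for q_name, q_part in parts.items()
    for k_name, k_part in parts.items()
}
full_logits = x[0] @ b @ x[0].T
recombined = sum(terms.values())

# Share of squared logit magnitude carried by each query-factor/key-factor pair.
energy = {pair: float((t ** 2).sum()) for pair, t in terms.items()}
total_energy = sum(energy.values())
for pair, e in energy.items():
    print(pair, round(e / total_energy, 3))
```

With real activations, comparing the `("content", "content")` share against the position-involving pairs is one way to probe the kind of claim made in finding (i). With the random inputs used here the shares carry no meaning; the sketch only shows that the nine terms sum back to the full logits and that the content factor averages to zero at every position.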
Taxonomy
Topics: Face Recognition and Perception · Visual Attention and Saliency Detection · Ferroelectric and Negative Capacitance Devices
