ResiDual Transformer Alignment with Spectral Decomposition
Lorenzo Basile, Valentino Maiorca, Luca Bortolussi, Emanuele Rodolà, Francesco Locatello

TL;DR
This paper investigates the spectral geometry of residuals in vision transformers, revealing how head specialization impacts zero-shot classification, and introduces ResiDual, a spectral alignment method that improves modality alignment and task performance.
Contribution
The paper uncovers the low-dimensional structure of residuals in vision transformers and proposes ResiDual, a spectral alignment technique that enhances modality alignment and zero-shot classification.
Findings
Residual head representations encode specialized roles.
Improved text-head alignment boosts zero-shot classification.
ResiDual efficiently achieves fine-tuning-level performance.
Abstract
When examined through the lens of their residual streams, a puzzling property emerges in transformer networks: residual contributions (e.g., attention heads) sometimes specialize in specific tasks or input attributes. In this paper, we analyze this phenomenon in vision transformers, focusing on the spectral geometry of residuals, and explore its implications for modality alignment in vision-language models. First, we link it to the intrinsically low-dimensional structure of visual head representations, zooming into their principal components and showing that they encode specialized roles across a wide variety of input data distributions. Then, we analyze the effect of head specialization in multimodal models, focusing on how improved alignment between text and specialized heads impacts zero-shot classification performance. This specialization-performance link consistently holds across…
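The low-dimensional structure mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the function name `components_for_variance`, the 0.9 variance threshold, and the synthetic data standing in for the outputs of one attention head are all assumptions made for illustration. The sketch counts how many principal components are needed to explain a given share of the variance in a set of representations.

```python
import numpy as np


def components_for_variance(head_outputs, threshold=0.9):
    """Count the principal components needed to reach `threshold` explained variance.

    `head_outputs` is an (n_samples, dim) array; the name and the threshold
    are illustrative assumptions, not taken from the paper.
    """
    # Center the data so the SVD corresponds to PCA.
    centered = head_outputs - head_outputs.mean(axis=0, keepdims=True)
    # Squared singular values are proportional to per-component variance.
    singular_values = np.linalg.svd(centered, compute_uv=False)
    variance = singular_values ** 2
    cumulative = np.cumsum(variance) / variance.sum()
    # First index whose cumulative share reaches the threshold, as a count.
    return int(np.searchsorted(cumulative, threshold) + 1)


rng = np.random.default_rng(0)
n_samples, dim, true_rank = 512, 64, 4

# Synthetic stand-in for one head: a rank-4 signal plus small noise.
latent = rng.normal(size=(n_samples, true_rank))
basis = rng.normal(size=(true_rank, dim))
low_rank_outputs = latent @ basis + 0.01 * rng.normal(size=(n_samples, dim))

# Baseline with no low-dimensional structure: isotropic noise.
isotropic_outputs = rng.normal(size=(n_samples, dim))

k_low_rank = components_for_variance(low_rank_outputs)
k_isotropic = components_for_variance(isotropic_outputs)
print(k_low_rank, k_isotropic)
```

On real data, `head_outputs` would be replaced by the per-head residual contributions collected over a dataset. A count far below the embedding dimension would be consistent with the low-dimensional structure the abstract describes.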
Taxonomy
Topics: Industrial Vision Systems and Defect Detection · Advancements in Photolithography Techniques · Welding Techniques and Residual Stresses
Methods: Softmax · Attention Is All You Need
