Can Cross-Layer Transcoders Replace Vision Transformer Activations? An Interpretable Perspective on Vision
Gerasimos Chatzoudis, Konstantinos D. Polyzos, Zhuowei Li, Difei Gu, Gemma E. Moran, Hao Wang, Dimitris N. Metaxas

TL;DR
This paper introduces Cross-Layer Transcoders (CLTs) as interpretable, sparse proxy models for Vision Transformers, enabling layer-resolved understanding of representations while maintaining high fidelity and classification performance.
Contribution
The paper proposes CLTs as a novel, depth-aware proxy for ViT activations, providing faithful interpretability and improved understanding of layer contributions.
Findings
CLTs achieve high reconstruction fidelity of ViT activations.
CLTs preserve and sometimes improve CLIP zero-shot classification accuracy.
Cross-layer contribution scores reveal dominant layers critical for performance.
Abstract
Understanding the internal activations of Vision Transformers (ViTs) is critical for building interpretable and trustworthy models. While Sparse Autoencoders (SAEs) have been used to extract human-interpretable features, they operate on individual layers and capture neither the cross-layer computational structure of Transformers nor the relative significance of each layer in forming the last-layer representation. As an alternative, we adopt Cross-Layer Transcoders (CLTs) as reliable, sparse, and depth-aware proxy models for MLP blocks in ViTs. CLTs use an encoder-decoder scheme to reconstruct each post-MLP activation from learned sparse embeddings of preceding layers, yielding a linear decomposition that transforms the final representation of ViTs from an opaque embedding into an additive, layer-resolved construction that enables faithful attribution and…
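To make the additive, layer-resolved structure described in the abstract concrete, here is a minimal NumPy sketch with random weights. It is not the authors' implementation. The top-k ReLU sparsity rule, the choice to let source layers up to and including the target layer write to it, the norm-based contribution share, and all names (`top_k_relu`, `clt_reconstruct`, `W_enc`, `W_dec`) are assumptions made for illustration only.

```python
import numpy as np

def top_k_relu(h, k):
    """ReLU, then keep only the k largest entries per row (assumed sparsity rule)."""
    h = np.maximum(h, 0.0)
    # Indices of every entry except the k largest in each row.
    drop = np.argsort(h, axis=-1)[..., :-k]
    out = h.copy()
    np.put_along_axis(out, drop, 0.0, axis=-1)
    return out

def clt_reconstruct(mlp_inputs, W_enc, W_dec, k):
    """Hypothetical cross-layer transcoder forward pass.

    mlp_inputs: list of L arrays, each of shape (n_tokens, d)
    W_enc[s]:   (d, m) encoder for source layer s
    W_dec[s,t]: (m, d) decoder from source layer s to target layer t, s <= t
    """
    L = len(mlp_inputs)
    # One sparse code per source layer.
    codes = [top_k_relu(x @ W_enc[s], k) for s, x in enumerate(mlp_inputs)]
    # Linear contribution of source layer s to the target layer t output.
    contrib = {(s, t): codes[s] @ W_dec[(s, t)]
               for t in range(L) for s in range(t + 1)}
    # Each target reconstruction is a plain sum of its source contributions.
    recon = [sum(contrib[(s, t)] for s in range(t + 1)) for t in range(L)]
    return codes, contrib, recon

# Toy setup: random stand-ins for ViT activations and untrained weights.
rng = np.random.default_rng(0)
L, n_tokens, d, m, k = 3, 5, 8, 16, 4
mlp_inputs = [rng.normal(size=(n_tokens, d)) for _ in range(L)]
W_enc = [rng.normal(size=(d, m)) for _ in range(L)]
W_dec = {(s, t): rng.normal(size=(m, d)) for t in range(L) for s in range(t + 1)}

codes, contrib, recon = clt_reconstruct(mlp_inputs, W_enc, W_dec, k)

# Illustrative per-layer share of the last-layer reconstruction (norm-based proxy,
# not necessarily the contribution score defined in the paper).
norms = np.array([np.linalg.norm(contrib[(s, L - 1)]) for s in range(L)])
shares = norms / norms.sum()
```

Because the decoder is linear, the last-layer reconstruction is exactly the sum of the per-source-layer terms, which is what makes a per-layer attribution possible. In a trained CLT the weights would be fit to match the ViT's actual post-MLP activations.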
