Layer by layer, module by module: Choose both for optimal OOD probing of ViT
Ambroise Odonnat, Vasilii Feofanov, Laetitia Chapel, Romain Tavenard, Ievgen Redko

TL;DR
This study investigates the behavior of intermediate layers in pretrained vision transformers. It finds that the optimal probing layer depends on the distribution shift between pretraining and downstream data, and that feedforward activations are the best module to probe under large shifts.
Contribution
The paper provides a comprehensive analysis of intermediate-layer representations in vision transformers, at both the layer level and the module level, and shows that the best module to probe depends on how strongly the downstream data is shifted from the pretraining data.
Findings
Feedforward network activations are best for probing under significant distribution shift.
Normalized self-attention outputs are optimal when distribution shift is weak.
Distribution shift between pretraining and downstream data is the primary cause of probing performance degradation in deeper layers.
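The layer-by-layer, module-by-module comparison behind these findings can be illustrated with a small sketch. It is not the authors' code: it uses a toy, randomly initialized, single-head pre-norm transformer block written in NumPy instead of a pretrained ViT, synthetic data, mean-pooling over tokens, and a closed-form ridge classifier standing in for the linear probe. The probe-point names `attn_norm`, `ffn_act`, and `block_out`, and their mapping to "normalized self-attention output", "feedforward activation", and "block output", are assumptions made for illustration.

```python
import numpy as np

def layer_norm(x, eps=1e-6):
    mu = x.mean(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(x.var(axis=-1, keepdims=True) + eps)

def gelu(x):
    # tanh approximation of GELU
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def init_block(rng, d, d_ff):
    # Random weights: a stand-in for a pretrained block (assumption).
    shapes = {"wq": (d, d), "wk": (d, d), "wv": (d, d), "wo": (d, d),
              "w1": (d, d_ff), "w2": (d_ff, d)}
    return {k: rng.normal(0.0, 1.0 / np.sqrt(s[0]), s) for k, s in shapes.items()}

def block_forward(x, p):
    """Toy pre-norm block; x has shape (batch, tokens, d).
    Returns the block output and three candidate probe points."""
    h = layer_norm(x)
    q, k, v = h @ p["wq"], h @ p["wk"], h @ p["wv"]
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(q.shape[-1])
    x = x + softmax(scores) @ v @ p["wo"]   # residual after self-attention
    normed = layer_norm(x)                  # assumed "normalized self-attention output"
    act = gelu(normed @ p["w1"])            # assumed "feedforward activation"
    out = x + act @ p["w2"]                 # standard block output
    return out, {"attn_norm": normed, "ffn_act": act, "block_out": out}

def extract_features(x, blocks):
    feats = {}
    for i, p in enumerate(blocks):
        x, points = block_forward(x, p)
        for name, t in points.items():
            feats[(i, name)] = t.mean(axis=1)  # mean-pool over tokens (assumption)
    return feats

def ridge_probe_accuracy(f_tr, y_tr, f_te, y_te, n_classes, lam=1e-2):
    # Linear probe: ridge regression onto one-hot labels, solved in closed form.
    mu, sd = f_tr.mean(0), f_tr.std(0) + 1e-8
    a_tr = np.hstack([(f_tr - mu) / sd, np.ones((len(f_tr), 1))])
    a_te = np.hstack([(f_te - mu) / sd, np.ones((len(f_te), 1))])
    targets = np.eye(n_classes)[y_tr]
    w = np.linalg.solve(a_tr.T @ a_tr + lam * np.eye(a_tr.shape[1]), a_tr.T @ targets)
    return float(((a_te @ w).argmax(1) == y_te).mean())

def probe_all(x_tr, y_tr, x_te, y_te, blocks, n_classes):
    f_tr, f_te = extract_features(x_tr, blocks), extract_features(x_te, blocks)
    return {key: ridge_probe_accuracy(f_tr[key], y_tr, f_te[key], y_te, n_classes)
            for key in f_tr}

def make_toy_data(rng, n, n_classes, tokens, d, class_means):
    # Synthetic "images": token sequences with a class-dependent mean.
    y = rng.integers(0, n_classes, n)
    x = rng.normal(0.0, 1.0, (n, tokens, d)) + class_means[y][:, None, :]
    return x, y

rng = np.random.default_rng(0)
d, d_ff, tokens, n_classes, n_layers = 16, 64, 8, 3, 4
blocks = [init_block(rng, d, d_ff) for _ in range(n_layers)]
class_means = rng.normal(0.0, 1.0, (n_classes, d))
x_tr, y_tr = make_toy_data(rng, 300, n_classes, tokens, d, class_means)
x_te, y_te = make_toy_data(rng, 150, n_classes, tokens, d, class_means)

scores = probe_all(x_tr, y_tr, x_te, y_te, blocks, n_classes)
best = max(scores, key=scores.get)
print("best (layer, module):", best, "accuracy:", round(scores[best], 3))
```

With a real pretrained ViT, the same selection loop would run over features captured at the corresponding points of each block, and the best (layer, module) pair would be chosen on held-out downstream data. The toy numbers printed here say nothing about the paper's results.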
Abstract
Recent studies have observed that intermediate layers of foundation models often yield more discriminative representations than the final layer. While initially attributed to autoregressive pretraining, this phenomenon has also been identified in models trained via supervised and discriminative self-supervised objectives. In this paper, we conduct a comprehensive study to analyze the behavior of intermediate layers in pretrained vision transformers. Through extensive linear probing experiments across a diverse set of image classification benchmarks, we find that distribution shift between pretraining and downstream data is the primary cause of performance degradation in deeper layers. Furthermore, we perform a fine-grained analysis at the module level. Our findings reveal that standard probing of transformer block outputs is suboptimal; instead, probing the activation within the…
Taxonomy
Topics: Transition Metal Oxide Nanomaterials · Thin-Film Transistor Technologies · Advanced Memory and Neural Computing
