Beyond the final layer: Attentive multilayer fusion for vision transformers

Laure Ciernik; Marco Morik; Lukas Thede; Luca Eyring; Shinichi Nakajima; Zeynep Akata; Lukas Muttenthaler

arXiv:2601.09322 · cs.CV · January 15, 2026

Open Access

TL;DR

This paper introduces an attentive multilayer fusion method for vision transformers that dynamically combines information from all layers, significantly improving task adaptation over traditional last-layer probing.

Contribution

The paper applies an attentive probing mechanism that fuses representations from all layers of a Vision Transformer, rather than only the last layer, to improve downstream task performance.

Findings

01 Consistent performance gains across 20 datasets.

02 Intermediate layers are most beneficial for tasks different from pre-training.

03 Attention heatmaps show task relevance varies across layers.

Abstract

With the rise of large-scale foundation models, efficiently adapting them to downstream tasks remains a central challenge. Linear probing, which freezes the backbone and trains a lightweight head, is computationally efficient but often restricted to last-layer representations. We show that task-relevant information is distributed across the network hierarchy rather than solely encoded in any of the last layers. To leverage this distribution of information, we apply an attentive probing mechanism that dynamically fuses representations from all layers of a Vision Transformer. This mechanism learns to identify the most relevant layers for a target task and combines low-level structural cues with high-level semantic abstractions. Across 20 diverse datasets and multiple pretrained foundation models, our method achieves consistent, substantial gains over standard linear probes. Attention…
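
The layer-fusion idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation. It assumes one feature vector per layer (for example a pooled or CLS token), a single learned query vector, and scaled dot-product attention over layers followed by a linear head. The names `fuse_layers`, `linear_head`, and `query` are hypothetical and chosen only for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_layers(layer_feats, query):
    """Attention-weighted sum of per-layer feature vectors.

    layer_feats: list of L vectors (one per layer), each of length D,
                 taken from a frozen backbone (assumption of this sketch).
    query:       hypothetical learned query vector of length D.
    Returns (fused_vector, layer_weights).
    """
    dim = len(query)
    scale = math.sqrt(dim)
    # One relevance score per layer: scaled dot product with the query.
    scores = [
        sum(q * f for q, f in zip(query, feat)) / scale
        for feat in layer_feats
    ]
    weights = softmax(scores)
    # Convex combination of the layer features.
    fused = [
        sum(w * feat[d] for w, feat in zip(weights, layer_feats))
        for d in range(dim)
    ]
    return fused, weights


def linear_head(fused, W, b):
    """Lightweight linear classifier applied to the fused representation."""
    return [
        sum(w * x for w, x in zip(row, fused)) + bias
        for row, bias in zip(W, b)
    ]


# Toy usage: 3 layers with 2-dimensional features and 2 output classes.
layer_feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [0.0, 2.0]
fused, weights = fuse_layers(layer_feats, query)
logits = linear_head(fused, W=[[1.0, 0.0], [0.0, 1.0]], b=[0.0, 0.0])
```

In this sketch only `query` and the head parameters would be trained, which keeps the cost close to that of a linear probe. The resulting `weights` give one value per layer, which is the kind of quantity a per-layer attention heatmap would visualize.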

Taxonomy

Topics: Advanced Memory and Neural Computing · Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning