Vision Transformer Finetuning Benefits from Non-Smooth Components

Ambroise Odonnat; Laetitia Chapel; Romain Tavenard; Ievgen Redko

arXiv:2602.06883 · cs.LG · February 10, 2026

Open Access

TL;DR

This paper investigates how the plasticity of vision transformer components, particularly attention and feedforward layers, influences transfer learning performance, challenging the assumption that smoothness is always beneficial.

Contribution

It introduces a theoretical and experimental framework to analyze component plasticity in vision transformers, guiding better finetuning strategies.

Findings

01

High plasticity in attention modules improves finetuning results.

02

Feedforward layers with higher plasticity also enhance transfer learning.

03

These results challenge the belief that smoothness always benefits transformer adaptation.

Abstract

The smoothness of the transformer architecture has been extensively studied in the context of generalization, training stability, and adversarial robustness. However, its role in transfer learning remains poorly understood. In this paper, we analyze the ability of vision transformer components to adapt their outputs to changes in inputs, or, in other words, their plasticity. Defined as an average rate of change, it captures the sensitivity to input perturbation; in particular, a high plasticity implies low smoothness. We demonstrate through theoretical analysis and comprehensive experiments that this perspective provides principled guidance in choosing the components to prioritize during adaptation. A key takeaway for practitioners is that the high plasticity of the attention modules and feedforward layers consistently leads to better finetuning performance. Our findings depart from the…
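The abstract describes plasticity as an average rate of change that captures sensitivity to input perturbation. The excerpt does not give the paper's exact formula, so the following is only a minimal sketch under one plausible reading: the mean ratio of output distance to input distance over pairs of inputs. The function names (`average_rate_of_change`, `scale`) and the toy scaling maps are hypothetical and are not taken from the paper.

```python
import math
import random


def average_rate_of_change(f, inputs, eps=1e-12):
    """Mean of ||f(x) - f(y)|| / ||x - y|| over all distinct input pairs.

    Assumption: this is one plausible reading of "average rate of change";
    the paper's own definition may differ.
    """
    ratios = []
    for i in range(len(inputs)):
        for j in range(i + 1, len(inputs)):
            x, y = inputs[i], inputs[j]
            dx = math.dist(x, y)
            if dx < eps:
                continue  # skip (near-)identical inputs to avoid dividing by zero
            ratios.append(math.dist(f(x), f(y)) / dx)
    if not ratios:
        raise ValueError("need at least two distinct inputs")
    return sum(ratios) / len(ratios)


def scale(factor):
    """Hypothetical stand-in for a model component: multiplies each entry by factor."""
    return lambda x: [factor * v for v in x]


# Toy inputs: 16 random vectors of dimension 4.
rng = random.Random(0)
points = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(16)]

smooth = average_rate_of_change(scale(0.5), points)   # small output change per unit of input change
plastic = average_rate_of_change(scale(3.0), points)  # large output change per unit of input change
```

Under this reading, the component whose outputs move more per unit of input change gets the larger score, which matches the abstract's statement that high plasticity implies low smoothness.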

Taxonomy

Topics: Advanced Memory and Neural Computing · Advanced Neural Network Applications · Adversarial Robustness in Machine Learning