Exploiting Layer Normalization Fine-tuning in Visual Transformer Foundation Models for Classification

Zhaorui Tan; Tan Pan; Kaizhu Huang; Weimiao Yu; Kai Yao; Chen Jiang; Qiufeng Wang; Anh Nguyen; Xin Guo; Yuan Cheng; Xi Yang

arXiv:2508.07577·cs.CV·August 12, 2025

TL;DR

This paper investigates the role of LayerNorm in fine-tuning Vision Transformers for classification, revealing how LayerNorm shifts relate to domain shifts and proposing a rescaling method to improve transfer learning, especially with limited data.

Contribution

The paper introduces an analysis of LayerNorm shifts during fine-tuning, proposes a rescaling mechanism based on the Fine-tuning Shift Ratio (FSR), and demonstrates its effectiveness across various datasets and settings.

Findings

01 LayerNorm shifts indicate domain transfer quality.

02 The proposed rescaling improves fine-tuning performance.

03 Out-of-distribution (OOD) tasks show a lower Fine-tuning Shift Ratio (FSR) and a higher rescaling scalar λ, especially when data is scarce.

Abstract

LayerNorm is pivotal in Vision Transformers (ViTs), yet its fine-tuning dynamics under data scarcity and domain shifts remain underexplored. This paper shows that shifts in LayerNorm parameters after fine-tuning (LayerNorm shifts) are indicative of the transitions between source and target domains; its efficacy is contingent upon the degree to which the target training samples accurately represent the target domain, as quantified by our proposed Fine-tuning Shift Ratio ($FSR$). Building on this, we propose a simple yet effective rescaling mechanism using a scalar $\lambda$ that is negatively correlated to $FSR$ to align learned LayerNorm shifts with those ideal shifts achieved under fully representative data, combined with a cyclic framework that further enhances the LayerNorm fine-tuning. Extensive experiments across natural and pathological images, in both in-distribution (ID) and…
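
As a rough illustration of what rescaling a LayerNorm shift could look like, the sketch below scales the difference between fine-tuned and pre-trained LayerNorm parameters by a scalar. It is a sketch under stated assumptions, not the paper's implementation: the linear form `pretrained + lam * (finetuned - pretrained)`, the function name `rescale_layernorm_shift`, and the parameter names are all hypothetical, and the value of `lam` is supplied by hand because this summary does not define how $\lambda$ is computed from $FSR$.

```python
def rescale_layernorm_shift(pretrained, finetuned, lam):
    """Scale the learned LayerNorm shift by lam (assumed linear form).

    pretrained, finetuned: dicts mapping a parameter name to a list of floats.
    lam: scalar chosen by the caller; how it relates to FSR is not modeled here.
    """
    rescaled = {}
    for name, base in pretrained.items():
        tuned = finetuned[name]
        # shift = tuned - base; the rescaled parameter is base + lam * shift
        rescaled[name] = [b + lam * (t - b) for b, t in zip(base, tuned)]
    return rescaled


# Toy LayerNorm scale and bias vectors (hypothetical values).
pretrained = {"ln.weight": [1.0, 1.0], "ln.bias": [0.0, 0.0]}
finetuned = {"ln.weight": [1.5, 0.5], "ln.bias": [0.25, -0.25]}

rescaled = rescale_layernorm_shift(pretrained, finetuned, lam=2.0)
print(rescaled)
```

Under this assumed form, `lam=1.0` reproduces the fine-tuned parameters, `lam=0.0` restores the pre-trained ones, and values above 1 amplify the learned shift.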

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications · Face recognition and analysis