Matched-Learning-Rate Analysis of Attention Drift and Transfer Retention in Fine-Tuned CLIP

Ruize Xia

arXiv:2604.16410 · cs.LG · April 21, 2026

TL;DR

This study compares Full Fine-Tuning (Full FT) and LoRA for CLIP adaptation at matched learning rates. Learning rate strongly influences both attention drift and transfer retention, and LoRA generally preserves more zero-shot transfer.

Contribution

The paper provides a controlled analysis of how adaptation method and learning rate jointly affect attention drift and transfer retention in CLIP, removing the learning-rate confound present in prior comparisons.

Findings

01

LoRA preserves more zero-shot transfer than Full FT at matched learning rates.

02

Learning rate modulates attention drift and structural changes in CLIP adaptation.

03

Matched-learning-rate evaluation changes how the Full FT versus LoRA comparison should be interpreted.
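
One way to make "attention drift" concrete is to track how the entropy of attention rows changes between the pretrained and the adapted model: a positive shift means attention has broadened, a negative shift means it has contracted. The sketch below is illustrative only. The function names are hypothetical, and the paper's exact drift metrics are not given on this page, so mean Shannon entropy is an assumption rather than the author's definition.

```python
import math


def attention_entropy(row):
    """Shannon entropy (in nats) of one attention row that sums to 1."""
    return -sum(p * math.log(p) for p in row if p > 0.0)


def mean_entropy_shift(pretrained_rows, adapted_rows):
    """Mean entropy after adaptation minus mean entropy before.

    Positive: attention broadened. Negative: attention contracted.
    Assumption: this is a stand-in metric, not the paper's definition.
    """
    before = sum(attention_entropy(r) for r in pretrained_rows) / len(pretrained_rows)
    after = sum(attention_entropy(r) for r in adapted_rows) / len(adapted_rows)
    return after - before


# Toy rows: uniform attention before, sharply peaked attention after.
pretrained = [[0.25, 0.25, 0.25, 0.25]]
adapted = [[0.97, 0.01, 0.01, 0.01]]
shift = mean_entropy_shift(pretrained, adapted)
print("contraction" if shift < 0 else "broadening")
```

In a real experiment the rows would come from the attention maps of the vision encoder on a fixed probe set, averaged over heads and layers as the analysis requires.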

Abstract

CLIP adaptation can improve in-domain accuracy while degrading out-of-domain transfer, but comparisons between Full Fine-Tuning (Full FT) and LoRA are often confounded by different learning-rate conventions. We study how adaptation method and optimization scale jointly shape attention drift and transfer retention in CLIP using a controlled matched-learning-rate comparison of Full FT and LoRA. The completed matrix contains 80 runs on CLIP ViT-B/32 across EuroSAT and Oxford-IIIT Pets, spanning four shared learning rates ($10^{-6}$, $5 \times 10^{-6}$, $10^{-5}$, $5 \times 10^{-5}$) and five seeds, and evaluates attention-drift metrics, best validation accuracy, and adapter-aware CIFAR-100 zero-shot accuracy. Learning rate strongly modulates structural change: on EuroSAT, Full FT moves from mild entropy broadening at $10^{-6}$ to marked contraction at $5 \times 10^{-5}$, whereas LoRA…
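
The run count in the abstract follows from the full factorial design: 2 methods × 2 datasets × 4 learning rates × 5 seeds = 80 runs. A minimal sketch of enumerating that grid is shown below. The dictionary keys, the method labels, and the seed values 0 to 4 are assumptions made for illustration; only the methods, datasets, learning rates, and the number of seeds come from the abstract.

```python
from itertools import product

METHODS = ("full_ft", "lora")                # labels are hypothetical
DATASETS = ("EuroSAT", "Oxford-IIIT Pets")
LEARNING_RATES = (1e-6, 5e-6, 1e-5, 5e-5)    # shared by both methods
SEEDS = (0, 1, 2, 3, 4)                      # assumption: actual seed values are not stated


def build_run_matrix():
    """Enumerate every (method, dataset, learning rate, seed) combination."""
    return [
        {"method": method, "dataset": dataset, "lr": lr, "seed": seed}
        for method, dataset, lr, seed in product(METHODS, DATASETS, LEARNING_RATES, SEEDS)
    ]


runs = build_run_matrix()
print(len(runs))  # 2 * 2 * 4 * 5 = 80
```

Because both methods iterate over the same `LEARNING_RATES` tuple, every Full FT run has a LoRA counterpart at an identical learning rate, which is the matched-learning-rate property the comparison relies on.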
