GLAD: Generalizable Tuning for Vision-Language Models

Yuqi Peng; Pengfei Wang; Jianzhuang Liu; Shifeng Chen

arXiv:2507.13089 · cs.CV · July 18, 2025

TL;DR

GLAD is a simple, general tuning framework for vision-language models that combines LoRA with gradient regularization to improve robustness and transferability across tasks and datasets.

Contribution

The paper proposes GLAD, a novel tuning method combining LoRA with gradient regularization, enhancing generalization and robustness in vision-language models.

Findings

01. GLAD outperforms previous methods on 15 benchmark datasets.
02. It improves base-to-novel class generalization.
03. It enhances image-domain and cross-dataset generalization.

Abstract

Pre-trained vision-language models, such as CLIP, show impressive zero-shot recognition ability and can be easily transferred to specific downstream tasks via prompt tuning, even with limited training data. However, existing prompt tuning methods face two main challenges: (1) In few-shot scenarios, data scarcity often leads to overfitting, making the model sensitive to changes in the input domain. (2) To mitigate overfitting, these methods typically rely on complex task-specific model architectures and sensitive hyperparameter tuning, severely restricting their general applicability. To address these issues, we propose a simpler and more general framework called GLAD (Generalizable LoRA tuning with RegulArized GraDient). We show that merely applying LoRA achieves performance in downstream tasks comparable to current state-of-the-art prompt-based methods. While LoRA is effective and easy…
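
The abstract names LoRA as the core of GLAD, but the excerpt is cut off before it describes the update or the gradient regularization. As a rough illustration only, the following is a minimal sketch of the standard LoRA weight update, W' = W + (alpha / r) * B A, in plain Python. It is not the authors' implementation, it omits the gradient regularization entirely, and every name and size in it (`matmul`, `lora_weight`, `d_out`, `d_in`, `rank`, `alpha`) is a hypothetical choice made for this example.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_weight(w, a, b, alpha, rank):
    """Return W + (alpha / rank) * (B @ A), the LoRA-adapted weight."""
    scale = alpha / rank
    delta = matmul(b, a)  # (d_out x r) @ (r x d_in) -> d_out x d_in
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


# Hypothetical sizes, chosen only for illustration.
d_out, d_in, rank, alpha = 8, 16, 2, 4.0
rng = random.Random(0)

# W stands in for a frozen pre-trained weight; only A and B would be trained.
w = [[rng.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]
a = [[rng.gauss(0, 0.01) for _ in range(d_in)] for _ in range(rank)]  # small random init
b = [[0.0] * rank for _ in range(d_out)]  # zero init, so the update starts at zero

w_adapted = lora_weight(w, a, b, alpha, rank)

trainable = rank * (d_in + d_out)  # parameters in A and B
full = d_in * d_out                # parameters in W
print(f"trainable LoRA parameters: {trainable} of {full}")
```

Because B starts at zero in this sketch, the adapted weight equals the pre-trained weight before any training step, so tuning begins from the original model's behavior. That is a property of standard LoRA initialization, not a claim about GLAD's specific configuration.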

Taxonomy

Topics: Natural Language Processing Techniques