FVG-PT: Adaptive Foreground View-Guided Prompt Tuning for Vision-Language Models

Haoyang Li; Liang Wang; Siyu Zhou; Jiacheng Sun; Jing Jiang; Chao Wang; Guodong Long; Yan Peng

arXiv:2603.08708 · cs.CV · March 10, 2026

TL;DR

This paper introduces FVG-PT, an adaptive, plug-and-play prompt tuning module for vision-language models that guides the visual encoder's attention toward foreground objects, improving task adaptation and mitigating the foreground attention shifts that arise during tuning.

Contribution

FVG-PT proposes a foreground attention guidance module for prompt tuning in VLMs, built from a learnable Foreground Reliability Gate, a Foreground Distillation Compensation module, and a Prior Calibration module.

Findings

1. FVG-PT improves foreground attention alignment across models.
2. FVG-PT improves prompt tuning results on multiple datasets.
3. FVG-PT is compatible with various backbone models.

Abstract

CLIP-based prompt tuning enables pretrained Vision-Language Models (VLMs) to efficiently adapt to downstream tasks. Although existing studies have made significant progress, they pay limited attention to changes in the internal attention representations of VLMs during the tuning process. In this paper, we attribute the failure modes of prompt tuning predictions to shifts in foreground attention of the visual encoder, and propose Foreground View-Guided Prompt Tuning (FVG-PT), an adaptive plug-and-play foreground attention guidance module, to alleviate the shifts. Concretely, FVG-PT introduces a learnable Foreground Reliability Gate to automatically enhance the foreground view quality, applies a Foreground Distillation Compensation module to guide visual attention toward the foreground, and further introduces a Prior Calibration module to mitigate generalization degradation caused by…
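The abstract names the three components but not their equations. Purely as an illustration, the sketch below shows one way a gated foreground attention distillation term and a prior calibration step could be written in plain Python. Every detail is an assumption, not the authors' formulation: the gate is a sigmoid of how much frozen-encoder (teacher) attention falls on a binary foreground patch mask, the distillation target is that teacher attention renormalized over foreground patches, the loss is a KL divergence to the tuned (student) attention, and calibration is a linear blend with frozen zero-shot logits. The function names `gated_foreground_loss` and `calibrate` and the constants `w`, `b`, and `lam` are hypothetical.

```python
import math

# Hypothetical sketch: names, gate form, and constants are illustrative assumptions,
# not the FVG-PT authors' implementation.

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def foreground_target(teacher_attn, mask, eps=1e-8):
    """Keep teacher attention on foreground patches (mask == 1) and renormalize to sum to ~1."""
    masked = [a * m for a, m in zip(teacher_attn, mask)]
    total = sum(masked) + eps
    return [a / total for a in masked]

def kl_divergence(p, q, eps=1e-8):
    """KL(p || q) for discrete distributions; entries with p == 0 contribute nothing."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)

def gated_foreground_loss(student_attn, teacher_attn, mask, w=10.0, b=-5.0):
    """Return (gate * KL(target || student), gate).

    Assumed gate: sigmoid(w * agreement + b), where agreement is the share of teacher
    attention mass inside the mask. A mask the teacher barely attends to is treated as
    unreliable and down-weighted. In a trained model, w and b would be learnable.
    """
    agreement = sum(a * m for a, m in zip(teacher_attn, mask))
    gate = sigmoid(w * agreement + b)
    target = foreground_target(teacher_attn, mask)
    return gate * kl_divergence(target, student_attn), gate

def calibrate(tuned_logits, prior_logits, lam=0.3):
    """Assumed prior calibration: blend tuned logits with frozen zero-shot logits."""
    return [(1 - lam) * t + lam * p for t, p in zip(tuned_logits, prior_logits)]

if __name__ == "__main__":
    mask = [1, 1, 0, 0]                       # patches 0 and 1 are foreground
    teacher = softmax([2.0, 2.0, 0.0, 0.0])   # frozen encoder attends to foreground
    drifted = softmax([0.0, 0.0, 2.0, 2.0])   # tuned attention shifted to background
    print(gated_foreground_loss(drifted, teacher, mask))
    print(gated_foreground_loss(teacher, teacher, mask))
    print(calibrate([1.0, 0.0], [0.0, 1.0], lam=0.25))
```

With `mask = [1, 1, 0, 0]`, a student whose attention has drifted onto background patches receives a much larger loss than one that matches the teacher. A mask the teacher disagrees with (for example `[0, 0, 1, 1]`) drives the gate toward zero and largely switches the term off, which is one plausible reading of what a "reliability" gate is for.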

Code & Models

Models

JREion/FVG-PT (model, Hugging Face)

Datasets

JREion/Prompt_Tuning_Datasets_with_Foreground (dataset · 163 downloads)

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications