Raw Data Matters: Enhancing Prompt Tuning by Internal Augmentation on Vision-Language Models

Haoyang Li; Liang Wang; Chao Wang; Siyu Zhou; Jing Jiang; Yan Peng; Guodong Long

arXiv:2508.02671·cs.CV·November 13, 2025

TL;DR

This paper introduces AugPT, a novel self-contained prompt tuning method for vision-language models that uses internal data augmentation and a gating mechanism to improve performance without external knowledge.

Contribution

AugPT is a new distillation-based prompt tuning approach that leverages internal augmentation and a consensus gating mechanism, avoiding reliance on external data sources.

Findings

01. AugPT improves model performance and generalization.

02. It effectively filters noisy samples during training.

03. AugPT outperforms existing prompt tuning methods.

Abstract

For CLIP-based prompt tuning, introducing more data as additional knowledge to enhance the fine-tuning process has proved to be an effective approach. Existing data amplification strategies for prompt tuning typically rely on external knowledge (e.g., large language models or pre-structured knowledge bases), which raises the cost of data collection and processing, and they generally ignore further utilization of features in the image modality. To address this, we propose Augmentation-driven Prompt Tuning (AugPT), a self-contained distillation-based prompt tuning approach that uses only internal augmentation on the raw dataset to better exploit known features. Specifically, AugPT employs self-supervised augmentation on unlabeled images in the training set, and introduces a novel gating mechanism based on a consensus test, reusing the pre-trained prompt tuning backbone model to spontaneously filter…
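The abstract is cut off before it states the exact consensus criterion, so the following is only a minimal sketch of what a consensus-style gate could look like, not the authors' implementation. It assumes that an augmented view is kept when the backbone's top-1 class for that view matches its top-1 class for the raw image. The names `consensus_gate`, `toy_predict`, and the toy "images" are hypothetical stand-ins for illustration; a real setup would pass a CLIP-based prompt tuning backbone as `predict`.

```python
from typing import Callable, Sequence

Logits = Sequence[float]


def argmax(scores: Logits) -> int:
    """Index of the largest score (the first one wins on ties)."""
    return max(range(len(scores)), key=lambda i: scores[i])


def consensus_gate(
    raw_image,
    augmented_views: list,
    predict: Callable[[object], Logits],
) -> list:
    """Keep the augmented views whose top-1 class matches the raw image's.

    ASSUMPTION: top-1 agreement is used as the consensus test here; the
    source text does not specify the actual criterion.
    """
    reference = argmax(predict(raw_image))
    return [v for v in augmented_views if argmax(predict(v)) == reference]


def toy_predict(image: Sequence[float]) -> Logits:
    """Hypothetical stand-in for the backbone: two class scores from a 4-vector."""
    return [sum(image[:2]), sum(image[2:])]


raw = [0.9, 0.8, 0.1, 0.2]
views = [
    [0.7, 0.9, 0.2, 0.1],  # predicted class agrees with the raw image
    [0.1, 0.2, 0.9, 0.8],  # predicted class disagrees, treated as noisy
    [0.6, 0.5, 0.3, 0.3],  # predicted class agrees with the raw image
]

kept = consensus_gate(raw, views, toy_predict)
print(f"kept {len(kept)} of {len(views)} augmented views")
```

In this toy run the second view is dropped because its predicted class differs from the raw image's prediction. No labels are consulted at any point, which is in line with the abstract's description of augmenting unlabeled training images.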

Code & Models

Models

Hugging Face model: JREion/FVG-PT

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Generative Adversarial Networks and Image Synthesis