Auxiliary Descriptive Knowledge for Few-Shot Adaptation of Vision-Language Model

SuBeen Lee; GilHan Park; WonJun Moon; Hyun Seok Seong; Jae-Pil Heo

arXiv:2512.17313 · cs.CV · December 22, 2025

Open Access

TL;DR

This paper introduces Auxiliary Descriptive Knowledge (ADK), a framework that enriches text representations with descriptive prompts generated by large language models, improving few-shot adaptation of vision-language models without added inference overhead.

Contribution

The paper proposes ADK, an efficient method for incorporating rich, LLM-generated descriptive prompts into vision-language models, improving few-shot adaptation performance without increasing inference cost.

Findings

01 ADK improves the performance of multiple PEFT methods across various tasks.

02 ADK achieves state-of-the-art results in few-shot vision-language adaptation.

03 The approach is parameter-free and plug-and-play, so it integrates easily with existing methods.

Abstract

Despite their impressive zero-shot capabilities, Vision-Language Models (VLMs) often struggle on downstream tasks whose data distribution shifts away from the pre-training data. Few-Shot Adaptation of VLMs (FSA-VLM) has emerged as a key solution, typically using Parameter-Efficient Fine-Tuning (PEFT) to adapt models with minimal data. However, these PEFT methods are constrained by their reliance on fixed, handcrafted prompts, which are often insufficient to capture the semantics of classes. While some studies have proposed leveraging image-induced prompts to provide additional clues for classification, they introduce prohibitive computational overhead at inference. Therefore, we introduce Auxiliary Descriptive Knowledge (ADK), a novel framework that enriches text representations without compromising efficiency. ADK first leverages a Large Language Model to generate a rich set of…
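The abstract is truncated before it describes how ADK combines the generated descriptions with the handcrafted prompt, so the following is only a minimal sketch of the general idea of enriching class text embeddings offline. It is not the authors' method. It assumes that each class has one handcrafted prompt embedding and several description embeddings already produced by a frozen text encoder, and it swaps in a plain weighted average as the fusion rule. The names `enrich_class_embeddings`, `classify`, and `alpha` are hypothetical.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors along `axis` to unit length (eps guards against zero vectors)."""
    norm = np.linalg.norm(x, axis=axis, keepdims=True)
    return x / np.maximum(norm, eps)


def enrich_class_embeddings(handcrafted, descriptive, alpha=0.5):
    """Blend handcrafted prompt embeddings with description embeddings.

    handcrafted: (C, D) array, one embedding per class.
    descriptive: (C, K, D) array, K description embeddings per class.
    alpha:       hypothetical mixing weight; 0 keeps only the handcrafted prompt.

    Assumption: a simple weighted average stands in for whatever fusion
    the paper actually uses.
    """
    # Average the unit-length description embeddings of each class.
    desc_mean = l2_normalize(l2_normalize(descriptive).mean(axis=1))
    mixed = (1.0 - alpha) * l2_normalize(handcrafted) + alpha * desc_mean
    return l2_normalize(mixed)


def classify(image_features, class_embeddings):
    """Predict the class whose text embedding has the highest cosine similarity."""
    logits = l2_normalize(image_features) @ class_embeddings.T
    return logits.argmax(axis=1)


# Random stand-ins for encoder outputs: 3 classes, 4 descriptions each, 8 dims.
rng = np.random.default_rng(0)
handcrafted = rng.normal(size=(3, 8))
descriptive = rng.normal(size=(3, 4, 8))
images = rng.normal(size=(5, 8))

# Computed once, before inference; the classifier then sees a (C, D) matrix
# of the same shape it would have with handcrafted prompts alone.
class_embeddings = enrich_class_embeddings(handcrafted, descriptive, alpha=0.5)
predictions = classify(images, class_embeddings)
```

Because the enriched embeddings are computed once per class and have the same shape as the handcrafted ones, a scheme like this adds no per-image cost at inference, which is the property the abstract emphasizes.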

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Generative Adversarial Networks and Image Synthesis