EvoVLMA: Evolutionary Vision-Language Model Adaptation

Kun Ding; Ying Wang; Shiming Xiang

arXiv:2508.01558 · cs.CV · August 5, 2025

TL;DR

EvoVLMA introduces an automated, evolutionary approach to optimize training-free adaptation algorithms for vision-language models, significantly improving performance in few-shot image classification tasks.

Contribution

The paper presents a novel LLM-assisted evolutionary algorithm that automatically searches for effective VLM adaptation methods, reducing reliance on human expertise and manual design.

Findings

1. Improved the 8-shot image classification accuracy of the APE algorithm by 1.91 points.

2. Showed that automatically searched adaptation algorithms can outperform manually designed ones.

3. Proposed a scalable, efficient search system for model adaptation algorithms.

Abstract

Pre-trained Vision-Language Models (VLMs) have been applied to various computer vision tasks (e.g., few-shot recognition) via model adaptation techniques such as prompt tuning and adapters. However, existing adaptation methods are designed by human experts, which requires significant time and experience. Inspired by recent advances in Large Language Model (LLM)-based code generation, we propose an Evolutionary Vision-Language Model Adaptation (EvoVLMA) method to automatically search for efficient, training-free adaptation algorithms for VLMs. We identify feature selection and logits computation as the key functions in training-free VLM adaptation, and propose a two-stage LLM-assisted evolutionary algorithm that optimizes these two parts sequentially, addressing the challenge of the expansive search space through a divide-and-conquer strategy. Besides, to enhance the…
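The decomposition described in the abstract can be illustrated with a toy sketch. This is not the authors' code: the candidate functions below are hand-written stand-ins for the programs an LLM would generate, the cache-style logits function only follows the general pattern of cache-based training-free adapters, the features are synthetic rather than real VLM embeddings, and every name (`select_top_variance`, `cache_logits`, `two_stage_search`, `make_toy_data`) is hypothetical. The sketch shows only the sequential idea: first choose a feature-selection function with the logits function held fixed, then choose a logits function with the selected feature-selection function held fixed.

```python
import numpy as np

def l2_normalize(x):
    """Scale each row to unit length."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

# Candidate feature-selection functions (stand-ins for LLM-generated code).
def select_all(support_feats, support_labels, text_feats):
    """Keep every feature channel."""
    return np.arange(support_feats.shape[1])

def select_top_variance(support_feats, support_labels, text_feats, k=8):
    """Keep the k channels whose class means vary most across classes."""
    classes = np.unique(support_labels)
    class_means = np.stack(
        [support_feats[support_labels == c].mean(axis=0) for c in classes]
    )
    return np.argsort(class_means.var(axis=0))[-k:]

# Candidate logits functions (stand-ins for LLM-generated code).
def zero_shot_logits(query, support_feats, support_labels, text_feats, idx):
    """Cosine similarity between query features and class text features."""
    return l2_normalize(query[:, idx]) @ l2_normalize(text_feats[:, idx]).T

def cache_logits(query, support_feats, support_labels, text_feats, idx,
                 alpha=1.0, beta=5.0):
    """Zero-shot logits plus a similarity-weighted vote from the few-shot set."""
    q = l2_normalize(query[:, idx])
    s = l2_normalize(support_feats[:, idx])
    one_hot = np.eye(text_feats.shape[0])[support_labels]
    affinity = np.exp(-beta * (1.0 - q @ s.T))
    base = zero_shot_logits(query, support_feats, support_labels, text_feats, idx)
    return base + alpha * affinity @ one_hot

def accuracy(select_fn, logits_fn, data):
    """Validation accuracy of one (feature selection, logits) pair."""
    idx = select_fn(data["support_feats"], data["support_labels"], data["text_feats"])
    logits = logits_fn(data["val_feats"], data["support_feats"],
                       data["support_labels"], data["text_feats"], idx)
    return float((logits.argmax(axis=1) == data["val_labels"]).mean())

def two_stage_search(select_candidates, logits_candidates, data):
    """Sequential search: feature selection first, then logits computation."""
    # Stage 1: vary feature selection; logits function fixed to the first candidate.
    best_select = max(select_candidates,
                      key=lambda f: accuracy(f, logits_candidates[0], data))
    # Stage 2: vary logits computation; feature selection fixed to the stage-1 winner.
    best_logits = max(logits_candidates,
                      key=lambda g: accuracy(best_select, g, data))
    return best_select, best_logits, accuracy(best_select, best_logits, data)

def make_toy_data(seed=0, n_classes=4, dim=32, shots=8, val_per_class=10):
    """Synthetic features: noisy samples around one random prototype per class."""
    rng = np.random.default_rng(seed)
    protos = rng.normal(size=(n_classes, dim))

    def sample(n_per_class):
        labels = np.repeat(np.arange(n_classes), n_per_class)
        feats = protos[labels] + 0.5 * rng.normal(size=(labels.size, dim))
        return feats, labels

    support_feats, support_labels = sample(shots)
    val_feats, val_labels = sample(val_per_class)
    text_feats = protos + 0.5 * rng.normal(size=protos.shape)
    return {"support_feats": support_feats, "support_labels": support_labels,
            "val_feats": val_feats, "val_labels": val_labels,
            "text_feats": text_feats}

if __name__ == "__main__":
    data = make_toy_data()
    sel, log, acc = two_stage_search([select_all, select_top_variance],
                                     [zero_shot_logits, cache_logits], data)
    print(sel.__name__, log.__name__, acc)
```

Searching the two functions one after the other evaluates the sum of the candidate counts rather than their product, which is the divide-and-conquer benefit the abstract points to. In this sketch the candidate pools are fixed; in an evolutionary setting each stage would instead repeatedly generate and score new candidates.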

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications · Advanced Neural Network Applications