Adapting Vision-Language Models Without Labels: A Comprehensive Survey
Hao Dong, Lijun Sheng, Jian Liang, Ran He, Eleni Chatzi, Olga Fink

TL;DR
This survey comprehensively reviews unsupervised adaptation methods for vision-language models, categorizing approaches based on data availability and discussing core methodologies, benchmarks, challenges, and future directions.
Contribution
It provides a unified taxonomy and systematic analysis of unsupervised VLM adaptation approaches, filling a gap in existing literature.
Findings
Categorizes approaches into four paradigms based on the availability and nature of unlabeled visual data.
Analyzes core methodologies and adaptation strategies.
Highlights open challenges and future research directions.
Abstract
Vision-Language Models (VLMs) have demonstrated remarkable generalization capabilities across a wide range of tasks. However, their performance often remains suboptimal when directly applied to specific downstream scenarios without task-specific adaptation. To enhance their utility while preserving data efficiency, recent research has increasingly focused on unsupervised adaptation methods that do not rely on labeled data. Despite the growing interest in this area, there remains a lack of a unified, task-oriented survey dedicated to unsupervised VLM adaptation. To bridge this gap, we present a comprehensive and structured overview of the field. We propose a taxonomy based on the availability and nature of unlabeled visual data, categorizing existing approaches into four key paradigms: Data-Free Transfer (no data), Unsupervised Domain Transfer (abundant data), Episodic Test-Time…
