Generalizing vision-language models to novel domains: A comprehensive survey
Xinyao Li, Jingjing Li, Fengling Li, Lei Zhu, Yang Yang, Heng Tao Shen

TL;DR
This survey reviews recent advances in vision-language models (VLMs), focusing on how they generalize to new domains. It categorizes existing methods, summarizes benchmark results, and discusses how VLMs relate to multimodal large language models, with the aim of guiding future research.
Contribution
It provides a comprehensive categorization and analysis of VLM generalization methods and benchmarks, together with a discussion of how VLMs relate to multimodal large language models, offering a clear map of the research landscape.
Findings
Categorizes VLM generalization methods into prompt-based, parameter-based, and feature-based approaches.
Summarizes key benchmarks and compares the performance of existing methods.
Discusses the relationship between VLMs and multimodal large language models.
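The three method families above differ in which part of the VLM they modify. The toy sketch below illustrates that distinction under stated assumptions: the "encoder" is a hypothetical frozen stand-in (mean-pooled token embeddings followed by a fixed projection `W`), and the representative techniques (CoOp-style learnable context for prompts, a LoRA-style low-rank update for parameters, a CLIP-Adapter-style residual blend for features) were chosen by us for illustration; they are not necessarily how the survey defines each category.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_ctx = 8, 4

# Hypothetical frozen "text encoder": mean-pool token embeddings, then project with W.
W = rng.standard_normal((d, d))

def encode(tokens, weight=W):
    return tokens.mean(axis=0) @ weight

class_tokens = rng.standard_normal((2, d))  # hypothetical class-name token embeddings

# 1) Prompt-based: learnable context vectors are prepended to the input tokens;
#    the encoder weights stay frozen. Initialized to zeros here; in practice optimized.
ctx = np.zeros((n_ctx, d))

def prompt_based(tokens):
    return encode(np.vstack([ctx, tokens]))

# 2) Parameter-based: the encoder's weights change, here through a low-rank
#    delta B @ A added to the frozen W (B starts at zero, as in LoRA).
r = 2
A = rng.standard_normal((r, d)) * 0.01
B = np.zeros((d, r))

def parameter_based(tokens):
    return encode(tokens, W + B @ A)

# 3) Feature-based: inputs and encoder are untouched; a small module refines
#    the output feature, blended residually with the original feature.
adapter = np.eye(d)  # stand-in for a small trained network
alpha = 0.2

def feature_based(tokens):
    f = encode(tokens)
    return alpha * (f @ adapter) + (1 - alpha) * f
```

At initialization the parameter-based and feature-based variants reproduce the frozen encoder exactly, which mirrors the common design choice of starting adaptation from the pretrained model's behavior; only training the added pieces (`ctx`, `A`/`B`, `adapter`) moves them away from it.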
Abstract
Recently, vision-language pretraining has emerged as a transformative technique that integrates the strengths of the visual and textual modalities, resulting in powerful vision-language models (VLMs). Leveraging web-scale pretraining data, these models exhibit strong zero-shot capabilities. However, their performance often deteriorates on domain-specific or specialized generalization tasks. To address this, a growing body of research focuses on transferring or generalizing the rich knowledge embedded in VLMs to various downstream applications. This survey aims to comprehensively summarize the generalization settings, methodologies, benchmarks, and results in the VLM literature. Delving into typical VLM structures, it categorizes the current literature into prompt-based, parameter-based, and feature-based methods according to the transferred modules. The differences and…
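The zero-shot capability mentioned in the abstract is usually realized in CLIP-style models by comparing an image embedding with text embeddings of class prompts (e.g. "a photo of a {class}"). The sketch below assumes the embeddings are already computed, since the image and text encoders are not shown, and uses an assumed temperature `logit_scale=100.0`; it illustrates the general mechanism rather than any specific model's implementation.

```python
import numpy as np

def zero_shot_probs(image_emb, text_embs, logit_scale=100.0):
    """CLIP-style zero-shot classification (illustrative sketch).

    image_emb:   (d,) embedding of one image from an image encoder.
    text_embs:   (C, d) embeddings of one prompt per class from a text encoder.
    logit_scale: temperature multiplier; 100.0 is an assumed value.
    """
    # L2-normalize so the dot product equals cosine similarity.
    img = image_emb / np.linalg.norm(image_emb)
    txt = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    logits = logit_scale * (txt @ img)  # one similarity score per class
    logits = logits - logits.max()      # subtract max for numerical stability
    exp = np.exp(logits)
    return exp / exp.sum()              # softmax over classes

# Usage with toy embeddings: the image is far more similar to class 1.
probs = zero_shot_probs(np.array([1.0, 0.0]),
                        np.array([[0.0, 1.0], [1.0, 0.1]]))
```

No training is involved: new classes are handled simply by writing new prompts, which is also why performance can degrade when a target domain's images or class names are poorly covered by the pretraining data.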
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Graph Neural Networks
