Generalizing vision-language models to novel domains: A comprehensive survey

Xinyao Li; Jingjing Li; Fengling Li; Lei Zhu; Yang Yang; Heng Tao Shen

arXiv:2506.18504 · cs.CV · July 1, 2025

TL;DR

This survey reviews recent advances in vision-language models (VLMs), focusing on how they generalize to new domains. It categorizes methods, summarizes benchmark results, and discusses the relation to multimodal large language models, with the aim of guiding future research.

Contribution

It provides a comprehensive categorization and analysis of VLM generalization methods and benchmarks, and examines how VLMs relate to multimodal large language models, giving a clear map of the research landscape.

Findings

1. Categorizes VLM generalization methods into prompt-based, parameter-based, and feature-based approaches.
2. Summarizes key benchmarks and compares the performance of methods.
3. Discusses the relationship between VLMs and multimodal large language models.

Abstract

Recently, vision-language pretraining has emerged as a transformative technique that integrates the strengths of both visual and textual modalities, resulting in powerful vision-language models (VLMs). Leveraging web-scale pretraining data, these models exhibit strong zero-shot capabilities. However, their performance often deteriorates on domain-specific or specialized generalization tasks. To address this, a growing body of research focuses on transferring or generalizing the rich knowledge embedded in VLMs to various downstream applications. This survey aims to comprehensively summarize the generalization settings, methodologies, benchmarks, and results in the VLM literature. Based on the typical VLM structure, existing work is categorized into prompt-based, parameter-based, and feature-based methods according to the modules being transferred. The differences and…
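
The zero-shot capability mentioned in the abstract is commonly exercised by comparing an image embedding against text embeddings of class-name prompts. The sketch below illustrates that idea under stated assumptions: it presumes a CLIP-style dual encoder whose image and text embeddings are already computed (here replaced by tiny hand-written toy vectors), and the function names, the prompt template mentioned in the comments, and the temperature value are hypothetical choices for illustration, not the survey's method.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def zero_shot_classify(image_emb, class_text_embs, temperature=0.01):
    """Hypothetical zero-shot classifier for a CLIP-style dual encoder.

    image_emb: embedding produced by the (assumed) vision encoder.
    class_text_embs: dict mapping class name -> embedding of a prompt such as
        "a photo of a {class}" produced by the (assumed) text encoder.
    temperature: illustrative value that sharpens the softmax.
    Returns a dict of softmax probabilities over the class names.
    """
    names = list(class_text_embs)
    logits = [cosine(image_emb, class_text_embs[n]) / temperature for n in names]
    # Subtract the max logit for numerical stability before exponentiating.
    max_logit = max(logits)
    exps = [math.exp(l - max_logit) for l in logits]
    total = sum(exps)
    return {n: e / total for n, e in zip(names, exps)}


# Toy usage: 3-d stand-ins for real encoder outputs (assumed, not real data).
image = [0.9, 0.1, 0.0]
texts = {"dog": [1.0, 0.0, 0.0], "cat": [0.0, 1.0, 0.0], "car": [0.0, 0.0, 1.0]}
probs = zero_shot_classify(image, texts)
prediction = max(probs, key=probs.get)
```

No class-specific training happens here: new classes are handled by writing new prompts. Roughly speaking, the survey's prompt-based, parameter-based, and feature-based categories differ in which module of such a pipeline is adapted when this zero-shot behavior falls short on a new domain.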

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Graph Neural Networks