Towards the Unification of Generative and Discriminative Visual Foundation Model: A Survey
Xu Liu, Tong Zhou, Yuanxin Wang, Yuping Wang, Qinjingwen Cao, Weizhi Du, Yonghuan Yang, Junjun He, Yu Qiao, Yiqing Shen

TL;DR
This survey reviews the development of visual foundation models, highlighting their capabilities in generative and discriminative tasks, and discusses the potential for unifying these paradigms to advance computer vision.
Contribution
It provides a comprehensive overview of visual foundation models (VFMs), covering their key breakthroughs, resources, and challenges, and emphasizes the emerging integration of generative and discriminative approaches.
Findings
VFMs exhibit strong zero-shot generalization capabilities.
Recent advances enable VFMs to perform both generative and discriminative tasks.
The field is moving towards unifying generative and discriminative paradigms.
Abstract
The advent of foundation models, which are pre-trained on vast datasets, has ushered in a new era of computer vision, characterized by their robustness and remarkable zero-shot generalization capabilities. Mirroring the transformative impact of foundation models like large language models (LLMs) in natural language processing, visual foundation models (VFMs) have become a catalyst for groundbreaking developments in computer vision. This review paper delineates the pivotal trajectories of VFMs, emphasizing their scalability and proficiency in generative tasks such as text-to-image synthesis, as well as their adeptness in discriminative tasks including image segmentation. While generative and discriminative models have historically charted distinct paths, we undertake a comprehensive examination of the recent strides made by VFMs in both domains, elucidating their origins, seminal…
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning
