Towards the Unification of Generative and Discriminative Visual Foundation Model: A Survey

Xu Liu; Tong Zhou; Yuanxin Wang; Yuping Wang; Qinjingwen Cao; Weizhi Du; Yonghuan Yang; Junjun He; Yu Qiao; Yiqing Shen

arXiv:2312.10163·cs.CV·December 19, 2023·2 cites

Open Access

TL;DR

This survey reviews the development of visual foundation models, highlighting their capabilities in generative and discriminative tasks, and discusses the potential for unifying these paradigms to advance computer vision.

Contribution

The survey provides a comprehensive overview of VFMs, covering their key breakthroughs, resources, and challenges, and emphasizes the emerging integration of generative and discriminative approaches.

Findings

01. VFMs exhibit strong zero-shot generalization capabilities.

02. Recent advances enable VFMs to perform both generative and discriminative tasks.

03. The field is moving towards unifying generative and discriminative paradigms.

Abstract

The advent of foundation models, which are pre-trained on vast datasets, has ushered in a new era of computer vision, characterized by their robustness and remarkable zero-shot generalization capabilities. Mirroring the transformative impact of foundation models like large language models (LLMs) in natural language processing, visual foundation models (VFMs) have become a catalyst for groundbreaking developments in computer vision. This review paper delineates the pivotal trajectories of VFMs, emphasizing their scalability and proficiency in generative tasks such as text-to-image synthesis, as well as their adeptness in discriminative tasks including image segmentation. While generative and discriminative models have historically charted distinct paths, we undertake a comprehensive examination of the recent strides made by VFMs in both domains, elucidating their origins, seminal…

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning