Towards Open Vocabulary Learning: A Survey

Jianzong Wu; Xiangtai Li; Shilin Xu; Haobo Yuan; Henghui Ding; Yibo Yang; Xia Li; Jiangning Zhang; Yunhai Tong; Xudong Jiang; Bernard Ghanem; Dacheng Tao

arXiv:2306.15880·cs.CV·February 2, 2024

TL;DR

This survey comprehensively reviews recent advances in open vocabulary learning for visual scene understanding, highlighting its distinctions from related tasks and analyzing various methods, datasets, and future challenges.

Contribution

It is the first extensive literature review on open vocabulary learning, summarizing key developments, comparisons, and insights in the field.

Findings

01. Open vocabulary learning extends beyond traditional closed-set models.

02. Recent methods leverage vision-language pre-training for recognizing unseen categories.

03. The survey identifies key challenges and future research directions.
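
As a rough illustration of the second finding, the sketch below shows one common way a vision-language embedding space can be used to recognize categories that were never annotated: a region (or image) embedding is compared against text embeddings of arbitrary class names, and the vocabulary can grow at inference time without retraining. This is a minimal sketch under stated assumptions, not the method of any particular paper in the survey. The function names and the three-dimensional toy vectors are hypothetical stand-ins for the outputs of real image and text encoders.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def classify_open_vocab(region_embedding, text_embeddings):
    """Return the best-matching label and the per-label similarity scores.

    text_embeddings maps a class name to its (hypothetical) text embedding.
    """
    scores = {
        label: cosine(region_embedding, emb)
        for label, emb in text_embeddings.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


# Toy vectors standing in for text-encoder outputs (assumed, not real data).
text_embeddings = {
    "cat": [1.0, 0.0, 0.0],
    "dog": [0.0, 1.0, 0.0],
}

# Toy vector standing in for the visual embedding of one image region.
region = [0.1, 0.2, 0.9]

# Extend the vocabulary at inference time: no retraining, just a new name.
text_embeddings["zebra"] = [0.0, 0.0, 1.0]

label, scores = classify_open_vocab(region, text_embeddings)
print(label, round(scores[label], 3))
```

In a real system the toy vectors would be replaced by embeddings from a pre-trained vision-language model, and the class names would typically be wrapped in a text prompt before encoding.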

Abstract

In the field of visual scene understanding, deep neural networks have made impressive advancements in various core tasks like segmentation, tracking, and detection. However, most approaches operate on the closed-set assumption, meaning that the model can only identify pre-defined categories that are present in the training set. Recently, open vocabulary settings were proposed due to the rapid progress of vision-language pre-training. These new approaches seek to locate and recognize categories beyond the annotated label space. The open vocabulary approach is more general, practical, and effective compared to weakly supervised and zero-shot settings. This paper provides a thorough review of open vocabulary learning, summarizing and analyzing recent developments in the field. In particular, we begin by comparing it to related concepts such as zero-shot learning, open-set recognition, and…

Code & Models

Repositories

jianzongwu/awesome-open-vocabulary (official)

Models

🤗 linhuixiao/Awesome-Visual-Grounding (model, ♡ 1)
