TL;DR
This survey comprehensively reviews recent advances in open vocabulary learning for visual scene understanding, highlighting its distinctions from related tasks and analyzing various methods, datasets, and future challenges.
Contribution
It is the first extensive literature review on open vocabulary learning, summarizing key developments, comparisons, and insights in the field.
Findings
Open vocabulary learning extends beyond traditional closed-set models.
Recent methods leverage vision-language pre-training for recognizing unseen categories.
The survey identifies key challenges and future research directions.
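As a rough illustration of the second finding, the sketch below scores an image embedding against text embeddings of category names and picks the closest one. It assumes a shared image-text embedding space of the kind vision-language pre-training produces. The vectors are hand-made toy values, not outputs of any real model, and `classify_open_vocab` is a hypothetical helper name rather than a method from the survey.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def classify_open_vocab(image_embedding, text_embeddings):
    """Hypothetical helper: return the category whose text embedding is most similar."""
    scores = {name: cosine(image_embedding, emb)
              for name, emb in text_embeddings.items()}
    return max(scores, key=scores.get)

# Toy text embeddings for the categories annotated at training time (assumed values).
base_vocab = {
    "cat": [0.9, 0.1, 0.0],
    "dog": [0.1, 0.9, 0.0],
}

# Toy embedding of an image showing a category outside the base vocabulary.
image = [0.1, 0.2, 0.95]

# Closed-set behaviour: the prediction is forced into the fixed label space.
closed_label = classify_open_vocab(image, base_vocab)

# Open-vocabulary behaviour: a new category name is added at inference time
# simply by supplying its text embedding; nothing is retrained.
open_vocab = dict(base_vocab, zebra=[0.0, 0.1, 0.9])
open_label = classify_open_vocab(image, open_vocab)

print(closed_label, open_label)
```

With a real model, the toy vectors would be replaced by the outputs of its image and text encoders; the comparison step stays the same.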
Abstract
In the field of visual scene understanding, deep neural networks have made impressive advancements in various core tasks like segmentation, tracking, and detection. However, most approaches operate on the closed-set assumption, meaning that the model can only identify pre-defined categories that are present in the training set. Recently, open vocabulary settings were proposed due to the rapid progress of vision-language pre-training. These new approaches seek to locate and recognize categories beyond the annotated label space. The open vocabulary approach is more general, practical, and effective compared to weakly supervised and zero-shot settings. This paper provides a thorough review of open vocabulary learning, summarizing and analyzing recent developments in the field. In particular, we begin by comparing it to related concepts such as zero-shot learning, open-set recognition, and…
