Unidentified Video Objects: A Benchmark for Dense, Open-World Segmentation
Weiyao Wang, Matt Feiszli, Heng Wang, Du Tran

TL;DR
This paper introduces UVO, a large and challenging benchmark dataset for open-world, class-agnostic video object segmentation, enabling research beyond traditional closed-world detection methods.
Contribution
The paper presents UVO, a significantly larger and more complex benchmark for open-world video object segmentation, facilitating development of novel class-agnostic segmentation approaches.
Findings
UVO contains approximately 8 times more videos than DAVIS.
UVO contains 7 times more mask annotations per video than YouTube-VOS and YouTube-VIS.
UVO includes crowded scenes and complex background motions, increasing challenge.
Abstract
Current state-of-the-art object detection and segmentation methods work well under the closed-world assumption. This closed-world setting assumes that the list of object categories is available during training and deployment. However, many real-world applications require detecting or segmenting novel objects, i.e., object categories never seen during training. In this paper, we present UVO (Unidentified Video Objects), a new benchmark for open-world class-agnostic object segmentation in videos. Besides shifting the problem focus to the open-world setup, UVO is significantly larger, providing approximately 8 times more videos than DAVIS and 7 times more mask (instance) annotations per video than YouTube-VOS and YouTube-VIS. UVO is also more challenging because it includes many videos with crowded scenes and complex background motions. We demonstrated that UVO can be used…
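To make "class-agnostic" concrete: in this setting a predicted mask counts as correct if it overlaps some annotated object well enough, regardless of what category either belongs to. The snippet below is a minimal, generic sketch of class-agnostic mask recall, assuming masks are represented as sets of pixel coordinates, a 0.5 IoU threshold and greedy one-to-one matching. It is not UVO's official evaluation code. The names `mask_iou`, `class_agnostic_recall` and `box` are hypothetical helpers introduced only for illustration.

```python
from typing import List, Set, Tuple

Pixel = Tuple[int, int]
Mask = Set[Pixel]  # hypothetical representation: a mask is the set of its (row, col) pixels


def mask_iou(a: Mask, b: Mask) -> float:
    """Intersection-over-union of two binary masks given as pixel sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def class_agnostic_recall(gt_masks: List[Mask], pred_masks: List[Mask],
                          iou_thr: float = 0.5) -> float:
    """Fraction of ground-truth masks matched by a prediction with IoU >= iou_thr.

    No category labels are consulted: any prediction may match any annotated
    object, which is what makes the matching class-agnostic. Matching is greedy
    and one-to-one, taking (ground truth, prediction) pairs in order of
    decreasing IoU. The 0.5 default threshold is an assumption.
    """
    if not gt_masks:
        raise ValueError("recall is undefined without ground-truth masks")
    pairs = sorted(
        ((mask_iou(g, p), gi, pi)
         for gi, g in enumerate(gt_masks)
         for pi, p in enumerate(pred_masks)),
        reverse=True,
    )
    matched_gt: Set[int] = set()
    used_pred: Set[int] = set()
    for iou, gi, pi in pairs:
        if iou < iou_thr:
            break  # all remaining pairs have lower IoU
        if gi in matched_gt or pi in used_pred:
            continue
        matched_gt.add(gi)
        used_pred.add(pi)
    return len(matched_gt) / len(gt_masks)


def box(r0: int, c0: int, r1: int, c1: int) -> Mask:
    """Hypothetical helper: a rectangular mask covering rows r0..r1-1 and cols c0..c1-1."""
    return {(r, c) for r in range(r0, r1) for c in range(c0, c1)}


if __name__ == "__main__":
    gt = [box(0, 0, 4, 4), box(10, 10, 14, 14)]
    preds = [box(0, 0, 4, 3), box(20, 20, 22, 22)]
    print(class_agnostic_recall(gt, preds))  # first object found (IoU 0.75), second missed
```

Here the first annotated object is recovered with IoU 12/16 = 0.75 and the second is missed, giving a recall of 0.5. Recall is used in this sketch rather than precision because, in an open-world setting, a prediction that matches no annotation may still be a genuine but unannotated object.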
Taxonomy
Topics: Visual Attention and Saliency Detection · Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques
