Breaking the "Object" in Video Object Segmentation
Pavel Tokmakov, Jie Li, Adrien Gaidon

TL;DR
This paper introduces VOST, a new dataset for evaluating video object segmentation when objects undergo transformations. It reveals the limitations of current methods and proposes improvements that better model spatio-temporal information.
Contribution
The paper presents VOST, a large dataset focusing on object transformations, and analyzes the weaknesses of existing VOS methods, proposing modifications to enhance their robustness.
Findings
Existing VOS methods struggle with transformed objects.
Current methods overly depend on static appearance cues.
Proposed modifications improve spatio-temporal modeling capabilities.
Abstract
The appearance of an object can be fleeting when it transforms. As eggs are broken or paper is torn, their color, shape and texture can change dramatically, preserving virtually nothing of the original except for the identity itself. Yet, this important phenomenon is largely absent from existing video object segmentation (VOS) benchmarks. In this work, we close the gap by collecting a new dataset for Video Object Segmentation under Transformations (VOST). It consists of more than 700 high-resolution videos, captured in diverse environments, which are 21 seconds long on average and densely labeled with instance masks. A careful, multi-step approach is adopted to ensure that these videos focus on complex object transformations, capturing their full temporal extent. We then extensively evaluate state-of-the-art VOS methods and make a number of important discoveries. In particular, we show…
Taxonomy
Topics: Visual Attention and Saliency Detection · Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques
Methods: VOS
