Breaking the "Object" in Video Object Segmentation

Pavel Tokmakov; Jie Li; Adrien Gaidon

arXiv:2212.06200·cs.CV·March 29, 2023

Open Access

TL;DR

This paper introduces VOST, a new dataset for evaluating video object segmentation under object transformations. Evaluating current methods on it reveals their limitations, and the authors propose modifications that better model spatio-temporal information.

Contribution

The paper presents VOST, a dataset of more than 700 high-resolution videos focused on object transformations, analyzes the weaknesses of existing VOS methods, and proposes modifications to improve their robustness.

Findings

01. Existing VOS methods struggle with transformed objects.

02. Current methods depend too heavily on static appearance cues.

03. The proposed modifications improve spatio-temporal modeling.

Abstract

The appearance of an object can be fleeting when it transforms. As eggs are broken or paper is torn, their color, shape and texture can change dramatically, preserving virtually nothing of the original except for the identity itself. Yet, this important phenomenon is largely absent from existing video object segmentation (VOS) benchmarks. In this work, we close the gap by collecting a new dataset for Video Object Segmentation under Transformations (VOST). It consists of more than 700 high-resolution videos, captured in diverse environments, which are 21 seconds long on average and densely labeled with instance masks. A careful, multi-step approach is adopted to ensure that these videos focus on complex object transformations, capturing their full temporal extent. We then extensively evaluate state-of-the-art VOS methods and make a number of important discoveries. In particular, we show…

Taxonomy

Topics: Visual Attention and Saliency Detection · Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques

Methods: VOS