YouTube-VOS: Sequence-to-Sequence Video Object Segmentation

Ning Xu; Linjie Yang; Yuchen Fan; Jianchao Yang; Dingcheng Yue; Yuchen Liang; Brian Price; Scott Cohen; and Thomas Huang

arXiv:1809.00461 · cs.CV · September 5, 2018 · 20 cites

TL;DR

This paper introduces YouTube-VOS, a large-scale video object segmentation dataset, and proposes a sequence-to-sequence network that captures long-term spatial-temporal features. The network achieves the top result on the YouTube-VOS test set and results comparable to the state of the art on DAVIS 2016.

Contribution

The paper presents the largest video object segmentation dataset at the time of publication (3,252 YouTube video clips, 78 categories) and a novel sequence-to-sequence model for long-term video segmentation.

Findings

01 Achieved top performance on the YouTube-VOS test set

02 Results on DAVIS 2016 comparable to the current state of the art

03 Training on the large dataset significantly improves model effectiveness

Abstract

Learning long-term spatial-temporal features is critical for many video analysis tasks. However, existing video segmentation methods predominantly rely on static image segmentation techniques, and methods that capture temporal dependency for segmentation have to depend on pretrained optical flow models, leading to suboptimal solutions for the problem. End-to-end sequential learning to explore spatial-temporal features for video segmentation is largely limited by the scale of available video segmentation datasets, i.e., even the largest video segmentation dataset contains only 90 short video clips. To solve this problem, we build a new large-scale video object segmentation dataset called the YouTube Video Object Segmentation dataset (YouTube-VOS). Our dataset contains 3,252 YouTube video clips and 78 categories, including common objects and human activities. This is by far the largest video…

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Visual Attention and Saliency Detection · Video Surveillance and Tracking Methods