Video Object Segmentation Without Temporal Information
Kevis-Kokitsi Maninis, Sergi Caelles, Yuhua Chen, Jordi Pont-Tuset, Laura Leal-Taixé, Daniel Cremers, Luc Van Gool

TL;DR
This paper introduces OSVOS-S, a novel approach for semi-supervised video object segmentation that processes frames independently without relying on temporal information, achieving state-of-the-art accuracy and speed.
Contribution
The paper presents OSVOS-S, a fully-convolutional neural network that transfers semantic knowledge from ImageNet to single-object segmentation without temporal cues, improving accuracy over previous methods.
Findings
OSVOS-S is the fastest method on the datasets tested.
OSVOS-S achieves the highest accuracy among compared methods.
Instance-level semantic information significantly enhances segmentation results.
Abstract
Video Object Segmentation, and video processing in general, has been historically dominated by methods that rely on the temporal consistency and redundancy in consecutive video frames. When the temporal smoothness is suddenly broken, such as when an object is occluded, or some frames are missing in a sequence, the result of these methods can deteriorate significantly or they may not even produce any result at all. This paper explores the orthogonal approach of processing each frame independently, i.e., disregarding the temporal information. In particular, it tackles the task of semi-supervised video object segmentation: the separation of an object from the background in a video, given its mask in the first frame. We present Semantic One-Shot Video Object Segmentation (OSVOS-S), based on a fully-convolutional neural network architecture that is able to successively transfer generic…
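To make the frame-independent, semi-supervised setting concrete, here is a minimal toy sketch in Python. It is not the OSVOS-S network: the fully-convolutional model is replaced by a deliberately simple nearest-mean colour classifier, and every name in it (`fit_appearance_model`, `segment_frame`, `segment_video`, `make_frame`) is hypothetical. The only thing it is meant to illustrate is the control flow the abstract describes: a model is adapted once from the first frame and its mask, and every frame is then segmented on its own, with no information passed between frames.

```python
import numpy as np


def fit_appearance_model(first_frame, first_mask):
    """Toy stand-in for adapting a model to the annotated first frame.

    Assumption: the object is described only by its mean colour.
    The paper uses a fully-convolutional network instead.
    """
    fg_mean = first_frame[first_mask].mean(axis=0)   # mean colour inside the mask
    bg_mean = first_frame[~first_mask].mean(axis=0)  # mean colour outside the mask
    return fg_mean, bg_mean


def segment_frame(frame, model):
    """Segment one frame using only the fitted model, no neighbouring frames."""
    fg_mean, bg_mean = model
    dist_fg = np.linalg.norm(frame - fg_mean, axis=-1)
    dist_bg = np.linalg.norm(frame - bg_mean, axis=-1)
    return dist_fg < dist_bg  # True where the pixel looks more like the object


def segment_video(frames, first_mask):
    """Fit once on the first frame, then process each frame independently."""
    model = fit_appearance_model(frames[0], first_mask)
    return [segment_frame(frame, model) for frame in frames]


def make_frame(top, left, size=4, shape=(16, 16)):
    """Synthetic test data: a red square on a blue background."""
    frame = np.zeros(shape + (3,), dtype=float)
    frame[..., 2] = 1.0  # blue background
    mask = np.zeros(shape, dtype=bool)
    mask[top:top + size, left:left + size] = True
    frame[mask] = (1.0, 0.0, 0.0)  # red object
    return frame, mask


if __name__ == "__main__":
    pairs = [make_frame(t, t) for t in range(0, 10, 3)]
    frames = [frame for frame, _ in pairs]
    masks = [mask for _, mask in pairs]

    predictions = segment_video(frames, masks[0])

    # Dropping intermediate frames does not change the result for the last one,
    # because no prediction depends on any other frame.
    sparse = segment_video([frames[0], frames[-1]], masks[0])
    print(np.array_equal(sparse[-1], predictions[-1]))
```

Because each call to `segment_frame` depends only on the fitted model and the frame itself, missing or reordered frames cannot affect the remaining predictions, which is the robustness property the abstract contrasts with temporally dependent methods.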
