Interactive Video Object Segmentation in the Wild

Arnaud Benard; Michael Gygli

arXiv:1801.00269 · cs.CV · January 3, 2018 · 34 cites

TL;DR

This paper introduces a deep interactive video object segmentation system that achieves high accuracy with minimal user input, enabling efficient correction and refinement of video segmentations in challenging sequences.

Contribution

It presents a deep interactive image segmentation method that segments objects accurately from only a few clicks, supplying the pixel-level masks that one-shot video object segmentation requires as input.

Findings

01. Achieves 90% IOU with 3.8 clicks on average on the GrabCut dataset.

02. Iteratively refines an initial segmentation, which lets it correct frames where the video object segmentation fails.

03. Provides insights into user annotation patterns and correction behaviors.

Abstract

In this paper we present our system for human-in-the-loop video object segmentation. The backbone of our system is a method for one-shot video object segmentation. While fast, this method requires an accurate pixel-level segmentation of one (or several) frames as input. As manually annotating such a segmentation is impractical, we propose a deep interactive image segmentation method that can accurately segment objects with only a handful of clicks. On the GrabCut dataset, our method obtains 90% IOU with just 3.8 clicks on average, setting the new state of the art. Furthermore, as our method iteratively refines an initial segmentation, it can effectively correct frames where the video object segmentation fails, thus allowing users to quickly obtain high-quality results even on challenging sequences. Finally, we investigate usage patterns and give insights into how many steps users take to…
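
The headline result pairs an IOU threshold with a click count. As a rough illustration of how such a figure could be measured, the sketch below computes intersection over union for two binary masks and finds the first click at which a sequence of per-click IOU values reaches a threshold. The function names `iou` and `clicks_to_reach`, the NumPy mask representation, and the toy masks are assumptions made for this example; the paper's exact evaluation protocol is not described in the abstract above.

```python
import numpy as np


def iou(pred: np.ndarray, gt: np.ndarray) -> float:
    """Intersection over union of two binary masks of the same shape."""
    pred = pred.astype(bool)
    gt = gt.astype(bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        # Both masks are empty. Treating this as perfect agreement is a
        # convention assumed here, not something stated in the paper.
        return 1.0
    return float(np.logical_and(pred, gt).sum() / union)


def clicks_to_reach(ious_per_click, threshold=0.9):
    """Return the 1-based click count at which IOU first reaches the
    threshold, or None if it never does."""
    for n, value in enumerate(ious_per_click, start=1):
        if value >= threshold:
            return n
    return None


# Toy example: a 2x2 ground-truth square and a prediction one column wider.
gt = np.zeros((4, 4), dtype=bool)
gt[1:3, 1:3] = True      # 4 pixels
pred = np.zeros((4, 4), dtype=bool)
pred[1:3, 1:4] = True    # 6 pixels, 4 of them shared with gt

print(round(iou(pred, gt), 3))  # 0.667
```

Averaging `clicks_to_reach` over all images in a dataset would yield a figure comparable in form to "3.8 clicks on average", under the assumption that each image is refined until the threshold is met.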

Taxonomy

Topics: Visual Attention and Saliency Detection · Advanced Image and Video Retrieval Techniques · Video Analysis and Summarization