Interactive Video Object Segmentation in the Wild
Arnaud Benard, Michael Gygli

TL;DR
This paper introduces a deep interactive video object segmentation system that achieves high accuracy with minimal user input, enabling efficient correction and refinement of video segmentations in challenging sequences.
Contribution
It presents a novel deep interactive segmentation method that requires only a few clicks, improving accuracy and usability for video object segmentation tasks.
Findings
Achieves 90% IOU with 3.8 clicks on average on the GrabCut dataset.
Effectively refines initial segmentations to correct failures.
Provides insights into user annotation patterns and correction behaviors.
Abstract
In this paper we present our system for human-in-the-loop video object segmentation. The backbone of our system is a method for one-shot video object segmentation. While fast, this method requires an accurate pixel-level segmentation of one (or several) frames as input. As manually annotating such a segmentation is impractical, we propose a deep interactive image segmentation method that can accurately segment objects with only a handful of clicks. On the GrabCut dataset, our method obtains 90% IOU with just 3.8 clicks on average, setting the new state of the art. Furthermore, as our method iteratively refines an initial segmentation, it can effectively correct frames where the video object segmentation fails, thus allowing users to quickly obtain high-quality results even on challenging sequences. Finally, we investigate usage patterns and give insights into how many steps users take to…
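The headline number above combines two quantities: intersection over union (IOU) between a predicted mask and the ground truth, and the number of clicks needed before that IOU reaches a target. The following is a minimal sketch of how such a figure could be computed, not the authors' evaluation code. The function names, the nested-list mask representation, and the `max_clicks` cap of 20 are assumptions made for illustration.

```python
from typing import Sequence

# Assumption: a binary mask is given as rows of 0/1 values.
Mask = Sequence[Sequence[int]]


def iou(pred: Mask, truth: Mask) -> float:
    """Intersection over union of two binary masks of equal shape."""
    inter = 0
    union = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            inter += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    # Two empty masks agree perfectly; treat that case as 1.0 by convention.
    return inter / union if union else 1.0


def clicks_to_reach(ious_per_click: Sequence[float],
                    threshold: float = 0.9,
                    max_clicks: int = 20) -> int:
    """Clicks needed until IOU first reaches `threshold`.

    ious_per_click[k] is the IOU after k + 1 clicks. If the threshold is
    never reached, the hypothetical cap `max_clicks` is returned.
    """
    for k, value in enumerate(ious_per_click[:max_clicks], start=1):
        if value >= threshold:
            return k
    return max_clicks


def mean_clicks(per_image_ious: Sequence[Sequence[float]],
                threshold: float = 0.9) -> float:
    """Average clicks-to-threshold over a set of images."""
    counts = [clicks_to_reach(seq, threshold) for seq in per_image_ious]
    return sum(counts) / len(counts)
```

For example, `iou([[1, 1, 0], [0, 1, 0]], [[1, 0, 0], [0, 1, 1]])` gives 0.5 (two shared pixels out of four in the union), and an image whose IOU after successive clicks is 0.6, 0.85, 0.92 counts as three clicks at the 90% threshold.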
Taxonomy
Topics: Visual Attention and Saliency Detection · Advanced Image and Video Retrieval Techniques · Video Analysis and Summarization
