Revisiting Click-based Interactive Video Object Segmentation
Stephane Vujasinovic, Sebastian Bullinger, Stefan Becker, Norbert Scherer-Negenborn, Michael Arens, Rainer Stiefelhagen

TL;DR
This paper introduces CiVOS, a click-based interactive video object segmentation framework that simplifies user interaction and achieves competitive results with less workload compared to scribble-based methods.
Contribution
Proposes a novel click-based interactive VOS framework that decouples user interaction and mask propagation, reducing user workload and maintaining competitive performance.
Findings
Achieves competitive segmentation results on the DAVIS dataset.
Requires less user effort than scribble-based methods.
Adapts evaluation metrics for hardware-independent comparison.
Abstract
While current methods for interactive Video Object Segmentation (iVOS) rely on scribble-based interactions to generate precise object masks, we propose a Click-based interactive Video Object Segmentation (CiVOS) framework to simplify the required user workload as much as possible. CiVOS builds on de-coupled modules reflecting user interaction and mask propagation. The interaction module converts click-based interactions into an object mask, which the propagation module then propagates to the remaining frames. Additional user interactions allow for a refinement of the object mask. The approach is extensively evaluated on the popular interactive DAVIS dataset, but with an inevitable replacement of scribble-based interactions with click-based counterparts. We consider several strategies for generating clicks during our evaluation to reflect various user inputs and adjust the DAVIS…
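The decoupled loop described in the abstract (clicks become a mask on one frame, the mask is carried to the other frames, further clicks refine it) can be illustrated with a minimal Python sketch. Everything here is hypothetical and is not the CiVOS implementation: `Click`, `toy_interaction`, `toy_propagation`, and `run_session` are illustrative names, the interaction module is replaced by a stand-in that adds or removes a small square around each click, and the propagation module is replaced by a stand-in that simply copies the mask to every frame. The sketch also assumes that all clicks in one round fall on the same frame.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple

Pixel = Tuple[int, int]   # (row, col)
Mask = Set[Pixel]         # set of foreground pixels


@dataclass
class Click:
    frame: int       # index of the frame the user clicked on
    pixel: Pixel     # clicked location
    positive: bool   # True marks the object, False marks background


def toy_interaction(prev: Mask, clicks: List[Click], radius: int = 1) -> Mask:
    """Stand-in interaction module (assumption): add or remove a square patch per click."""
    mask = set(prev)
    for click in clicks:
        r0, c0 = click.pixel
        patch = {(r0 + dr, c0 + dc)
                 for dr in range(-radius, radius + 1)
                 for dc in range(-radius, radius + 1)}
        mask = (mask | patch) if click.positive else (mask - patch)
    return mask


def toy_propagation(mask: Mask, num_frames: int) -> Dict[int, Mask]:
    """Stand-in propagation module (assumption): copy the mask to every frame unchanged."""
    return {t: set(mask) for t in range(num_frames)}


def run_session(num_frames: int, rounds: List[List[Click]]) -> Dict[int, Mask]:
    """Alternate interaction and propagation, one round of clicks at a time."""
    masks: Dict[int, Mask] = {t: set() for t in range(num_frames)}
    for clicks in rounds:
        frame = clicks[0].frame                      # frame chosen in this round
        refined = toy_interaction(masks[frame], clicks)
        masks = toy_propagation(refined, num_frames)
    return masks


if __name__ == "__main__":
    rounds = [
        [Click(frame=0, pixel=(2, 2), positive=True)],    # initial click on the object
        [Click(frame=1, pixel=(2, 3), positive=False)],   # corrective click on another frame
    ]
    result = run_session(num_frames=3, rounds=rounds)
    print(sorted(result[2]))
```

Because the two stages only exchange a mask, either stand-in could be swapped for a learned model without touching the other, which is the point of the decoupling.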
Taxonomy
Topics: Visual Attention and Saliency Detection · Advanced Image and Video Retrieval Techniques · Advanced Neural Network Applications
