VideoClick: Video Object Segmentation with a Single Click
Namdar Homayounfar, Justin Liang, Wei-Chiu Ma, Raquel Urtasun

TL;DR
VideoClick segments objects throughout a video from a single click per object, reducing annotation effort and making large-scale video segmentation affordable while keeping accuracy competitive.
Contribution
The paper presents a bottom-up approach using a correlation volume and recurrent attention to achieve accurate video object segmentation from just one click per object.
Findings
Outperforms baseline methods on the CityscapesVideo dataset
Reduces annotation effort to a single click per object
Segments multiple objects in a video
Abstract
Annotating videos with object segmentation masks typically involves a two-stage procedure of drawing polygons per object instance for all the frames and then linking them through time. While simple, this is a very tedious, time-consuming and expensive process, making the creation of accurate annotations at scale only possible for well-funded labs. What if we were able to segment an object in the full video with only a single click? This will enable video segmentation at scale with a very low budget, opening the door to many applications. Towards this goal, in this paper we propose a bottom-up approach where, given a single click for each object in a video, we obtain the segmentation masks of these objects in the full video. In particular, we construct a correlation volume that assigns each pixel in a target frame to either one of the objects in the reference frame or the background. We…
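The pixel-assignment idea in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `assign_pixels`, the use of raw dot products as the correlation score, and the `bg_embedding` vector standing in for the background are all assumptions made for illustration, and the recurrent attention refinement mentioned in the paper is omitted entirely.

```python
import numpy as np

def assign_pixels(target_feats, ref_feats, clicks, bg_embedding):
    """Assign each target-frame pixel to background (0) or an object (1..K).

    target_feats: (H, W, C) per-pixel features of the target frame
    ref_feats:    (H, W, C) per-pixel features of the reference frame
    clicks:       list of (row, col) click positions, one per object
    bg_embedding: (C,) hypothetical feature vector representing background
    """
    # One feature vector per clicked object, read from the reference frame.
    obj_feats = np.stack([ref_feats[r, c] for r, c in clicks])         # (K, C)
    # Prepend the background vector so that index 0 means "background".
    keys = np.concatenate([bg_embedding[None, :], obj_feats], axis=0)  # (K+1, C)
    # Correlation volume: score of every target pixel against every key.
    corr = np.einsum("hwc,kc->hwk", target_feats, keys)                # (H, W, K+1)
    # Numerically stable softmax over the K+1 candidates.
    corr = corr - corr.max(axis=-1, keepdims=True)
    probs = np.exp(corr)
    probs /= probs.sum(axis=-1, keepdims=True)
    return probs.argmax(axis=-1), probs

# Toy data: 2x2 frames with 3-dimensional one-hot features (assumed values).
e0, e1, e2 = np.eye(3)
ref = np.array([[e0, e2], [e2, e1]])          # object 1 at (0, 0), object 2 at (1, 1)
tgt = 5.0 * np.array([[e0, e2], [e1, e2]])    # object 2 has moved to (1, 0)
labels, probs = assign_pixels(tgt, ref, clicks=[(0, 0), (1, 1)], bg_embedding=e2)
print(labels)
```

In this toy setup, each target pixel receives the label of the key it correlates with most strongly, so the object that moved between frames is still recovered from its single click in the reference frame. A real system would replace the one-hot features with learned deep features.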
Taxonomy
Topics: Advanced Neural Network Applications · Visual Attention and Saliency Detection · Multimodal Machine Learning Applications
