VideoClick: Video Object Segmentation with a Single Click

Namdar Homayounfar; Justin Liang; Wei-Chiu Ma; Raquel Urtasun

arXiv:2101.06545·cs.CV·January 19, 2021

TL;DR

VideoClick segments objects throughout a video from a single click per object, replacing the per-frame polygon drawing and temporal linking that make video annotation at scale expensive.

Contribution

The paper presents a bottom-up approach using a correlation volume and recurrent attention to achieve accurate video object segmentation from just one click per object.

Findings

01. Outperforms baseline methods on the CityscapesVideo dataset
02. Reduces annotation time significantly
03. Effective in segmenting multiple objects in videos

Abstract

Annotating videos with object segmentation masks typically involves a two-stage procedure: drawing polygons per object instance in every frame and then linking them through time. While simple, this process is tedious, time-consuming, and expensive, making the creation of accurate annotations at scale possible only for well-funded labs. What if we were able to segment an object in the full video with only a single click? This would enable video segmentation at scale on a very low budget, opening the door to many applications. Toward this goal, we propose a bottom-up approach in which, given a single click for each object in a video, we obtain the segmentation masks of these objects in the full video. In particular, we construct a correlation volume that assigns each pixel in a target frame either to one of the objects in the reference frame or to the background. We…
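The pixel-assignment idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes per-pixel feature maps are already available (in practice they would come from a learned network), uses a plain dot product as the correlation, and models the background with a hypothetical constant score `background_logit`. The function name `assign_pixels` and all parameter names are invented for illustration, and the recurrent attention refinement mentioned under Contribution is omitted.

```python
import numpy as np

def assign_pixels(ref_feats, tgt_feats, clicks, background_logit=0.0):
    """Assign each target-frame pixel to a clicked object or to the background.

    ref_feats, tgt_feats: (H, W, C) per-pixel feature maps (assumed given).
    clicks: list of (row, col) positions in the reference frame, one per object.
    background_logit: assumed constant score for the background class.
    Returns (labels, probs): labels is (H, W) with 0 = background and
    k = k-th clicked object; probs is (N + 1, H, W).
    """
    h, w, _ = tgt_feats.shape
    # One feature vector per clicked object, shape (N, C).
    click_feats = np.stack([ref_feats[r, c] for r, c in clicks])
    # Correlation volume: dot product of every click feature with every
    # target pixel feature, shape (N, H, W).
    corr = np.einsum("nc,hwc->nhw", click_feats, tgt_feats).astype(float)
    # Prepend a background channel, giving shape (N + 1, H, W).
    background = np.full((1, h, w), background_logit, dtype=float)
    logits = np.concatenate([background, corr], axis=0)
    # Numerically stable softmax over the object/background axis.
    logits = logits - logits.max(axis=0, keepdims=True)
    probs = np.exp(logits)
    probs = probs / probs.sum(axis=0, keepdims=True)
    return probs.argmax(axis=0), probs
```

For a video, the same assignment would be repeated for each target frame against the reference frame; how the scores are refined over time is outside the scope of this sketch.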

Taxonomy

Topics: Advanced Neural Network Applications · Visual Attention and Saliency Detection · Multimodal Machine Learning Applications