Sketch-based Video Object Segmentation: Benchmark and Analysis
Ruolin Yang, Da Li, Conghui Hu, Timothy Hospedales, Honggang Zhang, Yi-Zhe Song

TL;DR
This paper introduces a new sketch-based video object segmentation task, providing a benchmark with three datasets, and demonstrates that sketches are an effective, low-cost reference for segmenting objects in videos.
Contribution
It proposes a novel sketch-based video object segmentation task, builds a benchmark of three datasets, and evaluates a baseline built on STCN, showing that sketches are a more efficient and effective reference than the alternatives.
Findings
Sketches are more effective than other references for segmentation.
The benchmark includes three new datasets with human-drawn sketches: Sketch-DAVIS16, Sketch-DAVIS17 and Sketch-YouTube-VOS.
Experimental results validate the efficiency of sketch-based references.
Abstract
Reference-based video object segmentation is an emerging topic that aims to segment, in each video frame, the target object referred to by a given reference, such as a language expression or a photo mask. However, language expressions can be vague in conveying an intended concept, and ambiguous when similar objects in the same frame are hard to distinguish by language. Meanwhile, photo masks are costly to annotate and impractical to provide in real applications. This paper introduces a new task of sketch-based video object segmentation, an associated benchmark, and a strong baseline. Our benchmark includes three datasets, Sketch-DAVIS16, Sketch-DAVIS17 and Sketch-YouTube-VOS, which use human-drawn sketches as an informative yet low-cost reference for video object segmentation. We take advantage of STCN, a popular baseline for the semi-supervised VOS task, and evaluate…
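The abstract names STCN as the starting point for the baseline. As described in the original STCN paper, its core operation is a memory read: every location in the query frame attends over all locations stored from memory frames, using a negative squared L2 distance between keys as the affinity, followed by a softmax and a weighted sum of the memory values. The sketch below is a minimal NumPy illustration of that readout only, assuming keys and values are flattened to (channels × positions) arrays; the function name `stcn_memory_read` is hypothetical, and the code does not show how this paper encodes the sketch reference into memory, which the abstract does not describe.

```python
import numpy as np


def stcn_memory_read(mem_key, mem_value, query_key):
    """Hypothetical illustration of an STCN-style L2-affinity memory read.

    mem_key:   (Ck, N) keys of all memory positions (N = T*H*W, flattened)
    mem_value: (Cv, N) values of all memory positions
    query_key: (Ck, M) keys of the query frame (M = H*W, flattened)
    returns:   (Cv, M) readout features, one column per query position
    """
    mem_key = np.asarray(mem_key, dtype=float)
    mem_value = np.asarray(mem_value, dtype=float)
    query_key = np.asarray(query_key, dtype=float)
    ck = mem_key.shape[0]

    # -||k - q||^2 = 2 k.q - ||k||^2 - ||q||^2. The -||q||^2 term is the same
    # for every memory position of a given query column, so it cancels in the
    # softmax over memory and is dropped here.
    affinity = 2.0 * mem_key.T @ query_key - (mem_key ** 2).sum(axis=0)[:, None]
    affinity = affinity / np.sqrt(ck)  # scale by key dimension

    # Softmax over memory positions (axis 0), shifted by the max for stability.
    affinity = affinity - affinity.max(axis=0, keepdims=True)
    weights = np.exp(affinity)
    weights = weights / weights.sum(axis=0, keepdims=True)

    # Each query position receives a convex combination of memory values.
    return mem_value @ weights
```

Dropping the query-norm term is what makes this cheaper than computing full pairwise distances while giving the same attention weights, since a softmax is unchanged by adding a constant to every logit in a column.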
Taxonomy
Topics: Visual Attention and Saliency Detection · Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques
Methods: VOS
