Sketch-based Video Object Segmentation: Benchmark and Analysis

Ruolin Yang; Da Li; Conghui Hu; Timothy Hospedales; Honggang Zhang; Yi-Zhe Song

arXiv:2311.07261 · cs.CV · November 14, 2023 · 1 citation

Open Access

TL;DR

This paper introduces a new sketch-based video object segmentation task, providing a benchmark with three datasets, and demonstrates that sketches are an effective, low-cost reference for segmenting objects in videos.

Contribution

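It proposes the task of sketch-based video object segmentation, builds a benchmark of three datasets (Sketch-DAVIS16, Sketch-DAVIS17 and Sketch-YouTube-VOS), and evaluates segmentation methods on it, reporting that sketches are both more effective and more efficient than other reference types such as language expressions and photo masks.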

Findings

01. Sketches are more effective than other references for segmentation.

02. The benchmark includes three new datasets with human-drawn sketches.

03. Experimental results validate the efficiency of sketch-based references.

Abstract

Reference-based video object segmentation is an emerging topic which aims to segment the corresponding target object in each video frame referred by a given reference, such as a language expression or a photo mask. However, language expressions can sometimes be vague in conveying an intended concept and ambiguous when similar objects in one frame are hard to distinguish by language. Meanwhile, photo masks are costly to annotate and less practical to provide in a real application. This paper introduces a new task of sketch-based video object segmentation, an associated benchmark, and a strong baseline. Our benchmark includes three datasets, Sketch-DAVIS16, Sketch-DAVIS17 and Sketch-YouTube-VOS, which exploit human-drawn sketches as an informative yet low-cost reference for video object segmentation. We take advantage of STCN, a popular baseline of semi-supervised VOS task, and evaluate…

Taxonomy

Topics: Visual Attention and Saliency Detection · Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques

Methods: VOS