Class-Agnostic Visio-Temporal Scene Sketch Semantic Segmentation

Aleyna Kütük; Tevfik Metin Sezgin

arXiv:2410.00266·cs.CV·October 2, 2024

TL;DR

This paper introduces a class-agnostic visio-temporal network for scene sketch segmentation that preserves stroke order and handles unseen categories, supported by a new large annotated dataset.

Contribution

The paper proposes the first method that segments scene sketches at both the instance and stroke levels, and it introduces the FrISS dataset with dense annotations.

Findings

01. Outperforms existing scene sketch segmentation models

02. Effectively segments objects from unseen categories

03. Provides a large, richly annotated scene sketch dataset

Abstract

Scene sketch semantic segmentation is a crucial task for various applications including sketch-to-image retrieval and scene understanding. Existing sketch segmentation methods treat sketches as bitmap images, leading to the loss of temporal order among strokes due to the shift from vector to image format. Moreover, these methods struggle to segment objects from categories absent in the training data. In this paper, we propose a Class-Agnostic Visio-Temporal Network (CAVT) for scene sketch semantic segmentation. CAVT employs a class-agnostic object detector to detect individual objects in a scene and groups the strokes of instances through its post-processing module. This is the first approach that performs segmentation at both the instance and stroke levels within scene sketches. Furthermore, there is a lack of free-hand scene sketch datasets with both instance and stroke-level class…
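
The abstract says that CAVT detects objects with a class-agnostic detector and then groups strokes into instances in a post-processing module, but it does not describe how that grouping works. The sketch below is only an illustration of the general idea, not the authors' method: it assumes strokes are stored as ordered point lists (so temporal order is preserved), that the detector returns axis-aligned boxes, and that a stroke belongs to the box containing most of its points. The names `fraction_inside`, `group_strokes`, and `min_fraction` are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Stroke = List[Point]                      # points in drawing order
Box = Tuple[float, float, float, float]   # (x_min, y_min, x_max, y_max)


def fraction_inside(stroke: Stroke, box: Box) -> float:
    """Return the fraction of a stroke's points that fall inside a box."""
    if not stroke:
        return 0.0
    x0, y0, x1, y1 = box
    inside = sum(1 for x, y in stroke if x0 <= x <= x1 and y0 <= y <= y1)
    return inside / len(stroke)


def group_strokes(
    strokes: List[Stroke],
    boxes: List[Box],
    min_fraction: float = 0.5,            # assumed threshold, not from the paper
) -> List[int]:
    """Assign each stroke the index of its best-matching box, or -1 if none."""
    labels = []
    for stroke in strokes:                # iteration keeps the stroke order
        best_idx, best_frac = -1, 0.0
        for idx, box in enumerate(boxes):
            frac = fraction_inside(stroke, box)
            if frac > best_frac:
                best_idx, best_frac = idx, frac
        labels.append(best_idx if best_frac >= min_fraction else -1)
    return labels


# Toy scene: four strokes in drawing order, two detected (unlabeled) objects.
strokes = [
    [(1, 1), (2, 2), (3, 1)],
    [(10, 10), (11, 12)],
    [(2, 3), (3, 3)],
    [(50, 50), (51, 51)],
]
boxes = [(0, 0, 5, 5), (9, 9, 13, 13)]
labels = group_strokes(strokes, boxes)
print(labels)
```

Because the boxes carry no class label, the grouping step itself never depends on a category vocabulary, which is the sense in which such a pipeline can handle objects from categories absent in the training data. Strokes that match no box are left unassigned (`-1`) rather than forced into an instance.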

Taxonomy

Topics: Multimodal Machine Learning Applications · Video Analysis and Summarization · Human Pose and Action Recognition