Class-Agnostic Visio-Temporal Scene Sketch Semantic Segmentation
Aleyna Kütük, Tevfik Metin Sezgin

TL;DR
This paper introduces a class-agnostic visio-temporal network for scene sketch segmentation that preserves stroke order and handles unseen categories, supported by a new large annotated dataset.
Contribution
The paper proposes what the authors describe as the first method that segments scene sketches at both the instance and stroke levels, and it introduces the FrISS dataset with dense annotations.
Findings
Outperforms existing scene sketch segmentation models
Effectively segments objects from unseen categories
Provides a large, richly annotated scene sketch dataset
Abstract
Scene sketch semantic segmentation is a crucial task for various applications including sketch-to-image retrieval and scene understanding. Existing sketch segmentation methods treat sketches as bitmap images, leading to the loss of temporal order among strokes due to the shift from vector to image format. Moreover, these methods struggle to segment objects from categories absent in the training data. In this paper, we propose a Class-Agnostic Visio-Temporal Network (CAVT) for scene sketch semantic segmentation. CAVT employs a class-agnostic object detector to detect individual objects in a scene and groups the strokes of instances through its post-processing module. This is the first approach that performs segmentation at both the instance and stroke levels within scene sketches. Furthermore, there is a lack of free-hand scene sketch datasets with both instance and stroke-level class…
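The abstract describes a two-stage idea: detect objects without class labels, then group strokes into instances in a post-processing step. The following is a minimal sketch of that grouping step only, not the authors' implementation. It assumes the detector has already returned axis-aligned boxes, and it assigns each stroke to the box that contains the largest share of its points. The names `fraction_inside` and `assign_strokes` and the `min_overlap` threshold are hypothetical and chosen for illustration.

```python
from typing import List, Optional, Tuple

Point = Tuple[float, float]
Stroke = List[Point]                      # points kept in drawing order
Box = Tuple[float, float, float, float]   # (x_min, y_min, x_max, y_max)


def fraction_inside(stroke: Stroke, box: Box) -> float:
    """Return the share of a stroke's points that lie inside the box."""
    x0, y0, x1, y1 = box
    inside = sum(1 for x, y in stroke if x0 <= x <= x1 and y0 <= y <= y1)
    return inside / len(stroke) if stroke else 0.0


def assign_strokes(
    strokes: List[Stroke],
    boxes: List[Box],
    min_overlap: float = 0.5,  # assumed threshold, not taken from the paper
) -> List[Optional[int]]:
    """Map each stroke to the index of its best-matching box, or None."""
    labels: List[Optional[int]] = []
    for stroke in strokes:
        scores = [fraction_inside(stroke, box) for box in boxes]
        if scores and max(scores) >= min_overlap:
            # Ties go to the first box with the highest score.
            labels.append(scores.index(max(scores)))
        else:
            labels.append(None)
    return labels


# Strokes stay in their original (temporal) order, so labels align with it.
strokes = [
    [(1.0, 1.0), (2.0, 2.0), (3.0, 1.0)],   # falls inside box 0
    [(11.0, 11.0), (12.0, 13.0)],           # falls inside box 1
    [(30.0, 30.0), (31.0, 31.0)],           # outside every box
]
boxes = [(0.0, 0.0, 5.0, 5.0), (10.0, 10.0, 15.0, 15.0)]
labels = assign_strokes(strokes, boxes)
print(labels)
```

Because the strokes are stored as ordered point lists rather than rasterized pixels, the output keeps one label per stroke in drawing order, which is the property the abstract says bitmap-based methods lose.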
Taxonomy
Topics: Multimodal Machine Learning Applications · Video Analysis and Summarization · Human Pose and Action Recognition
