Semantic Instance Meets Salient Object: Study on Video Semantic Salient Instance Segmentation
Trung-Nghia Le, Akihiro Sugimoto

TL;DR
This paper introduces a new task called video semantic salient instance segmentation (VSSIS), proposing a baseline framework that combines semantic and salient object segmentation to identify and track meaningful salient instances in videos, with a new dataset for evaluation.
Contribution
The paper presents the SISO framework for VSSIS, integrating semantic and salient segmentation with a novel fusion, propagation, and tracking method, and provides a new annotated dataset SESIV for this task.
Findings
SISO effectively handles occlusions in videos.
The SESIV dataset provides high-quality annotations for VSSIS.
Experimental results demonstrate the baseline's robustness and accuracy.
Abstract
Focusing only on semantic instances that are salient in a scene benefits robot navigation and self-driving cars more than looking at all objects in the whole scene. This paper pushes the envelope on salient regions in a video by decomposing them into semantically meaningful components, namely, semantic salient instances. We provide a baseline for the new task of video semantic salient instance segmentation (VSSIS): the Semantic Instance - Salient Object (SISO) framework. The SISO framework is simple yet efficient, leveraging the advantages of two different segmentation tasks, i.e., semantic instance segmentation and salient object segmentation, and eventually fusing them for the final result. In SISO, we introduce a sequential fusion that looks at overlapping pixels between semantic instances and salient regions to obtain non-overlapping instances one by one. We also introduce a…
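The sequential fusion mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `sequential_fusion`, the processing order (descending detection score), and the `min_overlap` threshold are all assumptions made for illustration. The abstract only states that overlapping pixels between semantic instances and salient regions are examined to obtain non-overlapping instances one by one.

```python
import numpy as np

def sequential_fusion(instance_masks, scores, salient_mask, min_overlap=0.5):
    """Hypothetical sketch: fuse instance masks with a salient-region mask.

    instance_masks: list of HxW boolean arrays (one per semantic instance)
    scores:         per-instance confidence, used here to order instances (assumption)
    salient_mask:   HxW boolean array of salient pixels
    min_overlap:    assumed minimum fraction of an instance that must be salient
    Returns a list of (instance_index, fused_mask) with pairwise-disjoint masks.
    """
    salient = np.asarray(salient_mask, dtype=bool)
    claimed = np.zeros_like(salient, dtype=bool)  # pixels already assigned
    fused = []
    # Visit instances one by one, most confident first (assumed ordering).
    for idx in np.argsort(scores)[::-1]:
        mask = np.asarray(instance_masks[idx], dtype=bool)
        area = mask.sum()
        if area == 0:
            continue
        # Keep only pixels that are salient and not yet claimed by an earlier instance.
        overlap = mask & salient & ~claimed
        if overlap.sum() / area < min_overlap:
            continue  # instance is mostly non-salient or already covered
        fused.append((int(idx), overlap))
        claimed |= overlap
    return fused
```

Because each accepted mask is removed from the pool of available pixels before the next instance is considered, the returned masks cannot overlap, which mirrors the "one by one" behaviour the abstract describes.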
Taxonomy
Topics: Visual Attention and Saliency Detection · Advanced Image and Video Retrieval Techniques · Multimodal Machine Learning Applications
