Static-Dynamic Class-level Perception Consistency in Video Semantic Segmentation

Zhigang Cen; Ningyan Guo; Wenjing Xu; Zhiyong Feng; Danlan Huang

arXiv:2412.08034·cs.CV·December 12, 2024

Open Access

TL;DR

This paper introduces a novel class-level perceptual consistency framework for video semantic segmentation, leveraging class prototypes and multi-scale alignment to improve temporal consistency and segmentation accuracy.

Contribution

It proposes a static-dynamic class-level perceptual consistency framework with contrastive learning and semantic alignment modules, advancing beyond pixel-level methods.

Findings

01. Outperforms state-of-the-art methods on the VSPW and Cityscapes datasets

02. Reduces computational cost with a window-based attention method

03. Enhances temporal semantic consistency in video segmentation
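
The cost reduction attributed to window-based attention can be illustrated with a rough count of attention scores. The sketch below is not the paper's method: it assumes non-overlapping square windows on a single feature map, and the function names and the 64×64 map with 8×8 windows are hypothetical choices made only for illustration.

```python
def global_attention_pairs(height, width):
    """Number of query-key scores when every token attends to every token."""
    n = height * width
    return n * n


def window_attention_pairs(height, width, window):
    """Number of query-key scores when tokens attend only within their own
    non-overlapping window x window block (an assumed partitioning scheme)."""
    if height % window or width % window:
        raise ValueError("height and width must be divisible by window")
    num_windows = (height // window) * (width // window)
    tokens_per_window = window * window
    return num_windows * tokens_per_window ** 2


if __name__ == "__main__":
    # Hypothetical sizes: a 64x64 feature map split into 8x8 windows.
    full = global_attention_pairs(64, 64)
    windowed = window_attention_pairs(64, 64, 8)
    print(full, windowed, full // windowed)
```

Under these assumptions the score count drops from (HW)² to HW·w², so the saving grows with the ratio of map area to window area.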

Abstract

Video semantic segmentation (VSS) is widely used in fields such as simultaneous localization and mapping, autonomous driving, and surveillance. Its core challenge is how to leverage temporal information to achieve better segmentation. Previous efforts have primarily focused on pixel-level static-dynamic context matching, using techniques such as optical flow and attention mechanisms. Instead, this paper rethinks static-dynamic contexts at the class level and proposes a novel static-dynamic class-level perceptual consistency (SD-CPC) framework. In this framework, we propose a multivariate class prototype with contrastive learning and a static-dynamic semantic alignment module. The former provides class-level constraints for the model, obtaining personalized inter-class features and diversified intra-class features. The latter first establishes intra-frame spatial…
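
The idea of class-level constraints through prototypes and contrastive learning can be sketched in a few lines. This is a simplified illustration, not the SD-CPC formulation, which this excerpt does not specify. It assumes a single mean prototype per class (the abstract describes a multivariate prototype), cosine similarity, and an InfoNCE-style loss. All function names, the `temperature` value, and the toy features are hypothetical.

```python
import math
from collections import defaultdict


def class_prototypes(features, labels):
    """Return one prototype per class: the mean of that class's feature vectors."""
    sums = {}
    counts = defaultdict(int)
    for feat, lab in zip(features, labels):
        if lab not in sums:
            sums[lab] = [0.0] * len(feat)
        sums[lab] = [s + f for s, f in zip(sums[lab], feat)]
        counts[lab] += 1
    return {lab: [s / counts[lab] for s in vec] for lab, vec in sums.items()}


def cosine(a, b):
    """Cosine similarity with a small epsilon to avoid division by zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b + 1e-12)


def prototype_contrastive_loss(feat, label, prototypes, temperature=0.1):
    """InfoNCE-style loss: high similarity to the own-class prototype and low
    similarity to the other prototypes gives a small loss."""
    logits = {c: cosine(feat, p) / temperature for c, p in prototypes.items()}
    max_logit = max(logits.values())  # subtract the max for numerical stability
    log_denom = max_logit + math.log(
        sum(math.exp(v - max_logit) for v in logits.values())
    )
    return log_denom - logits[label]


if __name__ == "__main__":
    # Toy 2-D "pixel features" for two classes (made-up values).
    feats = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    labs = [0, 0, 1, 1]
    protos = class_prototypes(feats, labs)
    print(prototype_contrastive_loss([1.0, 0.0], 0, protos))
    print(prototype_contrastive_loss([1.0, 0.0], 1, protos))
```

In this toy setup a feature scored against its own class yields a much smaller loss than the same feature scored against the wrong class, which is the class-level pull and push that such a constraint is meant to provide.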

Taxonomy

Topics: Visual Attention and Saliency Detection · Advanced Image and Video Retrieval Techniques · Image Retrieval and Classification Techniques

Methods: Softmax · Attention Is All You Need · Contrastive Learning