Time2General: Learning Spatiotemporal Invariant Representations for Domain-Generalization Video Semantic Segmentation
Siyu Chen, Ting Han, Haoling Huang, Chaolei Wang, Chengzheng Fu, Duxin Zhu, Guorong Cai, Jinhe Su

TL;DR
Time2General introduces a novel framework for domain-generalized video semantic segmentation that enhances temporal stability and cross-domain accuracy without requiring target labels or test-time adaptation.
Contribution
It proposes a Spatio-Temporal Memory Decoder and Masked Temporal Consistency Loss to improve temporal stability and robustness in domain-generalized video segmentation.
Findings
Significant improvement in cross-domain accuracy.
Enhanced temporal stability with reduced flicker.
Operates at up to 18 FPS on benchmarks.
Abstract
Domain Generalized Video Semantic Segmentation (DGVSS) is trained on a single labeled driving domain and is directly deployed on unseen domains without target labels and test-time adaptation while maintaining temporally consistent predictions over video streams. In practice, both domain shift and temporal-sampling shift break correspondence-based propagation and fixed-stride temporal aggregation, causing severe frame-to-frame flicker even in label-stable regions. We propose Time2General, a DGVSS framework built on Stability Queries. Time2General introduces a Spatio-Temporal Memory Decoder that aggregates multi-frame context into a clip-level spatio-temporal memory and decodes temporally consistent per-frame masks without explicit correspondence propagation. To further suppress flicker and improve robustness to varying sampling rates, the Masked Temporal Consistency Loss is proposed to…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsDomain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications · Human Pose and Action Recognition
