Temporal-aware Hierarchical Mask Classification for Video Semantic Segmentation
Zhaochong An, Guolei Sun, Zongwei Wu, Hao Tang, Luc Van Gool

TL;DR
This paper introduces THE-Mask, a novel temporal-aware hierarchical mask classification method for video semantic segmentation that improves training efficiency and better captures temporal information, achieving state-of-the-art results on VSPW.
Contribution
The paper proposes a new hierarchical query and matching mechanism, along with a temporal aggregation decoder, to enhance video semantic segmentation performance.
Findings
Achieves state-of-the-art results on the VSPW benchmark.
Uses a two-round matching mechanism for more efficient training.
Effectively captures temporal information across video frames.
Abstract
Modern approaches have shown the strong potential of treating semantic segmentation as a mask classification task, a formulation widely used in instance-level segmentation. This paradigm trains models by assigning a subset of object queries to ground truths via conventional one-to-one matching. However, we observe that the popular video semantic segmentation (VSS) dataset has few categories per video, meaning that fewer than 10% of queries can be matched to receive meaningful gradient updates during VSS training. This inefficiency keeps the queries from reaching their full expressive potential. We therefore present THE-Mask, a novel solution for VSS that introduces temporal-aware hierarchical object queries for the first time. Specifically, we propose a simple two-round matching mechanism that matches more queries at minimal cost during training, with no extra cost during inference. To…
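The matching argument in the abstract can be made concrete with a toy sketch. It assumes, as our own reading rather than the authors' code, that the second round simply re-runs one-to-one min-cost matching over the queries left unmatched by the first round. The function names `min_cost_matching` and `two_round_matching`, the cost values, and the sizes are all hypothetical. Brute-force search stands in for the Hungarian algorithm so the example runs on the standard library alone.

```python
from itertools import permutations
from typing import Dict, List, Sequence, Tuple

Cost = Sequence[Sequence[float]]  # cost[query][ground_truth], lower is better


def min_cost_matching(cost: Cost, queries: List[int]) -> Dict[int, int]:
    """One-to-one matching of every ground truth to a distinct query in `queries`.

    Brute force over ordered query selections, so only suitable for toy sizes;
    a real trainer would use the Hungarian algorithm instead.
    Returns {query_index: ground_truth_index}.
    """
    num_gt = len(cost[0])
    if len(queries) < num_gt:
        raise ValueError("need at least as many queries as ground truths")
    best, best_total = None, float("inf")
    for chosen in permutations(queries, num_gt):
        # chosen[g] is the query assigned to ground truth g
        total = sum(cost[q][g] for g, q in enumerate(chosen))
        if total < best_total:
            best, best_total = chosen, total
    return {q: g for g, q in enumerate(best)}


def two_round_matching(cost: Cost) -> Tuple[Dict[int, int], Dict[int, int]]:
    """Hypothetical two-round scheme: round 1 is the usual one-to-one matching;
    round 2 matches the ground truths again using only the leftover queries."""
    num_gt = len(cost[0])
    all_queries = list(range(len(cost)))
    first = min_cost_matching(cost, all_queries)
    leftover = [q for q in all_queries if q not in first]
    # Skip round 2 if too few queries remain to cover every ground truth again
    second = min_cost_matching(cost, leftover) if len(leftover) >= num_gt else {}
    return first, second


# Toy cost matrix: 6 queries x 2 ground-truth classes (values are made up)
TOY_COST = [
    [0.1, 0.9],
    [0.8, 0.2],
    [0.3, 0.7],
    [0.6, 0.4],
    [0.5, 0.5],
    [0.9, 0.9],
]

if __name__ == "__main__":
    first, second = two_round_matching(TOY_COST)
    n = len(TOY_COST)
    print("round 1:", first, f"-> {len(first)}/{n} queries supervised")
    merged = {**first, **second}
    print("rounds 1+2:", merged, f"-> {len(merged)}/{n} queries supervised")
```

On the toy matrix, round 1 supervises 2 of 6 queries ({0: 0, 1: 1}) and round 2 adds queries 2 and 3, for 4 of 6. Because matching only decides which queries receive loss targets, it runs during training alone, which fits the abstract's claim of no extra inference cost.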
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning
