General and Task-Oriented Video Segmentation
Mu Chen, Liulei Li, Wenguan Wang, Ruijie Quan, Yi Yang

TL;DR
GvSeg is a versatile video segmentation framework that unifies multiple tasks through disentangled modeling and task-specific strategies, achieving superior performance across diverse benchmarks.
Contribution
It introduces a general, architecture-agnostic framework that models segmentation targets from appearance, position, and shape, and adapts query strategies for different tasks.
Findings
Outperforms existing solutions on four video segmentation tasks
Effective across seven benchmark datasets
Successfully unifies multiple segmentation tasks in a single framework
Abstract
We present GvSeg, a general video segmentation framework for addressing four different video segmentation tasks (i.e., instance, semantic, panoptic, and exemplar-guided) while maintaining an identical architectural design. Currently, there is a trend towards developing general video segmentation solutions that can be applied across multiple tasks. This streamlines research endeavors and simplifies deployment. However, such a highly homogenized framework in current design, where each element maintains uniformity, could overlook the inherent diversity among different tasks and lead to suboptimal performance. To tackle this, GvSeg: i) provides a holistic disentanglement and modeling for segment targets, thoroughly examining them from the perspective of appearance, position, and shape, and on this basis, ii) reformulates the query initialization, matching and sampling strategies in…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsVideo Analysis and Summarization · Advanced Image and Video Retrieval Techniques · Image Retrieval and Classification Techniques
