DVIS: Decoupled Video Instance Segmentation Framework
Tao Zhang, Xingye Tian, Yu Wu, Shunping Ji, Xuebo Wang, Yuan Zhang, Pengfei Wan

TL;DR
DVIS introduces a decoupled framework for video instance segmentation that separates segmentation, tracking, and refinement, achieving state-of-the-art results with efficient computation on challenging benchmarks.
Contribution
The paper proposes a novel decoupling strategy for VIS, including a new referring tracker and temporal refiner, significantly improving performance and efficiency.
Findings
Surpasses the previous SOTA by 7.3 AP on OVIS and by 9.6 VPQ on VIPSeg.
Lightweight tracker and refiner require only 1.69% of segmenter FLOPs.
Enables efficient training and inference on a single GPU.
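To make the 1.69% figure concrete, the arithmetic can be sketched as below. The segmenter cost of 100 GFLOPs is a hypothetical placeholder, not a number from the paper; only the 1.69% ratio comes from the findings above, and the function names are illustrative.

```python
def overhead_flops(segmenter_flops, ratio=0.0169):
    """Extra FLOPs for tracker + refiner, as a fraction of the segmenter's FLOPs."""
    return segmenter_flops * ratio


def total_flops(segmenter_flops, ratio=0.0169):
    """Segmenter FLOPs plus the tracker + refiner overhead."""
    return segmenter_flops + overhead_flops(segmenter_flops, ratio)


# Hypothetical segmenter cost in GFLOPs (assumption, not reported in the paper).
segmenter = 100.0
extra = overhead_flops(segmenter)
total = total_flops(segmenter)
```

Under that assumed budget, the tracker and refiner together would add roughly 1.69 GFLOPs, so nearly all of the compute stays in the segmenter.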
Abstract
Video instance segmentation (VIS) is a critical task with diverse applications, including autonomous driving and video editing. Existing methods often underperform on complex and long videos in the real world, primarily due to two factors. Firstly, offline methods are limited by the tightly-coupled modeling paradigm, which treats all frames equally and disregards the interdependencies between adjacent frames. Consequently, this leads to the introduction of excessive noise during long-term temporal alignment. Secondly, online methods suffer from inadequate utilization of temporal information. To tackle these challenges, we propose a decoupling strategy for VIS by dividing it into three independent sub-tasks: segmentation, tracking, and refinement. The efficacy of the decoupling strategy relies on two crucial elements: 1) attaining precise long-term alignment outcomes via frame-by-frame…
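The three-stage split described in the abstract can be illustrated with a toy pipeline. This is a minimal sketch under stated assumptions, not the paper's method: the "segmenter" simply passes through per-instance embedding vectors, brute-force assignment on dot-product similarity stands in for the referring tracker, and a temporal moving average stands in for the temporal refiner. All function names are hypothetical.

```python
from itertools import permutations


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def segment(frame):
    # Stand-in segmenter (assumption): a "frame" is already a list of
    # per-instance embedding vectors, in arbitrary order.
    return [list(vec) for vec in frame]


def track(per_frame):
    # Frame-by-frame association: reorder each frame's instances so they
    # best match the previous aligned frame. Brute-force assignment is a
    # simple stand-in for a learned tracker and only suits toy sizes.
    aligned = [per_frame[0]]
    for cur in per_frame[1:]:
        prev = aligned[-1]
        best = max(
            permutations(range(len(cur))),
            key=lambda p: sum(dot(prev[i], cur[j]) for i, j in enumerate(p)),
        )
        aligned.append([cur[j] for j in best])
    return aligned


def refine(aligned, window=1):
    # Temporal refinement stand-in: average each instance's embedding over
    # the frames within +/- window of the current frame.
    num_frames = len(aligned)
    out = []
    for t in range(num_frames):
        lo, hi = max(0, t - window), min(num_frames, t + window + 1)
        frame = []
        for i in range(len(aligned[t])):
            dim = len(aligned[t][i])
            frame.append(
                [sum(aligned[s][i][d] for s in range(lo, hi)) / (hi - lo)
                 for d in range(dim)]
            )
        out.append(frame)
    return out


def run_pipeline(video):
    # Segmentation, then tracking, then refinement, each as a separate stage.
    return refine(track([segment(f) for f in video]))


# Two instances over three frames; the middle frame lists them in swapped order.
video = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.1, 0.9], [0.9, 0.1]],
    [[1.0, 0.0], [0.0, 1.0]],
]
aligned = track([segment(f) for f in video])
refined = run_pipeline(video)
```

In this toy input the tracking stage restores a consistent instance order in the middle frame before any temporal smoothing is applied, which is the point of aligning frame by frame first and refining afterwards.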
Taxonomy
Topics: Video Analysis and Summarization · Multimodal Machine Learning Applications · Advanced Vision and Imaging
