DVIS: Decoupled Video Instance Segmentation Framework

Tao Zhang; Xingye Tian; Yu Wu; Shunping Ji; Xuebo Wang; Yuan Zhang; Pengfei Wan

arXiv:2306.03413 · cs.CV · July 17, 2023 · 1 citation

TL;DR

DVIS introduces a decoupled framework for video instance segmentation that separates segmentation, tracking, and refinement, achieving state-of-the-art results with efficient computation on challenging benchmarks.

Contribution

The paper proposes a novel decoupling strategy for VIS, including a new referring tracker and temporal refiner, significantly improving performance and efficiency.

Findings

01 Surpasses the previous state of the art by 7.3 AP on OVIS and 9.6 VPQ on VIPSeg.

02 The lightweight tracker and refiner together require only 1.69% of the segmenter's FLOPs.

03 Enables efficient training and inference on a single GPU.

Abstract

Video instance segmentation (VIS) is a critical task with diverse applications, including autonomous driving and video editing. Existing methods often underperform on complex and long real-world videos, primarily due to two factors. First, offline methods are limited by a tightly coupled modeling paradigm, which treats all frames equally and disregards the interdependencies between adjacent frames. This introduces excessive noise during long-term temporal alignment. Second, online methods make inadequate use of temporal information. To tackle these challenges, we propose a decoupling strategy for VIS that divides it into three independent sub-tasks: segmentation, tracking, and refinement. The efficacy of the decoupling strategy relies on two crucial elements: 1) attaining precise long-term alignment outcomes via frame-by-frame…
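To make the three-stage split concrete, here is a minimal toy sketch of a decoupled pipeline: segment each frame independently, align instances frame by frame against the previous aligned frame, then refine over the whole aligned sequence. It is not the DVIS implementation. All function names are hypothetical, the tracker is a brute-force similarity matcher and the refiner is a plain temporal average (both swapped in for the paper's learned modules), and it assumes every frame contains the same number of instances.

```python
from itertools import permutations


def segment_frame(frame):
    """Stage 1 (hypothetical stand-in): per-frame segmentation.

    In this toy, a frame is already a list of instance embeddings in
    arbitrary order, so the function just returns it unchanged.
    """
    return frame


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def track(prev_aligned, current):
    """Stage 2: frame-by-frame association.

    Reorders `current` so that slot i holds the instance most similar to
    slot i of the previous aligned frame. Brute force over permutations,
    so only suitable for a handful of instances.
    """
    best = max(
        permutations(range(len(current))),
        key=lambda p: sum(dot(prev_aligned[i], current[j]) for i, j in enumerate(p)),
    )
    return [current[j] for j in best]


def refine(aligned_seq, window=1):
    """Stage 3: temporal refinement over the aligned sequence.

    Averages each instance slot over a window of neighbouring frames.
    """
    num_frames = len(aligned_seq)
    refined = []
    for t in range(num_frames):
        lo, hi = max(0, t - window), min(num_frames, t + window + 1)
        frame_out = []
        for slot in range(len(aligned_seq[t])):
            dim = len(aligned_seq[t][slot])
            frame_out.append(
                [sum(aligned_seq[k][slot][d] for k in range(lo, hi)) / (hi - lo)
                 for d in range(dim)]
            )
        refined.append(frame_out)
    return refined


def run_pipeline(frames):
    """Run the three stages in order and return (aligned, refined)."""
    per_frame = [segment_frame(f) for f in frames]
    aligned = [per_frame[0]]
    for current in per_frame[1:]:
        # Only the previous aligned frame is used as the reference.
        aligned.append(track(aligned[-1], current))
    return aligned, refine(aligned)


if __name__ == "__main__":
    # Two instances; the second frame lists them in swapped order.
    frames = [
        [[1, 0], [0, 1]],
        [[0, 1], [1, 0]],
        [[0.9, 0.1], [0.1, 0.9]],
    ]
    aligned, refined = run_pipeline(frames)
    print(aligned[1])
    print(refined[0])
```

The point of the split is that the tracking stage only ever compares adjacent frames, while the refinement stage sees the whole sequence after instance slots have been made consistent.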

Code & Models

Repositories

zhang-tao-whu/DVIS
PyTorch · Official

Taxonomy

Topics: Video Analysis and Summarization · Multimodal Machine Learning Applications · Advanced Vision and Imaging