SyncVIS: Synchronized Video Instance Segmentation

Rongkun Zheng; Lu Qi; Xi Chen; Yi Wang; Kun Wang; Yu Qiao; Hengshuang Zhao

arXiv:2412.00882 · cs.CV · December 3, 2024

TL;DR

SyncVIS introduces a synchronized video instance segmentation framework that explicitly models video-level and frame-level queries, improving performance on challenging benchmarks by promoting mutual learning and easier optimization.

Contribution

The paper proposes a novel synchronized modeling framework for VIS that explicitly incorporates video-level queries and synchronization modules, addressing limitations of asynchronous designs.

Findings

01. Achieves state-of-the-art results on YouTube-VIS and OVIS benchmarks.

02. Demonstrates the effectiveness of synchronized query modeling.

03. Validates generality across multiple challenging datasets.

Abstract

Recent DETR-based methods have advanced Video Instance Segmentation (VIS) through transformers' efficiency and capability in modeling spatial and temporal information. Despite remarkable progress, existing works follow asynchronous designs, which model video sequences either via video-level queries only or via query-sensitive cascade structures, resulting in difficulties when handling complex and challenging video scenarios. In this work, we analyze the cause of this phenomenon and the limitations of current solutions, and propose synchronized modeling via a new framework named SyncVIS. Specifically, SyncVIS explicitly introduces video-level query embeddings and designs two key modules to synchronize video-level query embeddings with frame-level query embeddings: a synchronized video-frame modeling paradigm and a synchronized embedding optimization…

Code & Models

Repositories

rkzheng99/syncvis (PyTorch, official)

Taxonomy

Topics: Video Analysis and Summarization · Generative Adversarial Networks and Image Synthesis · Digital Media Forensic Detection