CAVIS: Context-Aware Video Instance Segmentation

Seunghun Lee; Jiwan Seo; Kiljoon Han; Minwoo Choi; Sunghoon Im

arXiv:2407.03010 · cs.CV · July 10, 2025

TL;DR

CAVIS introduces a novel context-aware framework for video instance segmentation that leverages contextual information and a new contrastive loss to improve tracking accuracy across challenging datasets.

Contribution

The paper presents the CAIT tracker and PCC loss, which together enhance instance association by integrating context and ensuring feature consistency across frames.
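To make the idea of context-aware association concrete, here is a minimal sketch. It is not the authors' CAIT implementation. It assumes each instance comes with one appearance vector and one context vector, fuses them by concatenation, and matches instances across frames greedily by cosine similarity. The function names, the fusion rule, and the greedy matcher are all hypothetical choices made for illustration.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def context_aware_embedding(instance_feat, context_feat):
    """Hypothetical fusion: concatenate normalized instance and context features."""
    fused = l2_normalize(instance_feat) + l2_normalize(context_feat)
    return l2_normalize(fused)


def similarity(a, b):
    """Dot product; equals cosine similarity for unit-length inputs."""
    return sum(x * y for x, y in zip(a, b))


def associate(prev_embs, curr_embs):
    """Greedy one-to-one matching; returns {current index: previous index}."""
    scored = sorted(
        (
            (similarity(p, c), i, j)
            for i, p in enumerate(prev_embs)
            for j, c in enumerate(curr_embs)
        ),
        reverse=True,
    )
    used_prev, used_curr, matches = set(), set(), {}
    for _, i, j in scored:
        if i not in used_prev and j not in used_curr:
            matches[j] = i
            used_prev.add(i)
            used_curr.add(j)
    return matches


# Two instances with identical appearance but different surroundings.
prev_frame = [
    context_aware_embedding([1.0, 0.0], [1.0, 0.0]),
    context_aware_embedding([1.0, 0.0], [0.0, 1.0]),
]
curr_frame = [
    context_aware_embedding([1.0, 0.0], [0.0, 1.0]),
    context_aware_embedding([1.0, 0.0], [1.0, 0.0]),
]
matches = associate(prev_frame, curr_frame)
```

In this toy case the appearance vectors alone cannot tell the two instances apart; only the context half of the embedding separates them, which is the intuition behind using surrounding information for matching.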

Findings

01 Outperforms state-of-the-art methods on all evaluated VIS and VPS benchmark datasets.

02 Achieves significant improvements on the challenging OVIS dataset.

03 Demonstrates robustness in complex video scenarios.

Abstract

In this paper, we introduce the Context-Aware Video Instance Segmentation (CAVIS), a novel framework designed to enhance instance association by integrating contextual information adjacent to each object. To efficiently extract and leverage this information, we propose the Context-Aware Instance Tracker (CAIT), which merges contextual data surrounding the instances with the core instance features to improve tracking accuracy. Additionally, we design the Prototypical Cross-frame Contrastive (PCC) loss, which ensures consistency in object-level features across frames, thereby significantly enhancing matching accuracy. CAVIS demonstrates superior performance over state-of-the-art methods on all benchmark datasets in video instance segmentation (VIS) and video panoptic segmentation (VPS). Notably, our method excels on the OVIS dataset, known for its particularly challenging videos. Project…
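The abstract does not give the form of the PCC loss, so the following is only a generic illustration of a cross-frame contrastive objective, not the paper's formulation. It assumes each instance is summarized by one prototype vector per frame and applies an InfoNCE-style loss: the prototype of instance k in one frame should be closer to its own prototype in the next frame than to the prototypes of other instances. The function names and the temperature value are assumptions.

```python
import math


def cosine(a, b, eps=1e-12):
    """Cosine similarity between two vectors, guarded against zero norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / max(norm_a * norm_b, eps)


def cross_frame_contrastive_loss(protos_t, protos_next, temperature=0.1):
    """InfoNCE-style loss; protos_t[k] and protos_next[k] describe instance k."""
    losses = []
    for k, anchor in enumerate(protos_t):
        logits = [cosine(anchor, p) / temperature for p in protos_next]
        # Numerically stable log-sum-exp over all candidates in the next frame.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(v - peak) for v in logits))
        # Negative log-probability of picking the true counterpart.
        losses.append(log_denominator - logits[k])
    return sum(losses) / len(losses)


frame_t = [[1.0, 0.0], [0.0, 1.0]]
consistent_loss = cross_frame_contrastive_loss(frame_t, [[1.0, 0.0], [0.0, 1.0]])
swapped_loss = cross_frame_contrastive_loss(frame_t, [[0.0, 1.0], [1.0, 0.0]])
```

When the prototypes stay consistent across frames the loss is close to zero, and it grows when the features of different instances are confused, which is the behaviour a cross-frame consistency term is meant to encourage.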

Code & Models

Repositories

Seung-Hun-Lee/CAVIS (PyTorch, official)

Models

DGIST-CVLAB-Video/CAVIS (Hugging Face model)

Taxonomy

Topics: Video Analysis and Summarization · Advanced Image and Video Retrieval Techniques · Human Pose and Action Recognition