A2VIS: Amodal-Aware Approach to Video Instance Segmentation

Minh Tran; Thang Pham; Winston Bounsavy; Tri Nguyen; Ngan Le

arXiv:2412.01147 · cs.CV · April 11, 2025

TL;DR

A2VIS introduces an amodal-aware framework for video instance segmentation that improves occlusion handling by integrating visible and occluded object parts across spatiotemporal dimensions, enhancing tracking and segmentation accuracy.

Contribution

The paper presents a novel amodal-aware approach incorporating a spatiotemporal-prior mask head for better occlusion handling in video instance segmentation.

Findings

01. Outperforms existing methods on MOT and VIS tasks
02. Achieves more consistent object tracking during occlusion
03. Demonstrates the effectiveness of amodal representations in video

Abstract

Handling occlusion remains a significant challenge for video instance-level tasks such as Multiple Object Tracking (MOT) and Video Instance Segmentation (VIS). In this paper, we propose a novel framework, Amodal-Aware Video Instance Segmentation (A2VIS), which incorporates amodal representations to achieve a reliable and comprehensive understanding of both the visible and occluded parts of objects in a video. The key intuition is that awareness of amodal segmentation across the spatiotemporal dimension enables a stable stream of object information. When objects are partially or completely hidden from view, amodal segmentation is more consistent and changes less abruptly along the temporal axis than visible segmentation. Hence, both amodal and visible information from all clips can be integrated into one global instance prototype. To effectively address the challenge of…
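The abstract's claim that amodal masks change less abruptly over time than visible masks can be checked on a toy example. The sketch below is a hypothetical setup, not the A2VIS model: a 4x4 object slides right behind a fixed occluder, and mean IoU between consecutive frames is used as an assumed proxy for temporal consistency. The helper names (`square`, `iou`, `consecutive_ious`, `temporal_consistency`) and the `OCCLUDER` region are illustrative only.

```python
# Toy illustration (hypothetical setup, not from the paper): a 4x4 object moves
# right by 1 px per frame behind a fixed 3-px-wide occluder. Masks are sets of
# (row, col) pixel coordinates so the example needs only the standard library.

def square(x0, y0, size):
    """Amodal mask: the full extent of a size x size object at (x0, y0)."""
    return {(y, x) for y in range(y0, y0 + size) for x in range(x0, x0 + size)}

def iou(a, b):
    """Intersection-over-union of two pixel sets (1.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def consecutive_ious(masks):
    """IoU between the masks of every pair of consecutive frames."""
    return [iou(a, b) for a, b in zip(masks, masks[1:])]

def temporal_consistency(masks):
    """Mean consecutive-frame IoU (assumed proxy): higher means smoother change."""
    scores = consecutive_ious(masks)
    return sum(scores) / len(scores)

# Hypothetical occluder: columns 6..8 in every row of a 10-row frame.
OCCLUDER = {(y, x) for y in range(10) for x in range(6, 9)}

# Amodal masks keep the object's full shape in every frame.
amodal = [square(x0, 2, 4) for x0 in range(9)]
# Visible masks are the amodal masks minus whatever the occluder hides.
visible = [m - OCCLUDER for m in amodal]

print("amodal areas: ", [len(m) for m in amodal])
print("visible areas:", [len(m) for m in visible])
print(f"amodal consistency:  {temporal_consistency(amodal):.3f}")
print(f"visible consistency: {temporal_consistency(visible):.3f}")
print("worst visible IoU:", min(consecutive_ious(visible)))
```

In this toy run the amodal area stays at 16 pixels and its consistency stays at 0.600, while the visible area shrinks to 4 pixels and the visible consistency drops to about 0.535. One pair of consecutive visible masks shares no pixels at all, which is the kind of jump that makes visible-only association fragile under occlusion.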

Taxonomy

Topics: Video Analysis and Summarization · Generative Adversarial Networks and Image Synthesis · Human Pose and Action Recognition