Object Propagation via Inter-Frame Attentions for Temporally Stable Video Instance Segmentation
Anirudh S Chakravarthy, Won-Dong Jang, Zudi Lin, Donglai Wei, Song Bai, Hanspeter Pfister

TL;DR
This paper introduces a novel video instance segmentation method that uses inter-frame attention to improve temporal stability and detection consistency, outperforming previous methods on the YouTube-VIS benchmark.
Contribution
The proposed approach leverages inter-frame attentions to recover missing object detections, enhancing temporal consistency in video segmentation.
Findings
Achieves 36.0% mAP on YouTube-VIS benchmark
Outperforms previous state-of-the-art methods
Operates fully online without future frame access
Abstract
Video instance segmentation aims to detect, segment, and track objects in a video. Current approaches extend image-level segmentation algorithms to the temporal domain. However, this results in temporally inconsistent masks. In this work, we identify the mask quality due to temporal stability as a performance bottleneck. Motivated by this, we propose a video instance segmentation method that alleviates the problem due to missing detections. Since this cannot be solved simply using spatial information, we leverage temporal context using inter-frame attentions. This allows our network to refocus on missing objects using box predictions from the neighbouring frame, thereby overcoming missing detections. Our method significantly outperforms previous state-of-the-art algorithms using the Mask R-CNN backbone, by achieving 36.0% mAP on the YouTube-VIS benchmark. Additionally, our method is…
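The idea of using box predictions from the neighbouring frame to find missing detections can be illustrated with a minimal sketch. This is not the paper's inter-frame attention module: it swaps in a plain IoU comparison as a simplified stand-in, and the names `iou`, `find_missing_boxes`, and `iou_threshold` are hypothetical, chosen only for illustration. It shows how boxes from the previous frame that have no counterpart in the current frame could be flagged as candidates for the network to refocus on.

```python
from typing import List, Tuple

# A box is (x1, y1, x2, y2) in pixel coordinates (assumed convention).
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def find_missing_boxes(
    prev_boxes: List[Box],
    curr_boxes: List[Box],
    iou_threshold: float = 0.5,  # hypothetical value, not from the paper
) -> List[Box]:
    """Return previous-frame boxes with no sufficiently overlapping
    detection in the current frame (candidate missing detections)."""
    return [
        p for p in prev_boxes
        if all(iou(p, c) < iou_threshold for c in curr_boxes)
    ]


if __name__ == "__main__":
    prev = [(0, 0, 10, 10), (20, 20, 30, 30)]
    curr = [(1, 1, 10, 10)]  # the second object was not detected
    print(find_missing_boxes(prev, curr))
```

Because the sketch only looks at the previous frame, it stays consistent with the online setting described above, where no future frames are accessed.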
Taxonomy
Topics: Visual Attention and Saliency Detection · Video Surveillance and Tracking Methods · Advanced Image and Video Retrieval Techniques
Methods: Region Proposal Network · 1x1 Convolution · Residual Connection · Batch Normalization · Average Pooling · Max Pooling · Global Average Pooling · Bottleneck Residual Block · Residual Block
