RefineVIS: Video Instance Segmentation with Temporal Attention Refinement

Andre Abrantes; Jiang Wang; Peng Chu; Quanzeng You; Zicheng Liu

arXiv:2306.04774 · cs.CV · June 9, 2023 · 1 citation

TL;DR

RefineVIS is a new video instance segmentation framework that iteratively refines object association and segmentation masks using temporal attention and contrastive learning, achieving state-of-the-art results.

Contribution

It introduces a dual-representation approach with temporal attention refinement and contrastive learning for improved accuracy in VIS.

Findings

01. Achieves 64.4 AP on YouTube-VIS 2019
02. Achieves 61.4 AP on YouTube-VIS 2021
03. Achieves 46.1 AP on the OVIS dataset

Abstract

We introduce a novel framework called RefineVIS for Video Instance Segmentation (VIS) that achieves good object association between frames and accurate segmentation masks by iteratively refining the representations using sequence context. RefineVIS learns two separate representations on top of an off-the-shelf frame-level image instance segmentation model: an association representation responsible for associating objects across frames and a segmentation representation that produces accurate segmentation masks. Contrastive learning is utilized to learn temporally stable association representations. A Temporal Attention Refinement (TAR) module learns discriminative segmentation representations by exploiting temporal relationships and a novel temporal contrastive denoising technique. Our method supports both online and offline inference. It achieves state-of-the-art video instance…
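The two ideas named in the abstract, contrastive learning for association and attention over the temporal axis for refinement, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not specify the loss or the TAR architecture, so the sketch substitutes a generic InfoNCE-style loss and a plain scaled dot-product self-attention without learned projections. All function names, shapes, and the temperature value are assumptions made for illustration.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale vectors to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def info_nce_loss(emb_a, emb_b, temperature=0.1):
    """Generic InfoNCE-style loss (an assumption, not the paper's exact loss).

    emb_a[i] and emb_b[i] are embeddings of the same object in two frames,
    each of shape (N, D). Every other row of emb_b acts as a negative.
    """
    a = l2_normalize(emb_a)
    b = l2_normalize(emb_b)
    logits = a @ b.T / temperature                      # (N, N) similarities
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # The positive pair for row i sits on the diagonal.
    return float(-np.mean(np.diag(log_probs)))


def temporal_self_attention(seq):
    """Plain self-attention over time for one instance (hypothetical stand-in for TAR).

    seq has shape (T, D): one embedding per frame. Returns the refined
    sequence (residual plus attended context) and the (T, T) attention weights.
    """
    _, dim = seq.shape
    scores = seq @ seq.T / np.sqrt(dim)
    scores = scores - scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)
    return seq + weights @ seq, weights


# Toy data: 4 objects seen in two frames with small per-frame noise.
rng = np.random.default_rng(0)
objects = rng.normal(size=(4, 16))
frame_t = objects + 0.01 * rng.normal(size=objects.shape)
frame_t1 = objects + 0.01 * rng.normal(size=objects.shape)

loss_aligned = info_nce_loss(frame_t, frame_t1)         # correct pairing
loss_shuffled = info_nce_loss(frame_t, frame_t1[::-1])  # wrong pairing

# One instance tracked over 5 frames.
track = objects[0] + 0.05 * rng.normal(size=(5, 16))
refined, attn = temporal_self_attention(track)
```

In this toy setting the loss is lower when matching rows hold the same object than when the pairing is reversed, which is the property that makes such embeddings usable for cross-frame association. The attention function only shows how each frame's representation can draw on the rest of the sequence; a trained module would add learned query, key, and value projections.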

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Visual Attention and Saliency Detection · Image Enhancement Techniques

Methods: Contrastive Learning