Tag-Based Attention Guided Bottom-Up Approach for Video Instance Segmentation

Jyoti Kini; Mubarak Shah

arXiv:2204.10765 · cs.CV · April 25, 2022

TL;DR

This paper introduces a novel end-to-end bottom-up video instance segmentation method that uses tag assignment and a spatio-temporal tagging loss, processing entire video clips as 3D volumes for improved temporal consistency and efficiency.

Contribution

It proposes a new spatio-temporal tagging loss and a tag-based attention module for video instance segmentation, enabling end-to-end training and better temporal propagation.
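To make the idea of a tagging loss computed over a whole clip concrete, here is a minimal sketch in plain Python. It is not the paper's formulation: it assumes a pull/push objective in the style of associative embedding, where tags of the same instance are pulled toward their clip-wide mean and the means of different instances are pushed at least `margin` apart. The function name, the `margin` parameter, and the nested-list `[T][H][W]` layout are all illustrative assumptions.

```python
from itertools import combinations


def spatio_temporal_tag_loss(tags, instance_ids, margin=0.5):
    """Hypothetical pull/push tag loss over a clip (an assumption, not the paper's loss).

    tags:         nested list [T][H][W] of predicted tag values (floats)
    instance_ids: nested list [T][H][W] of ground-truth ids (0 = background)
    """
    # Gather tag values per instance across ALL frames, so the clip is
    # treated as one volume rather than frame by frame.
    per_instance = {}
    for tag_frame, id_frame in zip(tags, instance_ids):
        for tag_row, id_row in zip(tag_frame, id_frame):
            for tag, inst in zip(tag_row, id_row):
                if inst != 0:
                    per_instance.setdefault(inst, []).append(tag)

    if not per_instance:
        return 0.0

    means = {k: sum(v) / len(v) for k, v in per_instance.items()}

    # Pull term: each instance's tags should sit close to that instance's mean.
    pull = sum(
        sum((t - means[k]) ** 2 for t in v) / len(v)
        for k, v in per_instance.items()
    ) / len(per_instance)

    # Push term: means of different instances should differ by at least `margin`.
    pairs = list(combinations(means.values(), 2))
    push = 0.0
    if pairs:
        push = sum(max(0.0, margin - abs(a - b)) ** 2 for a, b in pairs) / len(pairs)

    return pull + push
```

Because the per-instance statistics are pooled over every frame, an instance that receives a different tag in a later frame is penalised by the pull term, which is one way such a loss can encourage temporal consistency.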

Findings

1. Achieves competitive results on the YouTube-VIS and DAVIS-19 datasets.
2. Offers a more efficient, end-to-end approach compared to multi-stage methods.
3. Demonstrates effective separation and tracking of object instances across videos.

Abstract

Video Instance Segmentation is a fundamental computer vision task that deals with segmenting and tracking object instances across a video sequence. Most existing methods typically accomplish this task by employing a multi-stage top-down approach that usually involves separate networks to detect and segment objects in each frame, followed by associating these detections in consecutive frames using a learned tracking head. In this work, however, we introduce a simple end-to-end trainable bottom-up approach to achieve instance mask predictions at the pixel-level granularity, instead of the typical region-proposals-based approach. Unlike contemporary frame-based models, our network pipeline processes an input video clip as a single 3D volume to incorporate temporal information. The central idea of our formulation is to solve the video instance segmentation task as a tag assignment problem,…
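As a rough illustration of what treating segmentation as tag assignment could look like at inference time, the sketch below groups foreground pixels of a clip by tag value, so that pixels with similar tags share one instance id in every frame. This is an assumed post-processing step for illustration only, not a procedure taken from the paper; `group_by_tag`, the `foreground` mask input, and the `tolerance` threshold are hypothetical.

```python
def group_by_tag(tags, foreground, tolerance=0.1):
    """Hypothetical greedy grouping of pixels by tag value across a clip.

    tags:       nested list [T][H][W] of predicted tag values (floats)
    foreground: nested list [T][H][W] of truthy/falsy foreground flags
    Returns a nested list [T][H][W] of instance ids (0 = background).
    """
    centers = []  # representative tag of each instance found so far
    labels = []
    for tag_frame, fg_frame in zip(tags, foreground):
        label_frame = []
        for tag_row, fg_row in zip(tag_frame, fg_frame):
            label_row = []
            for tag, is_fg in zip(tag_row, fg_row):
                if not is_fg:
                    label_row.append(0)
                    continue
                # Reuse the first existing instance whose tag is close enough.
                for idx, center in enumerate(centers):
                    if abs(tag - center) <= tolerance:
                        label_row.append(idx + 1)
                        break
                else:
                    # No match: start a new instance with this tag.
                    centers.append(tag)
                    label_row.append(len(centers))
            label_frame.append(label_row)
        labels.append(label_frame)
    return labels


# Two frames, two objects; the second object moves between frames.
clip_tags = [[[0.10, 0.12], [0.80, 0.0]], [[0.11, 0.82], [0.79, 0.0]]]
clip_fg = [[[1, 1], [1, 0]], [[1, 1], [1, 0]]]
clip_labels = group_by_tag(clip_tags, clip_fg)
```

Since the instance centers are shared across frames, an object keeps the same id throughout the clip without a separate tracking step, which is the appeal of a bottom-up tag formulation over detect-then-associate pipelines.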

Taxonomy

Topics: Visual Attention and Saliency Detection · Video Surveillance and Tracking Methods · Advanced Image and Video Retrieval Techniques

Methods: Contrastive Language-Image Pre-training