GRAtt-VIS: Gated Residual Attention for Auto Rectifying Video Instance Segmentation

Tanveer Hannan; Rajat Koner; Maximilian Bernhard; Suprosanna Shit; Bjoern Menze; Volker Tresp; Matthias Schubert; Thomas Seidl

arXiv:2305.17096 · cs.CV · May 29, 2023 · 2 cites

TL;DR

GRAtt-VIS introduces a gated residual attention mechanism that detects and rectifies errors in online video instance segmentation, improving long-term tracking and reducing memory overhead.

Contribution

The paper proposes GRAtt-VIS, a novel framework combining gated residual connections and masked self-attention for enhanced video instance segmentation.

Findings

01 Achieves state-of-the-art results on YouTube-VIS and OVIS datasets.

02 Effectively detects and rectifies degraded features during occlusion and abrupt changes.

03 Reduces attention complexity while maintaining high performance.

Abstract

Recent trends in Video Instance Segmentation (VIS) have seen a growing reliance on online methods to model complex and lengthy video sequences. However, the degradation of representation and noise accumulation of the online methods, especially during occlusion and abrupt changes, pose substantial challenges. Transformer-based query propagation provides promising directions at the cost of quadratic memory attention. However, these methods are susceptible to the degradation of instance features due to the above-mentioned challenges and suffer from cascading effects. The detection and rectification of such errors remain largely underexplored. To this end, we introduce GRAtt-VIS, Gated Residual Attention for Video Instance Segmentation. First, we leverage a Gumbel-Softmax-based gate to detect possible errors in the current frame. Next,…
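
The gating step named in the abstract can be illustrated with a minimal, standard-library Python sketch. It is not taken from the paper or its repository: the two-way logit layout (index 0 = "degraded", index 1 = "reliable"), the function names, and the fallback rule in `gated_residual` are all assumptions made for illustration, and the listing above truncates the abstract before it describes what actually follows the gate.

```python
import math
import random


def gumbel_softmax(logits, tau=1.0, rng=random):
    """Draw a relaxed one-hot sample from unnormalised logits."""
    noisy = []
    for logit in logits:
        # Clamp the uniform draw away from 0 and 1 so both logs are finite.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        gumbel = -math.log(-math.log(u))  # Gumbel(0, 1) noise
        noisy.append((logit + gumbel) / tau)
    peak = max(noisy)  # subtract the max for a numerically stable softmax
    exps = [math.exp(v - peak) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]


def hard_gate(logits, tau=1.0, rng=random):
    """Binary decision from a two-way sample (assumed layout: index 1 = reliable)."""
    probs = gumbel_softmax(logits, tau, rng)
    return 1 if probs[1] >= probs[0] else 0


def gated_residual(prev_query, curr_query, gate):
    """Hypothetical rule: keep the current query if gate is 1, else reuse the previous one."""
    return list(curr_query) if gate == 1 else list(prev_query)
```

In this sketch a gate value of 0 simply carries the previous frame's instance query forward, which is one plausible way to stop a degraded feature from propagating. A trainable version would also need a differentiable path through the hard decision (for example a straight-through estimator), which is omitted here.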

Code & Models

Repositories

tanveer81/grattvis (PyTorch, official)

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Advanced Neural Network Applications · Multimodal Machine Learning Applications

Methods: Residual Connection