GRAtt-VIS: Gated Residual Attention for Auto Rectifying Video Instance Segmentation
Tanveer Hannan, Rajat Koner, Maximilian Bernhard, Suprosanna Shit, Bjoern Menze, Volker Tresp, Matthias Schubert, Thomas Seidl

TL;DR
GRAtt-VIS introduces a gated residual attention mechanism that detects and rectifies errors in online video instance segmentation, improving long-term tracking and reducing memory overhead.
Contribution
The paper proposes GRAtt-VIS, a novel framework combining gated residual connections and masked self-attention for enhanced video instance segmentation.
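As a rough illustration of the masked self-attention component named above, the following minimal sketch runs single-head dot-product self-attention over a set of instance queries while ignoring masked entries. It is an assumption-laden toy, not the paper's implementation: the function name `masked_self_attention`, the `keep` flag list, and the omission of learned query/key/value projections are all simplifications introduced here, and how the mask is derived is not specified in this summary.

```python
import math


def masked_self_attention(queries, keep):
    """Single-head self-attention in which keys with keep[j] == False are ignored.

    queries: list of equal-length float vectors (hypothetical instance queries).
    keep:    list of booleans, one per query (hypothetical mask).
    No learned projections are used; queries act as keys and values too.
    """
    if not any(keep):
        raise ValueError("at least one query must be kept")
    d = len(queries[0])
    outputs = []
    for q in queries:
        # Scaled dot-product scores; masked keys get -inf so softmax gives them 0.
        scores = [
            sum(a * b for a, b in zip(q, k)) / math.sqrt(d) if keep[j] else -math.inf
            for j, k in enumerate(queries)
        ]
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]
        z = sum(weights)
        weights = [w / z for w in weights]
        # Weighted sum of the (unprojected) query vectors.
        outputs.append(
            [sum(w * k[i] for w, k in zip(weights, queries)) for i in range(d)]
        )
    return outputs
```

In this sketch a masked query still receives an attention output, but it contributes nothing to the outputs of the others, so a degraded feature cannot leak into the remaining queries.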
Findings
Achieves state-of-the-art results on YouTube-VIS and OVIS datasets.
Effectively detects and rectifies degraded features during occlusion and abrupt changes.
Reduces attention complexity while maintaining high performance.
Abstract
Recent trends in Video Instance Segmentation (VIS) have seen a growing reliance on online methods to model complex and lengthy video sequences. However, the degradation of representation and noise accumulation of the online methods, especially during occlusion and abrupt changes, pose substantial challenges. Transformer-based query propagation provides promising directions at the cost of quadratic memory attention. However, they are susceptible to the degradation of instance features due to the above-mentioned challenges and suffer from cascading effects. The detection and rectification of such errors remain largely underexplored. To this end, we introduce GRAtt-VIS, Gated Residual Attention for Video Instance Segmentation. Firstly, we leverage a Gumbel-Softmax-based gate to detect possible errors in the current frame. Next,…
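The Gumbel-Softmax gate mentioned in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the two-way `gate_logits` layout (`[reject, accept]`), the helper names `gumbel_softmax` and `gated_residual`, and the choice to fall back to the previous frame's query when the gate rejects the current one are all hypothetical. A trainable version would also need a straight-through gradient estimator, which plain Python cannot show.

```python
import math
import random


def gumbel_softmax(logits, tau=1.0, hard=True, rng=random):
    """Sample a (relaxed) one-hot vector from unnormalized logits."""
    noisy = []
    for logit in logits:
        # Gumbel(0, 1) noise is -log(-log(U)) with U ~ Uniform(0, 1).
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        noisy.append((logit - math.log(-math.log(u))) / tau)
    m = max(noisy)
    exps = [math.exp(v - m) for v in noisy]
    total = sum(exps)
    soft = [e / total for e in exps]
    if not hard:
        return soft
    # Discretize to a one-hot vector (forward pass only).
    k = soft.index(max(soft))
    return [1.0 if i == k else 0.0 for i in range(len(soft))]


def gated_residual(prev_query, curr_query, gate_logits, tau=1.0, rng=random):
    """Keep the current query if the gate accepts it, else reuse the previous one.

    gate_logits: [logit_reject, logit_accept], a hypothetical layout.
    Returns the selected query vector and the binary gate value.
    """
    gate = gumbel_softmax(gate_logits, tau=tau, hard=True, rng=rng)[1]
    mixed = [gate * c + (1.0 - gate) * p for p, c in zip(prev_query, curr_query)]
    return mixed, gate
```

For example, `gated_residual([0.1, 0.2], [0.9, 0.8], [-50.0, 50.0])` keeps the current query, while swapping the two logits returns the previous frame's query unchanged.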
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Advanced Neural Network Applications · Multimodal Machine Learning Applications
Methods: Residual Connection
