TarViS: A Unified Approach for Target-based Video Segmentation
Ali Athar, Alexander Hermans, Jonathon Luiten, Deva Ramanan, Bastian Leibe

TL;DR
TarViS introduces a unified neural network architecture that handles multiple video segmentation tasks with a single model. The jointly trained model switches between tasks at inference time without task-specific retraining and achieves state-of-the-art results on 5 of 7 benchmarks.
Contribution
The paper presents TarViS, a novel multi-task video segmentation model that generalizes across various tasks by modeling targets as abstract queries, enabling joint training and task switching during inference.
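To illustrate how task switching can work when every target is just a query vector, here is a minimal Python sketch. It assumes each task supplies its own query encoder that turns a task-specific target definition into query vectors, while one shared mask head is reused unchanged. The head (dot product of query and per-pixel feature, then a sigmoid) is an assumption in the style of query-based segmenters, not the paper's exact decoder. The names `TASK_ENCODERS`, `encode_by_id`, `encode_by_mask` and `segment`, the embedding table, and the flattened pixel layout are all hypothetical and are not taken from the TarViS code.

```python
import math
from typing import Callable, Dict, List, Sequence

Vector = List[float]
Mask = List[bool]  # one flag per flattened spatio-temporal pixel (assumed layout)


def sigmoid(x: float) -> float:
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def predict_masks(queries: Sequence[Vector], features: Sequence[Vector],
                  threshold: float = 0.5) -> List[Mask]:
    """Shared, task-agnostic head: one binary mask per query (hypothetical)."""
    return [[sigmoid(sum(a * b for a, b in zip(q, f))) > threshold for f in features]
            for q in queries]


# Hypothetical encoder A: targets are ids into a fixed (here hand-set) embedding table.
EMBEDDINGS: Dict[int, Vector] = {0: [1.0, 0.0], 1: [0.0, 1.0]}


def encode_by_id(target_ids: Sequence[int], features: Sequence[Vector]) -> List[Vector]:
    return [list(EMBEDDINGS[i]) for i in target_ids]


# Hypothetical encoder B: targets are reference masks; query = mean feature under the mask.
def encode_by_mask(ref_masks: Sequence[Sequence[bool]],
                   features: Sequence[Vector]) -> List[Vector]:
    queries = []
    for mask in ref_masks:
        picked = [f for f, keep in zip(features, mask) if keep]
        if not picked:
            raise ValueError("reference mask selects no pixels")
        dim = len(picked[0])
        queries.append([sum(f[d] for f in picked) / len(picked) for d in range(dim)])
    return queries


# Switching tasks only changes which encoder builds the queries; the head is shared.
TASK_ENCODERS: Dict[str, Callable[..., List[Vector]]] = {
    "by_id": encode_by_id,
    "by_mask": encode_by_mask,
}


def segment(task: str, targets, features: Sequence[Vector]) -> List[Mask]:
    queries = TASK_ENCODERS[task](targets, features)
    return predict_masks(queries, features)


if __name__ == "__main__":
    feats = [[4.0, -4.0], [-4.0, 4.0], [3.0, -2.0]]  # 3 flattened pixels, 2 channels
    print(segment("by_id", [0], feats))                      # [[True, False, True]]
    print(segment("by_mask", [[False, True, False]], feats))  # [[False, True, False]]
```

The design point the sketch tries to capture is that nothing downstream of the query encoder knows which task is running, so adding a task means adding an encoder rather than a new network.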
Findings
Achieves state-of-the-art performance on 5 of 7 benchmarks
Applied successfully to four different video segmentation tasks
Demonstrates flexible, task-agnostic target modeling
Abstract
The general domain of video segmentation is currently fragmented into different tasks spanning multiple benchmarks. Despite rapid progress in the state-of-the-art, current methods are overwhelmingly task-specific and cannot conceptually generalize to other tasks. Inspired by recent approaches with multi-task capability, we propose TarViS: a novel, unified network architecture that can be applied to any task that requires segmenting a set of arbitrarily defined 'targets' in video. Our approach is flexible with respect to how tasks define these targets, since it models the latter as abstract 'queries' which are then used to predict pixel-precise target masks. A single TarViS model can be trained jointly on a collection of datasets spanning different tasks, and can hot-swap between tasks during inference without any task-specific retraining. To demonstrate its effectiveness, we apply…
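The step "queries which are then used to predict pixel-precise target masks" can be sketched as follows, assuming (as in other query-based segmenters, and not confirmed here for TarViS) that each query is compared with a per-pixel feature vector by a dot product. Here every pixel of every frame is labelled with its best-matching target, or -1 for background when no query scores above zero. The `video[t][y][x]` layout, the function names, and the zero-logit background rule are illustrative assumptions.

```python
from typing import List, Sequence

# Assumed layout: video[t][y][x] is a C-dimensional feature vector.
Feature = Sequence[float]
Video = Sequence[Sequence[Sequence[Feature]]]


def dot(a: Feature, b: Feature) -> float:
    return sum(x * y for x, y in zip(a, b))


def assign_pixels(queries: Sequence[Feature], video: Video) -> List[List[List[int]]]:
    """Return labels[t][y][x]: index of the highest-scoring target query at that
    pixel, or -1 (background) when no query has a positive logit."""
    labels = []
    for frame in video:
        frame_labels = []
        for row in frame:
            row_labels = []
            for feat in row:
                logits = [dot(q, feat) for q in queries]
                if not logits:  # no targets defined: everything is background
                    row_labels.append(-1)
                    continue
                best = max(range(len(logits)), key=logits.__getitem__)
                row_labels.append(best if logits[best] > 0 else -1)
            frame_labels.append(row_labels)
        labels.append(frame_labels)
    return labels


if __name__ == "__main__":
    # 2 frames, 1 row, 2 columns, 2 channels per pixel.
    video = [[[[2.0, 0.0], [0.0, 3.0]]], [[[-1.0, -1.0], [1.0, 0.5]]]]
    print(assign_pixels([[1.0, 0.0], [0.0, 1.0]], video))  # [[[0, 1]], [[-1, 0]]]
```

Because the same query vectors are applied to every frame, a given target keeps the same label index throughout the clip in this sketch, which is one simple way a single query can represent one target across time.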
Taxonomy
Topics: Visual Attention and Saliency Detection · Advanced Image and Video Retrieval Techniques · Advanced Neural Network Applications
