Tracking Anything with Decoupled Video Segmentation

Ho Kei Cheng; Seoung Wug Oh; Brian Price; Alexander Schwing; Joon-Young Lee

arXiv:2309.03903·cs.CV·September 8, 2023·1 cites

Open Access · 1 Repo · 1 Video

TL;DR

This paper introduces DEVA, a decoupled video segmentation method that pairs a task-specific image-level segmentation model with a universal, task-agnostic temporal propagation model, so that new tasks can be tracked without training on video data for each one.

Contribution

The paper proposes a novel decoupled approach for video segmentation that separates image-level segmentation from temporal propagation, reducing training costs and improving generalization across tasks.

Findings

01 · Outperforms end-to-end methods in data-scarce scenarios

02 · Effective in large-vocabulary and open-world segmentation tasks

03 · Enables task-specific segmentation with a universal temporal model

Abstract

Training data for video segmentation are expensive to annotate. This impedes extensions of end-to-end algorithms to new video segmentation tasks, especially in large-vocabulary settings. To 'track anything' without training on video data for every individual task, we develop a decoupled video segmentation approach (DEVA), composed of task-specific image-level segmentation and class/task-agnostic bi-directional temporal propagation. Due to this design, we only need an image-level model for the target task (which is cheaper to train) and a universal temporal propagation model which is trained once and generalizes across tasks. To effectively combine these two modules, we use bi-directional propagation for (semi-)online fusion of segmentation hypotheses from different frames to generate a coherent segmentation. We show that this decoupled formulation compares favorably to end-to-end…
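The decoupled design described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the names `track`, `merge`, `toy_segment`, and `identity_propagate` are hypothetical, masks are plain sets of pixel coordinates, the propagation model is replaced by an identity stand-in, and the bi-directional in-clip fusion is simplified to a forward-only IoU match against a fixed threshold. It only shows how an image-level segmenter and a task-agnostic propagation step might be combined behind two interchangeable callables.

```python
from typing import Callable, Dict, List, Set, Tuple

Pixel = Tuple[int, int]
Mask = Set[Pixel]          # a mask is the set of (row, col) pixels it covers
Frame = List[List[int]]    # toy frame: grid of integer labels, 0 = background


def iou(a: Mask, b: Mask) -> float:
    """Intersection over union of two pixel sets (0.0 if both are empty)."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def merge(propagated: Dict[int, Mask], detected: List[Mask],
          next_id: int, thresh: float = 0.5) -> Tuple[Dict[int, Mask], int]:
    """Fuse propagated tracks with fresh image-level detections.

    Assumption: a detection that overlaps an existing track with
    IoU >= thresh refreshes that track; otherwise it starts a new one.
    """
    merged = dict(propagated)
    for det in detected:
        best_id, best = None, 0.0
        for tid, mask in propagated.items():
            score = iou(mask, det)
            if score > best:
                best_id, best = tid, score
        if best_id is not None and best >= thresh:
            merged[best_id] = det
        else:
            merged[next_id] = det
            next_id += 1
    return merged, next_id


def track(frames: List[Frame],
          segment_image: Callable[[Frame], List[Mask]],
          propagate: Callable[[Dict[int, Mask], Frame], Dict[int, Mask]],
          detect_every: int = 2) -> List[Dict[int, Mask]]:
    """Propagate tracks on every frame; run the image model every few frames."""
    tracks: Dict[int, Mask] = {}
    next_id = 1
    history = []
    for t, frame in enumerate(frames):
        tracks = propagate(tracks, frame)            # task-agnostic step
        if t % detect_every == 0:                    # task-specific step
            tracks, next_id = merge(tracks, segment_image(frame), next_id)
        history.append({tid: set(m) for tid, m in tracks.items()})
    return history


def toy_segment(frame: Frame) -> List[Mask]:
    """Hypothetical image model: one mask per non-zero label in the grid."""
    labels = sorted({v for row in frame for v in row if v != 0})
    return [{(r, c) for r, row in enumerate(frame)
             for c, v in enumerate(row) if v == lab} for lab in labels]


def identity_propagate(tracks: Dict[int, Mask], frame: Frame) -> Dict[int, Mask]:
    """Hypothetical propagation model: assumes objects do not move."""
    return tracks


frames = [
    [[1, 1, 0, 0], [1, 1, 0, 0]],
    [[1, 1, 0, 0], [1, 1, 0, 0]],
    [[1, 1, 0, 2], [1, 1, 0, 2]],   # a second object appears here
]
history = track(frames, toy_segment, identity_propagate)
print([sorted(h) for h in history])
```

Because `track` depends only on the two callables, swapping `toy_segment` for a different image-level model changes the task without touching the propagation step, which is the property the abstract attributes to the decoupled formulation.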

Code & Models

Repositories

hkchengrex/Tracking-Anything-with-DEVA · PyTorch · Official

Videos

Tracking Anything with Decoupled Video Segmentation · YouTube

Taxonomy

Topics: Multimodal Machine Learning Applications · Visual Attention and Saliency Detection · Domain Adaptation and Few-Shot Learning