A Simple Video Segmenter by Tracking Objects Along Axial Trajectories

Ju He; Qihang Yu; Inkyu Shin; Xueqing Deng; Alan Yuille; Xiaohui Shen; Liang-Chieh Chen

arXiv:2311.18537·cs.CV·June 13, 2024·2 cites

TL;DR

Axial-VS introduces a simple, efficient framework for video segmentation that tracks objects along axial trajectories, improving temporal consistency and outperforming existing methods on benchmarks.

Contribution

It proposes axial-trajectory attention to enhance clip-level video segmentation with better temporal consistency and computational efficiency.

Findings

01. Achieves state-of-the-art results on video segmentation benchmarks.
02. Reduces computational complexity compared to traditional attention methods.
03. Effectively maintains object tracking across video clips.

Abstract

Video segmentation requires consistently segmenting and tracking objects over time. Due to the quadratic dependency on input size, directly applying self-attention to video segmentation with high-resolution input features poses significant challenges, often leading to insufficient GPU memory capacity. Consequently, modern video segmenters either extend an image segmenter without incorporating any temporal attention or resort to window space-time attention in a naive manner. In this work, we present Axial-VS, a general and simple framework that enhances video segmenters by tracking objects along axial trajectories. The framework tackles video segmentation through two sub-tasks: short-term within-clip segmentation and long-term cross-clip tracking. In the first step, Axial-VS augments an off-the-shelf clip-level video segmenter with the proposed axial-trajectory attention, sequentially…
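To make the quadratic dependency mentioned above concrete, the following sketch counts attention-score entries for full space-time self-attention and for a generic axial factorization. It is a back-of-the-envelope illustration only: the function names and the feature size (2 frames of 64×64 tokens) are hypothetical, and the axial count assumes plain axial attention over each spatial axis jointly with time, which is not necessarily the exact formulation of the paper's axial-trajectory attention.

```python
def full_attention_entries(t: int, h: int, w: int) -> int:
    """Score-matrix entries when every space-time token attends to every other."""
    n = t * h * w          # total number of tokens in the clip
    return n * n           # quadratic in the number of tokens


def axial_attention_entries(t: int, h: int, w: int) -> int:
    """Score-matrix entries for a generic axial factorization (an assumption,
    not the paper's exact method): one pass along the height axis and one
    along the width axis, each taken jointly with time."""
    height_pass = w * (t * h) ** 2   # w independent sequences of length t*h
    width_pass = h * (t * w) ** 2    # h independent sequences of length t*w
    return height_pass + width_pass


if __name__ == "__main__":
    t, h, w = 2, 64, 64              # hypothetical clip-level feature size
    full = full_attention_entries(t, h, w)
    axial = axial_attention_entries(t, h, w)
    print(f"full:  {full:,} entries")
    print(f"axial: {axial:,} entries")
    print(f"ratio: {full / axial:.0f}x")
```

Under these assumptions the full attention map has 67,108,864 entries against 2,097,152 for the axial variant, a 32x reduction, and the gap widens as the spatial resolution grows.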

Code & Models

Models: 🤗 turkeyju/Axial-VS

Taxonomy

Topics: Video Surveillance and Tracking Methods · Visual Attention and Saliency Detection · Advanced Image and Video Retrieval Techniques

Methods: High-resolution input · Contrastive Language-Image Pre-training