Learning Motion and Temporal Cues for Unsupervised Video Object Segmentation

Yunzhi Zhuge; Hongyu Gu; Lu Zhang; Jinqing Qi; Huchuan Lu

arXiv:2501.07806 · cs.CV · January 15, 2025

TL;DR

This paper introduces MTNet, an efficient unsupervised video object segmentation method that combines motion and appearance cues with a temporal transformer to improve accuracy and robustness across challenging scenarios.

Contribution

The paper proposes a novel framework, MTNet, that integrates motion, appearance, and long-range temporal modeling within a unified architecture for improved UVOS performance.

Findings

01 Achieves state-of-the-art results on multiple benchmarks.

02 Effectively combines motion and appearance features.

03 Demonstrates robustness in diverse challenging scenarios.

Abstract

In this paper, we address the challenges in unsupervised video object segmentation (UVOS) by proposing an efficient algorithm, termed MTNet, which concurrently exploits motion and temporal cues. Unlike previous methods that focus solely on integrating appearance with motion or on modeling temporal relations, our method combines both aspects within a unified framework. MTNet merges appearance and motion features during feature extraction within the encoders, promoting a more complementary representation. To capture the intricate long-range contextual dynamics and information embedded within videos, a temporal transformer module is introduced, facilitating effective inter-frame interactions throughout a video clip. Furthermore, we employ a cascade of decoders across all feature levels to optimally exploit the…
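
The two ideas named in the abstract, fusing appearance with motion features and letting frames of a clip attend to one another, can be illustrated with a toy sketch. This is not the MTNet implementation: the abstract does not state the fusion operator or the attention configuration, so the element-wise sum in `fuse` and the single-head attention with identity projections in `temporal_attention` are assumptions, and both function names are hypothetical.

```python
import math

def fuse(appearance, motion):
    """Combine one frame's appearance and motion features.
    Assumption: element-wise sum; the abstract does not name the operator."""
    return [a + m for a, m in zip(appearance, motion)]

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def temporal_attention(frames):
    """Single-head self-attention across the frames of a clip.
    Assumption: identity query/key/value projections, one vector per frame."""
    dim = len(frames[0])
    scale = math.sqrt(dim)
    outputs = []
    for query in frames:
        # Scaled dot-product similarity between this frame and every frame.
        scores = [sum(q * k for q, k in zip(query, key)) / scale for key in frames]
        weights = softmax(scores)
        # Each output is a weighted average of all frame features.
        outputs.append([sum(w * f[d] for w, f in zip(weights, frames))
                        for d in range(dim)])
    return outputs

# Toy clip: three frames, two feature channels per frame (made-up values).
appearance = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
motion = [[0.5, 0.5], [0.5, 0.0], [0.0, 0.5]]
fused = [fuse(a, m) for a, m in zip(appearance, motion)]
mixed = temporal_attention(fused)
```

Each row of `mixed` is a convex combination of the fused frame features, which is the sense in which every frame can draw on context from the whole clip. A real model would use learned projections, spatial feature maps rather than one vector per frame, and multiple heads.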

Code & Models

Repositories

hy0523/mtnet
PyTorch · Official
