Temporal Collection and Distribution for Referring Video Object Segmentation

Jiajin Tang; Ge Zheng; Sibei Yang

arXiv:2309.03473·cs.CV·September 8, 2023

Open Access

TL;DR

This paper introduces a novel temporal collection-distribution mechanism for referring video object segmentation, improving the alignment of language, motion, and object segmentation across frames.

Contribution

It proposes a new temporal collection-distribution approach that enhances cross-modal reasoning and object motion modeling in referring video object segmentation.

Findings

1. Outperforms state-of-the-art methods on all benchmarks
2. Effectively captures object motions and spatial-temporal relationships
3. Improves global referent understanding and frame-level segmentation

Abstract

Referring video object segmentation aims to segment a referent throughout a video sequence according to a natural language expression. It requires aligning the natural language expression with the objects' motions and their dynamic associations at the global video level, while segmenting objects at the frame level. To achieve this goal, we propose to simultaneously maintain a global referent token and a sequence of object queries, where the former is responsible for capturing the video-level referent according to the language expression, while the latter serves to better locate and segment objects within each frame. Furthermore, to explicitly capture object motions and spatial-temporal cross-modal reasoning over objects, we propose a novel temporal collection-distribution mechanism for interacting between the global referent token and object queries. Specifically, the temporal collection…
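
The abstract is cut off before it details the mechanism, so the following is only a rough sketch of the general idea of exchanging information between one global referent token and per-frame object queries. It is not the authors' architecture. The function names `collect` and `distribute`, the softmax attention used for collection, the sigmoid gate used for distribution, and all tensor shapes are assumptions made for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def collect(referent, queries):
    """Hypothetical collection step (an assumption, not the paper's design).

    The global referent token attends over the object queries of all
    frames and adds the attention-weighted summary to itself.
    referent: (d,)   queries: (T, N, d) for T frames, N queries per frame
    """
    T, N, d = queries.shape
    flat = queries.reshape(T * N, d)
    weights = softmax(flat @ referent / np.sqrt(d))  # (T*N,), sums to 1
    return referent + weights @ flat                 # (d,)

def distribute(referent, queries):
    """Hypothetical distribution step (an assumption, not the paper's design).

    Each object query receives the referent token, scaled by a sigmoid
    gate computed from the query-referent similarity.
    """
    d = queries.shape[-1]
    gate = 1.0 / (1.0 + np.exp(-(queries @ referent) / np.sqrt(d)))  # (T, N)
    return queries + gate[..., None] * referent                      # (T, N, d)

# Toy sizes, chosen arbitrarily for the illustration.
rng = np.random.default_rng(0)
T, N, d = 4, 5, 8
referent = rng.standard_normal(d)          # stands in for a language-conditioned token
queries = rng.standard_normal((T, N, d))   # stands in for per-frame object queries

new_referent = collect(referent, queries)
new_queries = distribute(new_referent, queries)
```

In this sketch the referent token keeps a single video-level vector while the queries keep their per-frame layout, which mirrors the global-versus-frame-level split described in the abstract; how the actual method implements each step is not recoverable from the truncated text.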

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Video Analysis and Summarization