Spectrum-guided Multi-granularity Referring Video Object Segmentation

Bo Miao; Mohammed Bennamoun; Yongsheng Gao; Ajmal Mian

arXiv:2307.13537 · cs.CV · July 26, 2023

TL;DR

This paper introduces a spectrum-guided multi-granularity approach for referring video object segmentation that addresses feature drift issues, enabling more accurate and faster multi-object segmentation in videos.

Contribution

It proposes a spectrum-guided method that segments directly on the encoded features, and extends that method to multi-object R-VOS, improving speed and practicality.
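
As a rough illustration of what segmenting directly on encoded features with conditional kernels can look like, the sketch below applies one 1x1 kernel per referred object to a single shared low-resolution feature map and then upsamples the logits. It is not the authors' implementation: the function name, tensor shapes, and random inputs are hypothetical, and nearest-neighbour upsampling merely stands in for the refinement step that uses visual details.

```python
import numpy as np

def segment_with_kernels(features, kernels, biases, scale=4):
    """Apply one conditional 1x1 kernel per referred object to shared features.

    features: (C, H, W) encoded low-resolution feature map (hypothetical shape)
    kernels:  (N, C) one kernel per referring expression
    biases:   (N,) one bias per referring expression
    Returns (N, H*scale, W*scale) mask logits.
    """
    # A 1x1 convolution is a per-pixel dot product over the channel axis.
    logits = np.einsum("nc,chw->nhw", kernels, features) + biases[:, None, None]
    # Nearest-neighbour upsampling; a placeholder for detail-based refinement.
    return logits.repeat(scale, axis=1).repeat(scale, axis=2)

rng = np.random.default_rng(0)
features = rng.standard_normal((8, 6, 10))  # computed once per frame
kernels = rng.standard_normal((3, 8))       # three referred objects
biases = np.zeros(3)

logits = segment_with_kernels(features, kernels, biases)
masks = logits > 0
print(masks.shape)  # (3, 24, 40)
```

Because the feature map is shared, adding another referred object only adds one more kernel row, which is one plausible reading of why a multi-object mode can be faster than running the whole model once per expression.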

Findings

01. Achieves state-of-the-art results on four video benchmark datasets.

02. Outperforms the nearest competitor by 2.8 percentage points on Ref-YouTube-VOS.

03. Runs about 3 times faster in multi-object R-VOS mode.

Abstract

Current referring video object segmentation (R-VOS) techniques extract conditional kernels from encoded (low-resolution) vision-language features to segment the decoded high-resolution features. We discovered that this causes significant feature drift, which the segmentation kernels struggle to perceive during the forward computation. This negatively affects the ability of segmentation kernels. To address the drift problem, we propose a Spectrum-guided Multi-granularity (SgMg) approach, which performs direct segmentation on the encoded features and employs visual details to further optimize the masks. In addition, we propose Spectrum-guided Cross-modal Fusion (SCF) to perform intra-frame global interactions in the spectral domain for effective multimodal representation. Finally, we extend SgMg to perform multi-object R-VOS, a new paradigm that enables simultaneous segmentation of…
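
To make "global interactions in the spectral domain" concrete, here is a minimal sketch of frequency-domain mixing on a single frame's feature map. It is a generic illustration, not the paper's SCF module: the function names are hypothetical, and the fixed Gaussian low-pass filter is an assumption that stands in for whatever learned or language-conditioned weighting a real model would use. The point it shows is that an elementwise product in the 2D Fourier domain lets every output position depend on every input position in one step.

```python
import numpy as np

def spectral_global_mix(features, spectral_weight):
    """Filter a (C, H, W) real feature map in the 2D Fourier domain.

    spectral_weight must broadcast against the (C, H, W // 2 + 1) half-spectrum.
    """
    spectrum = np.fft.rfft2(features, axes=(-2, -1))
    # Elementwise product in frequency equals a global circular convolution in space.
    mixed = spectrum * spectral_weight
    return np.fft.irfft2(mixed, s=features.shape[-2:], axes=(-2, -1))

def gaussian_lowpass(height, width, sigma=0.1):
    """Illustrative fixed filter (an assumption; a model would learn this)."""
    fy = np.fft.fftfreq(height)[:, None]
    fx = np.fft.rfftfreq(width)[None, :]
    return np.exp(-(fx**2 + fy**2) / (2.0 * sigma**2))

H, W = 16, 16
impulse = np.zeros((1, H, W))
impulse[0, 3, 5] = 1.0  # a single active position

mixed = spectral_global_mix(impulse, gaussian_lowpass(H, W))
identity = spectral_global_mix(impulse, np.ones((H, W // 2 + 1)))
```

With an all-ones weight the input is recovered unchanged, while the low-pass weight spreads the single active position across the frame, which is the global behaviour the sketch is meant to show.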

Code & Models

Repositories

bo-miao/sgmg (PyTorch, official)

Taxonomy

Topics: Multimodal Machine Learning Applications · Ear and Head Tumors · Speech and Audio Processing