Temporally Consistent Referring Video Object Segmentation with Hybrid Memory

Bo Miao; Mohammed Bennamoun; Yongsheng Gao; Mubarak Shah; Ajmal Mian

arXiv:2403.19407 · cs.CV · October 14, 2024 · 1 citation

TL;DR

This paper introduces a hybrid memory approach for referring video object segmentation (R-VOS) that improves temporal consistency and achieves top-ranked performance on the Ref-YouTube-VOS and Ref-DAVIS17 benchmarks.

Contribution

It proposes a hybrid memory mechanism that improves temporal consistency in R-VOS, and a new Mask Consistency Score (MCS) metric for evaluating that consistency.
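The exact MCS formula is not given on this page. As a hypothetical illustration of what a mask-level temporal consistency score can look like, the sketch below counts a video as consistent at an IoU threshold m only if every frame's IoU with the ground truth reaches m, then averages the fraction of consistent videos over several thresholds. The function names, the threshold set, and the empty-mask convention are all assumptions; this is one plausible construction, not necessarily the paper's definition of MCS.

```python
from typing import List, Sequence, Tuple

# Flattened binary mask: 1 = object pixel, 0 = background.
Mask = Sequence[int]
# One video = (predicted masks per frame, ground-truth masks per frame).
Video = Tuple[List[Mask], List[Mask]]


def iou(pred: Mask, gt: Mask) -> float:
    """Intersection-over-union of two flattened binary masks."""
    inter = sum(1 for p, g in zip(pred, gt) if p and g)
    union = sum(1 for p, g in zip(pred, gt) if p or g)
    # Assumed convention: two empty masks agree perfectly.
    return 1.0 if union == 0 else inter / union


def video_is_consistent(preds: List[Mask], gts: List[Mask], m: float) -> bool:
    """A video counts as consistent at threshold m only if EVERY frame has IoU >= m."""
    return all(iou(p, g) >= m for p, g in zip(preds, gts))


def consistency_score(
    videos: List[Video], thresholds=(0.5, 0.6, 0.7, 0.8, 0.9)
) -> float:
    """Hypothetical score: fraction of consistent videos, averaged over thresholds."""
    per_threshold = []
    for m in thresholds:
        n_ok = sum(video_is_consistent(preds, gts, m) for preds, gts in videos)
        per_threshold.append(n_ok / len(videos))
    return sum(per_threshold) / len(per_threshold)


if __name__ == "__main__":
    gt = [1, 1, 0, 0]
    video_a = ([gt, gt], [gt, gt])            # perfect in both frames
    video_b = ([gt, [1, 0, 0, 0]], [gt, gt])  # second frame drops to IoU 0.5
    print(consistency_score([video_a, video_b]))
```

Because a single poor frame makes a whole video count as inconsistent at that threshold, a score of this kind penalizes temporal flicker that a per-frame average IoU would smooth over. In the usage example, video B only passes the 0.5 threshold, so the score is (1.0 + 4 × 0.5) / 5 = 0.6.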

Findings

01

Significant improvement in temporal consistency metrics.

02

Top-ranked performance on Ref-YouTube-VOS and Ref-DAVIS17.

03

Effective inter-frame collaboration for robust segmentation.

Abstract

Referring Video Object Segmentation (R-VOS) methods face challenges in maintaining consistent object segmentation due to temporal context variability and the presence of other visually similar objects. We propose an end-to-end R-VOS paradigm that explicitly models temporal instance consistency alongside the referring segmentation. Specifically, we introduce a novel hybrid memory that facilitates inter-frame collaboration for robust spatio-temporal matching and propagation. Features of frames with automatically generated high-quality reference masks are propagated to segment the remaining frames based on multi-granularity association to achieve temporally consistent R-VOS. Furthermore, we propose a new Mask Consistency Score (MCS) metric to evaluate the temporal consistency of video segmentation. Extensive experiments demonstrate that our approach enhances temporal consistency by a…
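The abstract describes propagating features from frames with high-quality reference masks to the remaining frames. The sketch below illustrates that general idea with a swapped-in, generic technique: pick the k most confident frames as memory, then run a space-time-memory-style readout in which each query pixel attends over all memory pixels and takes an affinity-weighted sum of their values. The confidence scores, `k`, and all function names are hypothetical, and this is not the paper's hybrid memory or multi-granularity association, which the abstract does not specify.

```python
import math
from typing import List, Sequence

Vec = List[float]


def select_reference_frames(confidences: Sequence[float], k: int) -> List[int]:
    """Hypothetical: return indices of the k frames with the most confident masks."""
    order = sorted(range(len(confidences)), key=lambda i: confidences[i], reverse=True)
    return sorted(order[:k])  # keep temporal order for readability


def dot(a: Vec, b: Vec) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: Vec) -> Vec:
    # Subtract the max for numerical stability before exponentiating.
    mx = max(scores)
    exps = [math.exp(s - mx) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def memory_readout(
    query_keys: List[Vec], mem_keys: List[Vec], mem_values: List[Vec]
) -> List[Vec]:
    """Generic attention-style memory readout.

    query_keys: one key vector per pixel of the frame being segmented.
    mem_keys / mem_values: one key/value pair per pixel of the memory frames
    (flattened across all selected reference frames).
    Returns one read-out value vector per query pixel.
    """
    scale = 1.0 / math.sqrt(len(mem_keys[0]))  # scaled dot-product affinity
    dim = len(mem_values[0])
    out = []
    for q in query_keys:
        weights = softmax([dot(q, k) * scale for k in mem_keys])
        out.append([sum(w * v[d] for w, v in zip(weights, mem_values)) for d in range(dim)])
    return out


if __name__ == "__main__":
    print(select_reference_frames([0.2, 0.9, 0.5, 0.95], k=2))
    # Two memory pixels: one foreground (value 1.0), one background (value 0.0).
    mem_keys = [[10.0, 0.0], [0.0, 10.0]]
    mem_values = [[1.0], [0.0]]
    print(memory_readout([[10.0, 0.0], [0.0, 0.0]], mem_keys, mem_values))
```

In the usage example, frames 1 and 3 are chosen as memory. A query pixel whose key matches the foreground memory pixel reads out a value close to 1.0, while a query with no preference reads out the even mix 0.5. Propagating from only the most reliable frames is what keeps errors in low-confidence frames from spreading through the memory.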

Code & Models

Repositories

bo-miao/HTR (PyTorch, official)

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Visual Attention and Saliency Detection · Video Surveillance and Tracking Methods