OnlineRefer: A Simple Online Baseline for Referring Video Object Segmentation

Dongming Wu; Tiancai Wang; Yuang Zhang; Xiangyu Zhang; Jianbing Shen

arXiv:2307.09356 · cs.CV · July 19, 2023 · 2 cites

TL;DR

OnlineRefer introduces a simple online approach for referring video object segmentation that improves temporal association and outperforms offline methods on multiple benchmarks.

Contribution

It proposes an online model with explicit query propagation for RVOS, challenging the offline paradigm and enhancing temporal association and accuracy.

Findings

01. Achieves 63.5 J&F on Refer-Youtube-VOS

02. Outperforms all offline methods on benchmarks

03. Effective with a Swin-L backbone

Abstract

Referring video object segmentation (RVOS) aims to segment an object in a video according to a human instruction. Current state-of-the-art methods follow an offline pattern, in which each clip independently interacts with the text embedding for cross-modal understanding. These methods usually hold that the offline pattern is necessary for RVOS, yet they model only limited temporal association within each clip. In this work, we challenge this offline belief and propose a simple yet effective online model using explicit query propagation, named OnlineRefer. Specifically, our approach leverages target cues that gather semantic information and a position prior to improve the accuracy and ease of referring predictions for the current frame. Furthermore, we generalize our online model into a semi-online framework to be compatible with video-based backbones. To show the effectiveness of our method, we…
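
The idea of explicit query propagation described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the `Query` and `Candidate` classes, the scoring rule, and the `momentum` and `position_weight` parameters are all hypothetical stand-ins chosen for illustration. The sketch only shows the general pattern of carrying a target cue (a semantic embedding plus a position prior) from one frame to the next, so that each frame's referring prediction is conditioned on the previous one.

```python
from dataclasses import dataclass
from math import dist
from typing import List, Optional, Sequence, Tuple


@dataclass
class Candidate:
    """Hypothetical per-frame object proposal."""
    embedding: Tuple[float, ...]   # appearance/semantic feature
    center: Tuple[float, float]    # object location in the frame


@dataclass
class Query:
    """Hypothetical target cue that is propagated across frames."""
    embedding: Tuple[float, ...]             # semantic information
    center: Optional[Tuple[float, float]]    # position prior (None before frame 1)


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def refer_frame(query: Query, candidates: List[Candidate],
                position_weight: float = 1.0) -> Candidate:
    """Pick the candidate that best matches the query in one frame."""
    def score(c: Candidate) -> float:
        s = dot(query.embedding, c.embedding)
        if query.center is not None:
            # Penalise candidates far from where the target was last seen.
            s -= position_weight * dist(query.center, c.center)
        return s
    return max(candidates, key=score)


def run_online(text_embedding: Sequence[float],
               video: List[List[Candidate]],
               momentum: float = 0.5) -> List[Tuple[float, float]]:
    """Process frames one at a time, propagating the query forward."""
    # The first query comes from the text alone: no position prior yet.
    query = Query(embedding=tuple(text_embedding), center=None)
    picks = []
    for candidates in video:
        best = refer_frame(query, candidates)
        picks.append(best.center)
        # Propagate: blend the chosen object's feature into the query
        # and remember its location as the prior for the next frame.
        new_embedding = tuple(
            momentum * q + (1.0 - momentum) * c
            for q, c in zip(query.embedding, best.embedding)
        )
        query = Query(embedding=new_embedding, center=best.center)
    return picks
```

In this toy setting, a distractor in a later frame that matches the text well but sits far from the previously selected object is rejected because of the propagated position prior, whereas scoring each frame independently against the text would select it.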

Code & Models

Repositories

wudongming97/onlinerefer · PyTorch · Official

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Video Analysis and Summarization

Methods: Contrastive Language-Image Pre-training