1st Place Solution for YouTubeVOS Challenge 2022: Referring Video Object Segmentation

Zhiwei Hu; Bo Chen; Yuan Gao; Zhilong Ji; Jinfeng Bai

arXiv:2212.14679 · cs.CV · January 2, 2023 · 1 citation

TL;DR

This paper presents a simple, effective end-to-end Transformer-based pipeline for referring video object segmentation, achieving top results in the CVPR2022 challenge by improving state-of-the-art methods and leveraging high-quality keyframes.

Contribution

It introduces an improved one-stage method based on ReferFormer and utilizes high-quality keyframes with a video segmentation model to enhance mask quality and temporal consistency.

Findings

01. Achieved 70.3 J&F on the Referring Youtube-VOS validation set with a single model

02. Reached a final leaderboard score of 64.1 after ensembling

03. Ranked 1st in the CVPR2022 Referring Youtube-VOS challenge
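
For readers unfamiliar with the metric, J&F is conventionally the mean of region similarity J (intersection over union between predicted and ground-truth masks) and contour accuracy F (a boundary F-measure). The sketch below is only an illustration of that definition, not the challenge's evaluation code: the function names are hypothetical, masks are plain nested lists of 0/1 values, F is taken as a given number rather than computed, and returning 1.0 when both masks are empty is an assumed convention.

```python
from typing import Sequence

Mask = Sequence[Sequence[int]]  # rows of 0/1 pixel labels (illustrative format)


def region_similarity(pred: Mask, gt: Mask) -> float:
    """Region similarity J: intersection over union of two binary masks."""
    intersection = 0
    union = 0
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            intersection += 1 if (p and g) else 0
            union += 1 if (p or g) else 0
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if union == 0 else intersection / union


def j_and_f(j: float, f: float) -> float:
    """J&F: arithmetic mean of region similarity and contour accuracy."""
    return (j + f) / 2.0


pred = [[1, 1], [0, 0]]
gt = [[1, 0], [0, 0]]
j = region_similarity(pred, gt)  # 1 shared pixel out of 2 covered pixels
score = j_and_f(j, 0.7)          # 0.7 is a made-up F value for illustration
```

Reported scores such as 70.3 are this quantity averaged over the evaluated objects and expressed as a percentage.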

Abstract

Referring video object segmentation (RVOS) aims to segment, in every frame of a given video, the object that a referring expression describes. Previous methods adopt multi-stage approaches and design complex pipelines to obtain promising results. Recently, end-to-end Transformer-based methods have proved superior. In this work, we draw on the advantages of both families of methods to provide a simple and effective pipeline for RVOS. First, we improve the state-of-the-art one-stage method ReferFormer to obtain mask sequences that are strongly correlated with the language descriptions. Second, starting from a reliable, high-quality keyframe, we leverage the superior performance of a video object segmentation model to further enhance the quality and temporal consistency of the mask results. Our single model reaches 70.3 J&F on the Referring Youtube-VOS validation set and 63.0 on the…
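
The second stage described in the abstract (pick a keyframe, then let a video object segmentation model carry its mask through the rest of the clip) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the keyframe is the frame with the highest per-frame confidence score, which the text above does not specify, and `step` is a hypothetical stand-in for whatever VOS model performs the propagation. All function names are invented for this example.

```python
from typing import Any, Callable, List, Optional, Sequence

Frame = Any  # an image in whatever format the models expect
Mask = Any   # a segmentation mask in whatever format the models produce


def select_keyframe(scores: Sequence[float]) -> int:
    """Assumed rule: the keyframe is the frame with the highest confidence."""
    return max(range(len(scores)), key=lambda i: scores[i])


def propagate_from_keyframe(
    frames: Sequence[Frame],
    key_idx: int,
    key_mask: Mask,
    step: Callable[[Frame, Mask, Frame], Mask],
) -> List[Optional[Mask]]:
    """Propagate the keyframe mask forward and backward through the video.

    `step(src_frame, src_mask, dst_frame)` is a hypothetical placeholder for
    a VOS model that predicts the mask on `dst_frame` from a neighboring
    frame and its mask.
    """
    masks: List[Optional[Mask]] = [None] * len(frames)
    masks[key_idx] = key_mask
    # Forward pass: keyframe toward the end of the clip.
    for t in range(key_idx + 1, len(frames)):
        masks[t] = step(frames[t - 1], masks[t - 1], frames[t])
    # Backward pass: keyframe toward the start of the clip.
    for t in range(key_idx - 1, -1, -1):
        masks[t] = step(frames[t + 1], masks[t + 1], frames[t])
    return masks


# Toy usage: frames are labels and the "model" just copies the mask along.
frames = ["f0", "f1", "f2", "f3"]
scores = [0.2, 0.9, 0.4, 0.1]           # made-up per-frame confidences
key_idx = select_keyframe(scores)       # frame 1 has the highest score
copy_step = lambda src, mask, dst: mask
masks = propagate_from_keyframe(frames, key_idx, "mask@f1", copy_step)
```

Propagating in both directions from a single trusted frame is what ties every output mask to the same object, which is one plausible reading of how the keyframe improves temporal consistency.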


Code & Models

Repositories

zhiweihhh/cvpr2022-rvos-challenge (PyTorch, official)


Taxonomy

Topics: Multimodal Machine Learning Applications · Video Analysis and Summarization · Speech and Audio Processing

Methods: Multi-Head Attention · Attention Is All You Need · Test · Absolute Position Encodings · Linear Layer · Adam · Layer Normalization · Softmax · Byte Pair Encoding · Residual Connection