2nd Place Solution for MeViS Track in CVPR 2024 PVUW Workshop: Motion Expression guided Video Segmentation

Bin Cao; Yisi Zhang; Xuanxu Lin; Xingjian He; Bo Zhao; Jing Liu

arXiv:2406.13939·cs.CV·June 21, 2024

TL;DR

This report presents an approach to motion expression guided video segmentation that uses masks from a video instance segmentation model for temporal enhancement and SAM for spatial refinement, placing 2nd in the MeViS Track of the CVPR 2024 PVUW Challenge.

Contribution

Building on RVOS methods, it introduces mask information from a video instance segmentation model as preliminary information for temporal enhancement and applies SAM for spatial refinement, to segment objects described by motion-oriented natural language expressions.

Findings

01 Achieved a J&F score of 49.92 in the validation phase

02 Achieved a J&F score of 54.20 in the test phase

03 Ranked 2nd in the MeViS Track at the CVPR 2024 PVUW Challenge

Abstract

Motion Expression guided Video Segmentation is a challenging task that aims to segment objects in a video based on natural language expressions with motion descriptions. Unlike previous referring video object segmentation (RVOS), this task focuses more on the motion in video content for language-guided video object segmentation, requiring an enhanced ability to model longer temporal, motion-oriented vision-language data. In this report, building on RVOS methods, we introduce mask information obtained from a video instance segmentation model as preliminary information for temporal enhancement and employ SAM for spatial refinement. Our method achieved a score of 49.92 J&F in the validation phase and 54.20 J&F in the test phase, securing a final ranking of 2nd in the MeViS Track at the CVPR 2024 PVUW Challenge.
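The J&F score reported above is commonly defined as the mean of region similarity J (the Jaccard index, or IoU, between predicted and ground-truth masks) and boundary accuracy F (a contour F-measure). The following is a minimal sketch of that averaging, not the official MeViS evaluation code. The function names `region_similarity` and `j_and_f` are hypothetical, the F values are assumed to be computed elsewhere, and scoring two empty masks as 1.0 is an assumed convention.

```python
import numpy as np


def region_similarity(pred, gt):
    """Jaccard index (IoU) between two binary masks of the same shape."""
    pred = np.asarray(pred, dtype=bool)
    gt = np.asarray(gt, dtype=bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        # Assumed convention: two empty masks count as perfect agreement.
        return 1.0
    intersection = np.logical_and(pred, gt).sum()
    return float(intersection / union)


def j_and_f(j_scores, f_scores):
    """Average of mean J and mean F over a set of evaluated masks.

    f_scores are assumed to be precomputed boundary F-measures;
    this sketch does not implement the contour matching itself.
    """
    return 0.5 * (float(np.mean(j_scores)) + float(np.mean(f_scores)))


# Toy example: the prediction covers two pixels, the ground truth one of them.
pred_mask = [[1, 1], [0, 0]]
gt_mask = [[1, 0], [0, 0]]
j = region_similarity(pred_mask, gt_mask)  # 1 shared pixel / 2 in the union
score = j_and_f([j, 1.0], [0.7, 0.9])      # illustrative F values only
```

Challenge results such as 49.92 and 54.20 are these values expressed on a 0 to 100 scale; the exact aggregation over videos and expressions is determined by the benchmark's own evaluation script.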

Taxonomy

Topics: Image Processing Techniques and Applications

Methods: Segment Anything Model