SaSaSaSa2VA: 2nd Place of the 5th PVUW MeViS-Text Track

Dengxian Gong; Quanzhu Niu; Shihao Chen; Yuanzheng Wu; Yikang Zhou; Tao Zhang; Haobo Yuan; Lu Qi; Shunping Ji

arXiv:2603.27241·cs.CV·March 31, 2026

TL;DR

This paper introduces SaSaSaSa2VA, an enhanced referring video object segmentation (RVOS) method with a target existence-aware verification mechanism. It took 2nd place in the MeViS-Text Track of the 5th PVUW Challenge, with strong performance on motion-centric referring tasks.

Contribution

It extends SaSaSa2VA with a target existence-aware verification mechanism, improving performance on motion-centric referring video object segmentation tasks.

Findings

01

Achieved a final score of 89.19 in the 5th PVUW Challenge (MeViS-Text Track), securing 2nd place.

02

Adding existence-aware verification to SaSaSa2VA yields strong performance on motion-centric RVOS.

03

Quantitative results and ablations suggest that the existence-aware verification strategy is effective.

Abstract

Referring video object segmentation (RVOS) commonly grounds targets in videos based on static textual cues. The MeViS benchmark extends this by incorporating motion-centric expressions (referring and reasoning motion expressions) and introducing no-target queries. Extending SaSaSa2VA, where increased input frames and [SEG] tokens already strengthen the Sa2VA backbone, we adopt a simple yet effective target existence-aware verification mechanism, leading to Still Awesome SaSaSa2VA (SaSaSaSa2VA). Despite its simplicity, the method achieves a final score of 89.19 in the 5th PVUW Challenge (MeViS-Text Track), securing 2nd place. Both quantitative results and ablations suggest that this existence-aware verification strategy is sufficient to unlock strong performance on motion-centric referring tasks.
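The abstract does not spell out how the existence check is implemented, so the following is only a minimal sketch of one way an existence-aware verification step could gate segmentation output for no-target queries. It assumes a scalar existence score is available from some verifier. The function name `apply_existence_verification`, the `existence_score` input, and the `threshold` value of 0.5 are all hypothetical and are not taken from the paper.

```python
from typing import List

# One binary mask per frame, stored as rows of 0/1 values.
Mask = List[List[int]]


def apply_existence_verification(
    masks: List[Mask],
    existence_score: float,
    threshold: float = 0.5,  # hypothetical cut-off, not from the paper
) -> List[Mask]:
    """Gate per-frame masks on a target-existence decision.

    If the (assumed) existence score reaches the threshold, the predicted
    masks are returned unchanged. Otherwise the query is treated as a
    no-target query and every frame gets an all-zero mask.
    """
    if existence_score >= threshold:
        return masks
    # Target judged absent: emit empty masks with the same shape.
    return [[[0 for _ in row] for row in mask] for mask in masks]


if __name__ == "__main__":
    predicted = [[[1, 0], [0, 1]], [[1, 1], [0, 0]]]
    print(apply_existence_verification(predicted, existence_score=0.9))
    print(apply_existence_verification(predicted, existence_score=0.1))
```

Running the verification as a separate gate, as in this sketch, would leave the segmentation model untouched and only change what is emitted when the expression refers to nothing in the video. Whether the paper uses a score threshold or some other decision rule is not stated in the abstract.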
