SaSaSaSa2VA: 2nd Place of the 5th PVUW MeViS-Text Track
Dengxian Gong, Quanzhu Niu, Shihao Chen, Yuanzheng Wu, Yikang Zhou, Tao Zhang, Haobo Yuan, Lu Qi, Shunping Ji

TL;DR
This paper introduces SaSaSaSa2VA, an enhanced referring video object segmentation (RVOS) method with an existence-aware verification mechanism. It placed second in the MeViS-Text Track of the 5th PVUW Challenge, with strong performance on motion-centric referring tasks.
Contribution
It extends SaSaSa2VA with a target existence-aware verification mechanism, improving performance on motion-centric referring video object segmentation tasks.
Findings
Achieved a final score of 89.19 in the 5th PVUW Challenge (MeViS-Text Track), placing 2nd.
Existence-aware verification alone is sufficient to unlock strong performance on motion-centric RVOS.
Quantitative results and ablations support the effectiveness of the proposed strategy.
Abstract
Referring video object segmentation (RVOS) commonly grounds targets in videos based on static textual cues. The MeViS benchmark extends this by incorporating motion-centric expressions (referring and reasoning motion expressions) and introducing no-target queries. Building on SaSaSa2VA, in which more input frames and [SEG] tokens already strengthen the Sa2VA backbone, we adopt a simple yet effective target existence-aware verification mechanism, yielding Still Awesome SaSaSa2VA (SaSaSaSa2VA). Despite its simplicity, the method achieves a final score of 89.19 in the 5th PVUW Challenge (MeViS-Text Track), securing 2nd place. Both quantitative results and ablations suggest that this existence-aware verification strategy is sufficient to unlock strong performance on motion-centric referring tasks.
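The summary does not describe how the verification step is implemented. As a rough illustration only, one simple way to handle no-target queries is to gate the predicted per-frame masks with a binary existence decision: if the model judges the referred target absent, every frame receives an empty mask. The sketch below is hypothetical. The names `existence_aware_verify`, `existence_prob`, and `threshold`, the 0.5 default, and the use of nested Python lists for masks are all assumptions, not the paper's method.

```python
from typing import List

Mask = List[List[int]]  # one binary H x W mask for a single frame


def empty_like(mask: Mask) -> Mask:
    """Return an all-zero mask with the same shape as `mask`."""
    return [[0] * len(row) for row in mask]


def existence_aware_verify(
    frame_masks: List[Mask],
    existence_prob: float,
    threshold: float = 0.5,
) -> List[Mask]:
    """Gate per-frame masks by a target-existence decision.

    Hypothetical sketch: `existence_prob` stands in for whatever
    existence signal the model produces for the query (not specified
    in the summary). If it falls below `threshold`, the query is treated
    as a no-target expression and every frame gets an empty mask;
    otherwise the predicted masks are returned unchanged.
    """
    if not 0.0 <= existence_prob <= 1.0:
        raise ValueError("existence_prob must lie in [0, 1]")
    if existence_prob < threshold:
        return [empty_like(m) for m in frame_masks]
    return frame_masks


if __name__ == "__main__":
    masks = [[[0, 1], [1, 1]], [[1, 0], [0, 0]]]
    print(existence_aware_verify(masks, existence_prob=0.9))
    # [[[0, 1], [1, 1]], [[1, 0], [0, 0]]]
    print(existence_aware_verify(masks, existence_prob=0.2))
    # [[[0, 0], [0, 0]], [[0, 0], [0, 0]]]
```

Gating after segmentation keeps the segmentation head untouched: the existence decision only decides whether its output is kept or suppressed.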
