Re-Prompting SAM 3 via Object Retrieval: 3rd of the 5th PVUW MOSE Track

Mingqi Gao; Sijie Li; Jungong Han

arXiv:2603.23788 · cs.CV · March 26, 2026

TL;DR

This paper presents a re-prompting framework for SAM 3 that uses object retrieval and multi-anchor propagation to make semi-supervised video object segmentation more robust to target disappearance and reappearance, severe transformation, and same-category distractors.

Contribution

We introduce an automatic re-prompting method leveraging object retrieval and multi-anchor propagation to improve SAM 3's performance in complex video segmentation tasks.

Findings

01 Achieved a J&F score of 51.17% on the MOSEv2 test set

02 Ranked 3rd in the MOSEv2 track

03 Enhanced robustness against target disappearance and distractors
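
For readers unfamiliar with the J&F score reported above, the following is a minimal sketch of the conventional definition used in video object segmentation benchmarks: J is the region similarity (intersection over union of the predicted and ground-truth masks), F is a boundary F-measure, and J&F is their mean. The exact metric variant used by the MOSEv2 evaluation may differ; the function names `jaccard` and `j_and_f`, the toy masks, and the F value are all illustrative assumptions, and F is taken as a given number rather than computed.

```python
def jaccard(pred, gt):
    """Region similarity J: intersection over union of two binary masks.

    Masks are equal-sized nested lists of 0/1 values.
    """
    inter = 0
    union = 0
    for row_p, row_g in zip(pred, gt):
        for p, g in zip(row_p, row_g):
            inter += 1 if (p and g) else 0
            union += 1 if (p or g) else 0
    # Assumed convention: two empty masks count as a perfect match.
    return inter / union if union else 1.0


def j_and_f(j, f):
    """J&F: arithmetic mean of region similarity J and boundary accuracy F."""
    return (j + f) / 2.0


# Toy masks (illustrative only, not data from the paper).
pred = [[1, 1, 0],
        [0, 1, 0]]
gt = [[1, 0, 0],
      [0, 1, 1]]

j = jaccard(pred, gt)      # 2 overlapping pixels out of 4 in the union
score = j_and_f(j, 0.7)    # 0.7 is a hypothetical boundary F value
```

In benchmark practice, both terms are averaged over all objects and frames before being combined, which this per-mask sketch omits.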

Abstract

This technical report explores the MOSEv2 track of the PVUW 2026 Challenge, which targets complex semi-supervised video object segmentation. Built on SAM 3, we develop an automatic re-prompting framework to improve robustness under target disappearance and reappearance, severe transformation, and strong same-category distractors. Our method first applies the SAM 3 detector to later frames to identify same-category object candidates, and then performs DINOv3-based object-level matching with a transformation-aware target feature pool to retrieve reliable target anchors. These anchors are injected back into the SAM 3 tracker together with the first-frame mask, enabling multi-anchor propagation rather than relying solely on the initial prompt. This simple design directly addresses several core challenges of MOSEv2. Our solution achieves a J&F of 51.17% on the test set, ranking 3rd in the MOSEv2…
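
The retrieval step described in the abstract can be illustrated with a small sketch. It assumes that each detected candidate and each entry in the target feature pool is already represented by an embedding vector (in the paper these come from DINOv3; here they are hand-written toy vectors), and that matching uses cosine similarity with a fixed acceptance threshold. The similarity measure, the threshold value, and the names `cosine` and `retrieve_anchors` are assumptions for illustration, not details confirmed by the abstract.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_anchors(candidates, feature_pool, threshold=0.8):
    """Pick at most one anchor candidate per frame.

    candidates:   {frame_idx: [(candidate_id, feature), ...]}
    feature_pool: list of reference features for the target
                  (e.g. several appearances of the same object)
    Returns {frame_idx: (candidate_id, score)} for frames whose best
    candidate matches some pool entry with similarity >= threshold.
    """
    anchors = {}
    if not feature_pool:
        return anchors
    for frame_idx, detections in candidates.items():
        best_id, best_score = None, -1.0
        for cand_id, feat in detections:
            # Score a candidate by its closest match in the pool.
            score = max(cosine(feat, ref) for ref in feature_pool)
            if score > best_score:
                best_id, best_score = cand_id, score
        if best_id is not None and best_score >= threshold:
            anchors[frame_idx] = (best_id, best_score)
    return anchors


# Toy example: two pool entries and candidates in two later frames.
pool = [[1.0, 0.0, 0.0], [0.8, 0.6, 0.0]]
candidates = {
    10: [("a", [0.9, 0.1, 0.0]), ("b", [0.0, 1.0, 0.0])],
    20: [("c", [0.0, 0.0, 1.0])],
}
anchors = retrieve_anchors(candidates, pool)
```

In this toy run, only frame 10 yields an anchor (candidate "a"); frame 20 has no candidate close enough to the pool, so it contributes nothing. In the framework described above, the masks of the retrieved candidates would then be passed to the tracker alongside the first-frame mask as additional prompts.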

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Advanced Neural Network Applications · Multimodal Machine Learning Applications