Re-Prompting SAM 3 via Object Retrieval: 3rd of the 5th PVUW MOSE Track
Mingqi Gao, Sijie Li, Jungong Han

TL;DR
This paper presents a re-prompting framework for SAM 3 that enhances semi-supervised video object segmentation by using object retrieval and multi-anchor propagation, significantly improving robustness in challenging scenarios.
Contribution
We introduce an automatic re-prompting method leveraging object retrieval and multi-anchor propagation to improve SAM 3's performance in complex video segmentation tasks.
Findings
Achieved a J&F score of 51.17% on the MOSEv2 test set
Ranked 3rd in the MOSEv2 track
Enhanced robustness against target disappearance and distractors
Abstract
This technical report explores the MOSEv2 track of the PVUW 2026 Challenge, which targets complex semi-supervised video object segmentation. Built on SAM 3, we develop an automatic re-prompting framework to improve robustness under target disappearance and reappearance, severe transformation, and strong same-category distractors. Our method first applies the SAM 3 detector to later frames to identify same-category object candidates, and then performs DINOv3-based object-level matching with a transformation-aware target feature pool to retrieve reliable target anchors. These anchors are injected back into the SAM 3 tracker together with the first-frame mask, enabling multi-anchor propagation rather than relying solely on the initial prompt. This simple design directly benefits several core challenges of MOSEv2. Our solution achieves a J&F of 51.17% on the test set, ranking 3rd in the MOSEv2…
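The retrieval step described in the abstract (matching detector candidates against a pool of target features) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state the matching rule, so cosine similarity, max-over-pool scoring, and the `threshold` value are all assumptions, and the function names `l2_normalize` and `retrieve_anchor` are hypothetical. The input vectors stand in for object-level embeddings such as those a DINOv3 backbone might produce.

```python
import numpy as np


def l2_normalize(x, eps=1e-8):
    """Scale each row to (approximately) unit length."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def retrieve_anchor(candidate_feats, pool_feats, threshold=0.6):
    """Pick the candidate most similar to any feature in the target pool.

    candidate_feats: (num_candidates, dim) embeddings of detected objects.
    pool_feats:      (pool_size, dim) embeddings of the target's appearances.
    threshold:       assumed minimum cosine similarity to accept an anchor.

    Returns (index, score); index is None when no candidate is reliable.
    """
    if len(candidate_feats) == 0:
        return None, 0.0
    cand = l2_normalize(np.asarray(candidate_feats, dtype=float))
    pool = l2_normalize(np.asarray(pool_feats, dtype=float))
    sim = cand @ pool.T              # cosine similarity, (candidates, pool)
    scores = sim.max(axis=1)         # best match of each candidate to the pool
    best = int(scores.argmax())
    if scores[best] < threshold:     # reject weak matches, e.g. distractors
        return None, float(scores[best])
    return best, float(scores[best])


if __name__ == "__main__":
    pool = [[1.0, 0.0], [0.0, 1.0]]           # two stored target appearances
    candidates = [[0.9, 0.1], [-1.0, 0.0]]    # one close match, one distractor
    print(retrieve_anchor(candidates, pool))
```

Scoring each candidate by its best match over the whole pool, rather than against the first-frame feature alone, is one simple way a pool of varied appearances could tolerate target transformation; a candidate accepted here would then serve as an additional prompt for the tracker.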
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Advanced Neural Network Applications · Multimodal Machine Learning Applications
