Segment Anything Across Shots: A Method and Benchmark

Hengrui Hu; Kaining Ying; Henghui Ding

arXiv:2511.13715 · cs.CV · November 18, 2025

TL;DR

This paper introduces a new method and benchmark for multi-shot semi-supervised video object segmentation, addressing shot discontinuities with a novel data augmentation strategy and a model that effectively detects and segments across shot transitions.
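The abstract does not describe how SAAS detects transitions. As a point of reference only, here is a classic colour-histogram cut detector, a common hand-crafted baseline for finding hard cuts between shots. It is not the SAAS model. The function names, the 8-bin quantisation, and the 0.5 threshold are illustrative assumptions.

```python
import numpy as np

def color_histogram(frame, bins=8):
    # Joint RGB histogram of one (H, W, 3) uint8 frame, normalised to sum to 1.
    q = (frame.astype(np.int64) * bins) // 256          # quantise each channel to [0, bins)
    idx = (q[..., 0] * bins + q[..., 1]) * bins + q[..., 2]
    hist = np.bincount(idx.ravel(), minlength=bins ** 3).astype(np.float64)
    return hist / hist.sum()

def detect_cuts(frames, threshold=0.5, bins=8):
    """Hypothetical baseline: return indices t where a hard cut is declared
    between frame t-1 and frame t."""
    hists = [color_histogram(f, bins) for f in frames]
    cuts = []
    for t in range(1, len(hists)):
        # Total-variation distance between consecutive histograms, in [0, 1].
        d = 0.5 * np.abs(hists[t] - hists[t - 1]).sum()
        if d > threshold:
            cuts.append(t)
    return cuts

# Toy clip: five dark-red frames followed by five blue frames (one hard cut).
frames = np.zeros((10, 16, 16, 3), dtype=np.uint8)
frames[:5, ..., 0] = 50
frames[5:, ..., 2] = 200
cuts = detect_cuts(frames)
```

A detector like this only flags abrupt global appearance changes. It says nothing about whether the target object reappears after the cut, which is the harder part of the multi-shot problem.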

Contribution

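The paper proposes the transition mimicking data augmentation strategy (TMA) and the Segment Anything Across Shots (SAAS) model to improve cross-shot generalization in multi-shot semi-supervised video object segmentation (MVOS), along with the new Cut-VOS benchmark for evaluation.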

Findings

1. SAAS achieves state-of-the-art performance on the YouMVOS and Cut-VOS datasets.

2. TMA enhances cross-shot generalization using only limited single-shot training data.

3. Cut-VOS provides a diverse and challenging benchmark for MVOS.

Abstract

This work focuses on multi-shot semi-supervised video object segmentation (MVOS), which aims at segmenting the target object indicated by an initial mask throughout a video with multiple shots. Existing VOS methods mainly focus on single-shot videos and struggle with shot discontinuities, which limits their real-world applicability. We propose a transition mimicking data augmentation strategy (TMA), which enables cross-shot generalization with single-shot data to alleviate the severe scarcity of annotated multi-shot data, and the Segment Anything Across Shots (SAAS) model, which can detect and comprehend shot transitions effectively. To support evaluation and future study in MVOS, we introduce Cut-VOS, a new MVOS benchmark with dense mask annotations, diverse object categories, and high-frequency transitions. Extensive experiments on YouMVOS and Cut-VOS demonstrate that the proposed…
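The abstract describes TMA only at a high level. What follows is a hypothetical sketch of the general idea of mimicking a shot transition using single-shot data. It assumes a simple crop-and-zoom re-framing (plus an optional flip) applied from a chosen cut frame onward. It is not the paper's TMA, whose actual operations are not given here, and `mimic_shot_cut`, `crop_resize_nearest`, and `min_scale` are illustrative names.

```python
import numpy as np

def crop_resize_nearest(img, top, left, ch, cw, out_h, out_w):
    # Nearest-neighbour resize of img[top:top+ch, left:left+cw] back to (out_h, out_w).
    rows = top + (np.arange(out_h) * ch) // out_h
    cols = left + (np.arange(out_w) * cw) // out_w
    return img[rows[:, None], cols[None, :]]

def mimic_shot_cut(frames, masks, cut_at, rng, min_scale=0.5):
    """Hypothetical transition-mimicking augmentation for one single-shot clip.

    frames: (T, H, W, 3) uint8 video; masks: (T, H, W) bool target masks.
    Every frame from `cut_at` onward is re-framed with one random crop-and-zoom
    (and an optional horizontal flip), so the clip shows an abrupt change of
    framing while the masks stay aligned with the transformed frames.
    """
    T, H, W = masks.shape
    out_f, out_m = frames.copy(), masks.copy()
    scale = rng.uniform(min_scale, 1.0)            # zoom factor of the new "shot"
    ch, cw = max(1, int(H * scale)), max(1, int(W * scale))
    top = int(rng.integers(0, H - ch + 1))
    left = int(rng.integers(0, W - cw + 1))
    flip = bool(rng.integers(0, 2))
    for t in range(cut_at, T):
        f = crop_resize_nearest(frames[t], top, left, ch, cw, H, W)
        m = crop_resize_nearest(masks[t], top, left, ch, cw, H, W)
        if flip:
            f, m = f[:, ::-1], m[:, ::-1]
        out_f[t], out_m[t] = f, m
    return out_f, out_m

# Toy usage: an 8-frame clip with a fixed rectangular object and a cut at frame 4.
rng = np.random.default_rng(0)
frames = rng.integers(0, 256, size=(8, 32, 48, 3)).astype(np.uint8)
masks = np.zeros((8, 32, 48), dtype=bool)
masks[:, 10:20, 15:30] = True
aug_frames, aug_masks = mimic_shot_cut(frames, masks, cut_at=4, rng=rng)
```

Because the crop is random, the target may shrink, shift, or leave the frame entirely after the cut. This loosely resembles how an object can be reframed or temporarily disappear across real shots, while the mask supervision stays consistent with the transformed frames.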

Videos

Segment Anything Across Shots: A Method and Benchmark (Underline)

Taxonomy

Topics: Visual Attention and Saliency Detection · Video Analysis and Summarization · Human Pose and Action Recognition