Per-Clip Video Object Segmentation
Kwanyong Park, Sanghyun Woo, Seoung Wug Oh, In So Kweon, Joon-Young Lee

TL;DR
This paper introduces a clip-wise approach to video object segmentation that enhances accuracy and efficiency by processing multiple frames simultaneously and updating memory less frequently, achieving state-of-the-art results.
Contribution
It proposes a novel per-clip inference scheme with clip-wise feature refinement and progressive matching, improving segmentation performance and computational efficiency.
Findings
Achieves state-of-the-art performance on multiple benchmarks.
Demonstrates a tunable speed-accuracy trade-off across different memory update intervals.
Provides a flexible framework adaptable to various accuracy and efficiency needs.
Abstract
Recently, memory-based approaches show promising results on semi-supervised video object segmentation. These methods predict object masks frame-by-frame with the help of frequently updated memory of the previous mask. Different from this per-frame inference, we investigate an alternative perspective by treating video object segmentation as clip-wise mask propagation. In this per-clip inference scheme, we update the memory with an interval and simultaneously process a set of consecutive frames (i.e. clip) between the memory updates. The scheme provides two potential benefits: accuracy gain by clip-level optimization and efficiency gain by parallel computation of multiple frames. To this end, we propose a new method tailored for the per-clip inference. Specifically, we first introduce a clip-wise operation to refine the features based on intra-clip correlation. In addition, we employ a…
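The per-clip scheme described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names split_into_clips, per_clip_inference, and segment_clip are hypothetical, the model is replaced by a caller-supplied function, and writing only the last frame of each clip into memory is an assumption made here for illustration. The sketch shows only the control flow: a clip of consecutive frames is processed in one call, and the memory is updated once per clip rather than once per frame.

```python
from typing import Any, Callable, List, Sequence, Tuple

Memory = List[Tuple[Any, Any]]  # (frame, mask) pairs; the layout is an assumption


def split_into_clips(num_frames: int, clip_length: int) -> List[range]:
    """Group frame indices 1..num_frames-1 into consecutive clips.

    Frame 0 is treated as the annotated reference frame and is excluded.
    """
    if clip_length < 1:
        raise ValueError("clip_length must be at least 1")
    return [
        range(start, min(start + clip_length, num_frames))
        for start in range(1, num_frames, clip_length)
    ]


def per_clip_inference(
    frames: Sequence[Any],
    first_mask: Any,
    segment_clip: Callable[[List[Any], Memory], List[Any]],
    clip_length: int,
) -> Tuple[List[Any], int]:
    """Propagate a mask clip by clip; return all masks and the update count."""
    memory: Memory = [(frames[0], first_mask)]
    masks: List[Any] = [first_mask]
    updates = 0
    for clip in split_into_clips(len(frames), clip_length):
        clip_frames = [frames[i] for i in clip]
        # Every frame of the clip is handled in a single call, so a real
        # model could batch them and exploit intra-clip correlation.
        clip_masks = segment_clip(clip_frames, memory)
        masks.extend(clip_masks)
        # Assumption: only the last frame of the clip is written to memory.
        memory.append((clip_frames[-1], clip_masks[-1]))
        updates += 1
    return masks, updates


def copy_last_mask(clip_frames: List[Any], memory: Memory) -> List[Any]:
    """Toy stand-in for a segmentation model: reuse the newest memory mask."""
    return [memory[-1][1] for _ in clip_frames]
```

Setting clip_length to 1 recovers per-frame inference with a memory update after every frame. Larger values reduce the number of updates, which is where the efficiency gain claimed in the abstract would come from.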
Taxonomy
Topics: Visual Attention and Saliency Detection · Advanced Image and Video Retrieval Techniques · Image Enhancement Techniques
