Self-Supervised Video Object Segmentation via Cutout Prediction and Tagging
Jyoti Kini, Fahad Shahbaz Khan, Salman Khan, Mubarak Shah

TL;DR
This paper introduces a self-supervised video object segmentation method that enhances object-background discriminability using cutout-based reconstruction and tag prediction, achieving state-of-the-art results on the DAVIS-2017 and YouTube-VOS benchmarks.
Contribution
It presents a novel discriminative learning loss with cutout-based reconstruction and tag prediction terms, together with a zoom-in scheme for small object segmentation, improving accuracy over prior self-supervised VOS methods.
Findings
Achieves state-of-the-art results on DAVIS-2017 and YouTube-VOS.
Effective in occlusion scenarios through cutout-based learning.
Improves small object segmentation with a multi-scale zoom-in scheme.
Abstract
We propose a novel self-supervised Video Object Segmentation (VOS) approach that strives to achieve better object-background discriminability for accurate object segmentation. Unlike previous self-supervised VOS methods, which rely only on object appearance, our approach is based on a discriminative learning loss formulation that takes into account both object and background information to ensure object-background discriminability. The discriminative learning loss comprises a cutout-based reconstruction term (a cutout region is a part of a frame whose pixels are replaced with constant values) and a tag prediction term. The cutout-based reconstruction term uses a simple cutout scheme to learn the pixel-wise correspondence between the current and previous frames in order to reconstruct the original current frame with a cutout region added to it. The introduced cutout…
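The abstract names two generic ingredients without giving the network, feature extractor, or exact loss, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes a standard correspondence-based reconstruction: each current-frame location takes a softmax-weighted average of previous-frame pixels, with weights from dot-product feature similarity, and the result is compared to a target with an L1 loss. The cutout simply overwrites a rectangle with a constant. All function names (`apply_cutout`, `reconstruct`, `l1_loss`), the `temperature` parameter, and the use of plain nested lists in place of learned feature maps are hypothetical choices for illustration; which frame receives the cutout is left open because the truncated abstract does not say.

```python
import math
from typing import List

Vector = List[float]
Image = List[List[Vector]]  # H x W x C as nested lists (hypothetical layout)


def apply_cutout(image: Image, top: int, left: int, height: int, width: int,
                 fill: float = 0.0) -> Image:
    """Return a copy of `image` whose rectangle [top:top+height, left:left+width]
    has every channel replaced by the constant `fill` (clipped to image bounds)."""
    out = [[list(px) for px in row] for row in image]
    for y in range(max(top, 0), min(top + height, len(image))):
        for x in range(max(left, 0), min(left + width, len(image[0]))):
            out[y][x] = [fill] * len(out[y][x])
    return out


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def reconstruct(query_feats: List[Vector], ref_feats: List[Vector],
                ref_pixels: List[Vector], temperature: float = 1.0) -> List[Vector]:
    """For each query (current-frame) location, return a softmax-weighted
    average of reference (previous-frame) pixels; the weights come from
    dot-product similarity between query and reference features."""
    channels = len(ref_pixels[0])
    out = []
    for q in query_feats:
        scores = [sum(a * b for a, b in zip(q, k)) / temperature for k in ref_feats]
        weights = softmax(scores)
        out.append([sum(w * p[c] for w, p in zip(weights, ref_pixels))
                    for c in range(channels)])
    return out


def l1_loss(pred: List[Vector], target: List[Vector]) -> float:
    """Mean absolute error over all locations and channels."""
    diffs = [abs(a - b) for p, t in zip(pred, target) for a, b in zip(p, t)]
    return sum(diffs) / len(diffs)


if __name__ == "__main__":
    frame = [[[1.0, 1.0, 1.0] for _ in range(4)] for _ in range(4)]
    cut = apply_cutout(frame, top=1, left=1, height=2, width=2)
    print(cut[1][1], cut[0][0])  # → [0.0, 0.0, 0.0] [1.0, 1.0, 1.0]
```

In a real training loop the features would come from a learned encoder and the affinity would be computed over flattened feature maps in a tensor library; the nested-list version only makes the arithmetic explicit. Lowering `temperature` sharpens the affinity toward copying a single best-matching pixel.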
Taxonomy
Topics: Visual Attention and Saliency Detection · Advanced Image and Video Retrieval Techniques · Advanced Neural Network Applications
Methods: Cutout · VOS
