Learning Video Object Segmentation from Unlabeled Videos
Xiankai Lu, Wenguan Wang, Jianbing Shen, Yu-Wing Tai, David Crandall, and Steven C. H. Hoi

TL;DR
This paper introduces MuG, a unified unsupervised/weakly supervised framework for video object segmentation (VOS) that learns from unlabeled videos, reducing annotation needs and applying to object-level zero-shot, instance-level zero-shot, and one-shot VOS.
Contribution
The paper presents MuG, a novel framework that enables effective VOS learning from unlabeled data, advancing understanding of visual patterns and reducing reliance on annotations.
Findings
Promising results in zero-shot and one-shot VOS settings.
Effective learning from unlabeled videos.
Potential to improve segmentation accuracy with unlabeled data.
Abstract
We propose a new method for video object segmentation (VOS) that addresses object pattern learning from unlabeled videos, unlike most existing methods which rely heavily on extensive annotated data. We introduce a unified unsupervised/weakly supervised learning framework, called MuG, that comprehensively captures intrinsic properties of VOS at multiple granularities. Our approach can help advance understanding of visual patterns in VOS and significantly reduce annotation burden. With a carefully-designed architecture and strong representation learning ability, our learned model can be applied to diverse VOS settings, including object-level zero-shot VOS, instance-level zero-shot VOS, and one-shot VOS. Experiments demonstrate promising performance in these settings, as well as the potential of MuG in leveraging unlabeled data to further improve the segmentation accuracy.
Videos
Learning Video Object Segmentation From Unlabeled Videos · YouTube
Taxonomy
Topics: Visual Attention and Saliency Detection · Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques
