DyStaB: Unsupervised Object Segmentation via Dynamic-Static Bootstrapping
Yanchao Yang, Brian Lai, Stefano Soatto

TL;DR
DyStaB introduces an unsupervised approach for object segmentation that leverages motion cues and deep learning, enabling detection in static images and videos without manual annotations, and supports continual learning of new objects.
Contribution
The paper presents a novel bootstrapping method combining motion segmentation and deep neural networks for unsupervised object detection and segmentation, capable of generalizing to unseen objects.
Findings
Compares favorably on benchmark datasets even with methods that use pixel-level supervision.
Effective in static images and videos without manual labels.
Supports continual learning of new objects.
Abstract
We describe an unsupervised method to detect and segment portions of images of live scenes that, at some point in time, are seen moving as a coherent whole, which we refer to as objects. Our method first partitions the motion field by minimizing the mutual information between segments. Then, it uses the segments to learn object models that can be used for detection in a static image. Static and dynamic models are represented by deep neural networks trained jointly in a bootstrapping strategy, which enables extrapolation to previously unseen objects. While the training process requires motion, the resulting object segmentation network can be used on either static images or videos at inference time. As the volume of seen videos grows, more and more objects are seen moving, priming their detection, which then serves as a regularizer for new objects, turning our method into unsupervised…
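The partitioning step in the abstract can be illustrated with a toy example. The sketch below is not the authors' implementation: the paper uses deep neural networks, whereas here a trivial mean-flow predictor stands in for them, and high cross-prediction error is used only as a rough proxy for low mutual information between segments. The function name `cross_prediction_error` and the synthetic flow field are assumptions made for illustration.

```python
import numpy as np

def cross_prediction_error(flow, mask):
    """Score a candidate segmentation of a flow field (hypothetical helper).

    flow: (H, W, 2) array of per-pixel motion vectors.
    mask: (H, W) boolean array, True for the candidate object region.

    Each region's flow is predicted from the mean flow of the other region.
    A high error means neither region is informative about the other, which
    is treated here as a crude stand-in for low mutual information.
    """
    inside, outside = flow[mask], flow[~mask]
    if len(inside) == 0 or len(outside) == 0:
        return 0.0  # degenerate partition: nothing to separate
    err_in = np.mean(np.sum((inside - outside.mean(axis=0)) ** 2, axis=1))
    err_out = np.mean(np.sum((outside - inside.mean(axis=0)) ** 2, axis=1))
    return float(err_in + err_out)

# Synthetic scene: static background, one square moving right by 1 pixel.
flow = np.zeros((20, 20, 2))
flow[5:15, 5:15, 0] = 1.0

# Candidate mask that matches the moving square exactly.
aligned = np.zeros((20, 20), dtype=bool)
aligned[5:15, 5:15] = True

# Candidate mask shifted so that it covers only half of the square.
shifted = np.zeros((20, 20), dtype=bool)
shifted[5:15, 10:20] = True

aligned_score = cross_prediction_error(flow, aligned)
shifted_score = cross_prediction_error(flow, shifted)
print(aligned_score, shifted_score)
```

In this toy setting the mask aligned with the moving square scores higher than the shifted one, so searching for the highest-scoring mask favors the region that moves as a coherent whole. In the method described above, the resulting segments would then serve as training signal for a network that operates on static images.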
Taxonomy
Topics: Visual Attention and Saliency Detection · Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques
