LOCATE: Self-supervised Object Discovery via Flow-guided Graph-cut and Bootstrapped Self-training
Silky Singh, Shripad Deshmukh, Mausoom Sarkar, Balaji Krishnamurthy

TL;DR
LOCATE is a self-supervised method for object discovery that combines motion and appearance cues via graph cuts and self-training, achieving state-of-the-art results on various benchmarks without human labels.
Contribution
LOCATE introduces a flow-guided graph cut, which combines motion and appearance cues in its edge weights, and a bootstrapped self-training framework for unsupervised object segmentation. It reports results that surpass previous methods.
Findings
Achieves state-of-the-art performance on multiple benchmarks.
Effectively transfers to in-the-wild images.
Component ablation validates design choices.
Abstract
Learning object segmentation in image and video datasets without human supervision is a challenging problem. Humans easily identify moving salient objects in videos using the gestalt principle of common fate, which suggests that what moves together belongs together. Building upon this idea, we propose a self-supervised object discovery approach that leverages motion and appearance information to produce high-quality object segmentation masks. Specifically, we redesign the traditional graph cut on images to include motion information in a linear combination with appearance information to produce edge weights. Remarkably, this step produces object segmentation masks comparable to the current state-of-the-art on multiple benchmarks. To further improve performance, we bootstrap a segmentation network trained on these preliminary masks as pseudo-ground truths to learn from its own outputs…
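The edge-weight idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes per-patch feature vectors for appearance and for motion are already available from some upstream encoder (not shown), uses cosine similarity as the affinity, and bipartitions the graph with a normalized-cut style spectral step. The function names, the mixing weight `alpha`, and the floor `eps` are hypothetical choices made for illustration.

```python
import numpy as np


def cosine_affinity(feats):
    """Pairwise cosine similarity between row vectors, clipped to [0, 1]."""
    norms = np.linalg.norm(feats, axis=1, keepdims=True)
    unit = feats / np.maximum(norms, 1e-8)
    return np.clip(unit @ unit.T, 0.0, 1.0)


def flow_guided_bipartition(app_feats, flow_feats, alpha=0.5, eps=1e-5):
    """Split graph nodes (patches) into two groups.

    Edge weights are a linear combination of appearance and motion
    affinities. `alpha` and `eps` are illustrative assumptions, not
    values taken from the paper.
    """
    w = alpha * cosine_affinity(app_feats) + (1.0 - alpha) * cosine_affinity(flow_feats)
    w = np.maximum(w, eps)  # small floor keeps the graph connected

    # Symmetric normalized Laplacian: I - D^{-1/2} W D^{-1/2}
    d_inv_sqrt = 1.0 / np.sqrt(w.sum(axis=1))
    lap = np.eye(len(w)) - d_inv_sqrt[:, None] * w * d_inv_sqrt[None, :]

    # eigh returns eigenvalues in ascending order; column 1 is the
    # eigenvector of the second-smallest eigenvalue.
    _, eigvecs = np.linalg.eigh(lap)
    fiedler = d_inv_sqrt * eigvecs[:, 1]

    # Threshold at the mean to obtain a binary partition. Which side is
    # the object is not decided here.
    return fiedler > fiedler.mean()


# Toy example: six patches, the first three share appearance and motion.
app = np.array([[1.0, 0.0], [1.0, 0.1], [0.9, 0.0],
                [0.0, 1.0], [0.1, 1.0], [0.0, 0.9]])
flow = np.array([[2.0, 0.0], [2.0, 0.1], [1.9, 0.0],
                 [0.0, 1.0], [0.1, 1.0], [0.0, 1.1]])
mask = flow_guided_bipartition(app, flow)
print(mask)
```

The sketch only separates the patches into two groups. Deciding which group is the object, and refining the mask with a segmentation network trained on its own outputs, are further steps that this example does not cover.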
Taxonomy
Topics: Visual Attention and Saliency Detection · Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning
