Optical Flow boosts Unsupervised Localization and Segmentation
Xinyu Zhang, Abdeslam Boularias

TL;DR
This paper introduces a motion-based loss function leveraging optical flow to improve unsupervised object localization and segmentation, outperforming existing methods without requiring labeled data.
Contribution
It proposes a novel optical-flow-based loss for fine-tuning self-supervised vision transformers (ViTs), improving unsupervised segmentation and localization performance.
Findings
Outperforms state-of-the-art unsupervised segmentation methods
Improves object localization accuracy in benchmarks
Enhances ViT features using motion cues
Abstract
Unsupervised localization and segmentation are long-standing robot vision challenges: they describe the critical ability of an autonomous robot to learn to decompose images into individual objects without labeled data. These tasks are important because dense manual image annotations are scarce, and because of the promising prospect of adapting to an evolving set of object categories in lifelong learning. Most recent methods use visual appearance continuity as an object cue by spatially clustering features obtained from self-supervised vision transformers (ViTs). In this work, we leverage motion cues, inspired by the common fate principle: pixels that share similar movements tend to belong to the same object. We propose a new loss term formulation that uses optical flow in unlabeled videos to encourage self-supervised ViT features to become closer to each other if their…
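The abstract is truncated and does not state the exact loss, so the following is only a minimal sketch of a generic "common fate" pairwise loss, not the authors' formulation. It assumes per-patch ViT features and per-patch optical-flow vectors have already been computed. The Gaussian flow affinity, cosine feature similarity, the `sigma` and `margin` parameters, the repulsion term, and all function names are hypothetical choices made for illustration.

```python
import math
from itertools import combinations
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def flow_affinity(u: Sequence[float], v: Sequence[float], sigma: float = 1.0) -> float:
    """Hypothetical Gaussian affinity in (0, 1]: equals 1 when two flow vectors match."""
    dist_sq = sum((x - y) ** 2 for x, y in zip(u, v))
    return math.exp(-dist_sq / (sigma ** 2))


def common_fate_loss(
    features: Sequence[Sequence[float]],
    flows: Sequence[Sequence[float]],
    sigma: float = 1.0,
    margin: float = 0.0,
) -> float:
    """Hypothetical motion-guided loss averaged over all patch pairs.

    Pairs whose flows agree (affinity near 1) are penalized for dissimilar
    features; pairs whose flows disagree (affinity near 0) are penalized when
    their cosine similarity exceeds `margin`.
    """
    if len(features) != len(flows):
        raise ValueError("features and flows must have the same length")
    pairs = list(combinations(range(len(features)), 2))
    if not pairs:
        return 0.0
    total = 0.0
    for i, j in pairs:
        w = flow_affinity(flows[i], flows[j], sigma)
        s = cosine_similarity(features[i], features[j])
        # Attraction: pull features together when their patches move alike.
        total += w * (1.0 - s)
        # Repulsion: push features apart when their patches move differently.
        total += (1.0 - w) * max(0.0, s - margin)
    return total / len(pairs)


if __name__ == "__main__":
    flows = [[2.0, 0.0], [2.0, 0.0], [-2.0, 0.0]]  # patches 0 and 1 move together
    aligned = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]  # features agree with motion
    misaligned = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]  # features contradict motion
    print(common_fate_loss(aligned, flows), common_fate_loss(misaligned, flows))
```

Features grouped consistently with motion yield a near-zero loss, while features that contradict the motion grouping yield a larger one. In an actual fine-tuning setup the features would be differentiable tensors from the ViT and the flow would come from an optical-flow estimator; the pure-Python version here only illustrates the pairwise weighting idea.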
Taxonomy
Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Robotics and Sensor-Based Localization
Methods: Focus
