Object Concepts Emerge from Motion

Haoqian Liang; Xiaohui Wang; Zhichao Li; Ya Yang; Naiyan Wang

arXiv:2505.21635 · cs.CV · May 29, 2025

TL;DR

This paper introduces a biologically inspired, unsupervised framework that leverages motion boundaries in videos to learn object-centric visual representations, outperforming existing methods across multiple vision tasks.

Contribution

The paper proposes a label-free approach that uses motion cues to learn object representations and that scales to large, unstructured video data.

Findings

01. Outperforms previous supervised and self-supervised methods
02. Demonstrates strong generalization to unseen scenes
03. Effective across both low-level and high-level vision tasks

Abstract

Object concepts play a foundational role in human visual cognition, enabling perception, memory, and interaction in the physical world. Inspired by findings in developmental neuroscience, where infants are shown to acquire object understanding through observation of motion, we propose a biologically inspired framework for learning object-centric visual representations in an unsupervised manner. Our key insight is that motion boundaries serve as a strong signal for object-level grouping, which can be used to derive pseudo instance supervision from raw videos. Concretely, we generate motion-based instance masks using off-the-shelf optical flow and clustering algorithms, and use them to train visual encoders via contrastive learning. Our framework is fully label-free and does not rely on camera calibration, making it scalable to large-scale unstructured video data. We evaluate our…
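The pipeline described in the abstract (optical flow, then clustering into pseudo instance masks, then contrastive training) can be illustrated with a small NumPy sketch. This is not the authors' implementation: the abstract does not name the flow estimator, the clustering algorithm, or the loss, so the sketch assumes a precomputed flow field, substitutes plain k-means for the clustering step, and uses an InfoNCE-style loss as one common form of contrastive objective. The function names `motion_masks`, `masked_pool`, and `info_nce` are hypothetical.

```python
import numpy as np


def motion_masks(flow, k=2, iters=10):
    """Group pixels by motion: k-means over per-pixel flow vectors.

    flow: (H, W, 2) array of optical-flow vectors (assumed precomputed).
    Returns an (H, W) integer label map; each label is one pseudo instance.
    k-means is an assumed stand-in for the unnamed clustering algorithm.
    """
    h, w, _ = flow.shape
    x = flow.reshape(-1, 2).astype(np.float64)

    # Deterministic farthest-point initialisation: start from the first
    # pixel, then repeatedly add the pixel farthest from all chosen centres.
    centers = [x[0]]
    for _ in range(k - 1):
        d = np.min([((x - c) ** 2).sum(1) for c in centers], axis=0)
        centers.append(x[d.argmax()])
    centers = np.stack(centers)

    for _ in range(iters):
        # Assign every pixel to its nearest centre in flow space.
        dist = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = dist.argmin(1)
        # Move each non-empty centre to the mean flow of its pixels.
        for j in range(k):
            if np.any(labels == j):
                centers[j] = x[labels == j].mean(0)
    return labels.reshape(h, w)


def masked_pool(feat, mask):
    """Average a (H, W, C) feature map over a boolean (H, W) mask and L2-normalise."""
    v = feat[mask].mean(0)
    return v / (np.linalg.norm(v) + 1e-8)


def info_nce(anchor, positive, negatives, tau=0.1):
    """InfoNCE-style loss for one anchor embedding (assumed loss form).

    The loss is small when the anchor is more similar to the positive
    than to any of the negatives.
    """
    logits = np.array([anchor @ positive] + [anchor @ n for n in negatives]) / tau
    logits = logits - logits.max()  # numerical stability
    return -np.log(np.exp(logits[0]) / np.exp(logits).sum())
```

In a training loop along these lines, features pooled inside the same motion mask would act as positives and features pooled from other masks as negatives, so the encoder is pushed toward embeddings that are consistent within a moving object. How the paper actually samples pairs is not stated in the truncated abstract.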

Taxonomy

Topics: Visual Attention and Saliency Detection · Child and Animal Learning Development · Face Recognition and Perception