Discovering and using Spelke segments

Rahul Venkatesh; Klemen Kotar; Lilian Naing Chen; Seungwoo Kim; Luca Thomas Wheeler; Jared Watrous; Ashley Xu; Gia Ancone; Wanhee Lee; Honglin Chen; Daniel Bear; Stefan Stojanov; Daniel Yamins

arXiv:2507.16038 · cs.CV · July 23, 2025

TL;DR

This paper introduces Spelke segments, groupings defined by which physical things move together when acted on by forces. It benchmarks the concept with the new SpelkeBench dataset and develops SpelkeNet to extract these segments, which improves performance on object manipulation tasks.

Contribution

The paper defines Spelke segments for computer vision, creates the SpelkeBench dataset, and develops SpelkeNet for motion-based segmentation, advancing category-agnostic object understanding.

Findings

01. SpelkeNet outperforms SegmentAnything on SpelkeBench.

02. Spelke segments improve object manipulation performance.

03. The approach enables category-agnostic segmentation based on physical motion.

Abstract

Segments in computer vision are often defined by semantic considerations and are highly dependent on category-specific conventions. In contrast, developmental psychology suggests that humans perceive the world in terms of Spelke objects--groupings of physical things that reliably move together when acted on by physical forces. Spelke objects thus operate on category-agnostic causal motion relationships which potentially better support tasks like manipulation and planning. In this paper, we first benchmark the Spelke object concept, introducing the SpelkeBench dataset that contains a wide variety of well-defined Spelke segments in natural images. Next, to extract Spelke segments from images algorithmically, we build SpelkeNet, a class of visual world models trained to predict distributions over future motions. SpelkeNet supports estimation of two key concepts for Spelke object discovery:…
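The "move together" criterion in the abstract can be made concrete with a toy sketch. The following is not SpelkeNet and is not taken from the paper. It assumes we already have several sampled motion fields describing how a scene might respond to a push at one pixel, and it groups the pixels that consistently move along with the pushed pixel. The function name `comotion_segment`, the `flows` array layout, and both thresholds are hypothetical choices made for illustration only.

```python
import numpy as np

def comotion_segment(flows, poke_yx, min_motion=0.5, min_agreement=0.8):
    """Toy grouping of pixels that move together with a poked pixel.

    flows: array of shape (N, H, W, 2), N sampled displacement fields
           (hypothetical input; any motion predictor could supply it).
    poke_yx: (row, col) of the pixel that was pushed.
    Returns a boolean (H, W) mask.
    """
    flows = np.asarray(flows, dtype=float)
    # Displacement of the poked pixel in each sample, shape (N, 2).
    ref = flows[:, poke_yx[0], poke_yx[1], :]
    ref_mag = np.linalg.norm(ref, axis=-1)[:, None, None]
    # A pixel counts as moving if its displacement is not negligible.
    moving = np.linalg.norm(flows, axis=-1) >= min_motion
    # It moves "with" the poke if its displacement stays within half the
    # poke's magnitude of the poke's own displacement (assumed tolerance).
    diff = np.linalg.norm(flows - ref[:, None, None, :], axis=-1)
    together = moving & (diff <= 0.5 * ref_mag)
    # Keep pixels that agree with the poke in most samples.
    return together.mean(axis=0) >= min_agreement

# Synthetic example: a 4x4 block translates rigidly, the background is static.
rng = np.random.default_rng(0)
flows = np.zeros((16, 8, 8, 2))
for i in range(16):
    flows[i, 2:6, 2:6] = rng.uniform(1.0, 3.0) * np.array([1.0, 0.0])

mask = comotion_segment(flows, poke_yx=(3, 3))
print(mask.astype(int))
```

On this synthetic input the mask covers the moving block and excludes the static background. Real scenes would need a learned motion predictor and a more careful notion of agreement, so this only illustrates the grouping idea.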

Taxonomy

Topics: Human Motion and Animation · Spatial Cognition and Navigation