Towards aligned body representations in vision models

Andrey Gizdov; Andrea Procopio; Yichen Li; Daniel Harari; Tomer Ullman

arXiv:2512.00365 · cs.CV · December 2, 2025

TL;DR

This paper investigates whether vision models trained for segmentation develop coarse internal body representations similar to those humans use. It finds that smaller models naturally form human-like coarse representations, while larger models tend to encode finer details.

Contribution

The paper demonstrates that coarse, human-like body representations can emerge in segmentation models when computational resources are limited, offering insight into the representations that support physical reasoning.

Findings

01 Smaller models develop human-like coarse body representations.

02 Larger models tend toward detailed, fine-grain encodings.

03 Coarse representations can emerge with limited computational resources.

Abstract

Human physical reasoning relies on internal "body" representations - coarse, volumetric approximations that capture an object's extent and support intuitive predictions about motion and physics. While psychophysical evidence suggests humans use such coarse representations, their internal structure remains largely unknown. Here we test whether vision models trained for segmentation develop comparable representations. We adapt a psychophysical experiment conducted with 50 human participants to a semantic segmentation task and test a family of seven segmentation networks, varying in size. We find that smaller models naturally form human-like coarse body representations, whereas larger models tend toward overly detailed, fine-grain encodings. Our results demonstrate that coarse representations can emerge under limited computational resources, and that machine representations can provide a…

Taxonomy

Topics: Action Observation and Synchronization · Face Recognition and Perception · Embodied and Extended Cognition