Rethinking the Simulation vs. Rendering Dichotomy: No Free Lunch in Spatial World Modelling
Dezhi Luo, Qingying Gao, Hokin Deng

TL;DR
This paper challenges the traditional simulation versus rendering view in spatial world models, emphasizing the importance of perceptual content for reasoning and proposing architectures that maintain structured perceptual representations.
Contribution
It introduces a new perspective linking perceptual content with spatial reasoning, supported by evidence from neuroscience and embodied AI developments.
Findings
Fine-grained perceptual content is essential for spatial reasoning.
Shared representational geometries underpin simulation and perception.
Rich perceptual details enhance AI performance in physics-based tasks.
Abstract
Spatial world models, representations that support flexible reasoning about spatial relations, are central to developing computational models that could operate in the physical world, but their precise mechanistic underpinnings are nuanced by the borrowing of underspecified or misguided accounts of human cognition. This paper revisits the simulation versus rendering dichotomy and draws on evidence from aphantasia to argue that fine-grained perceptual content is critical for model-based spatial reasoning. Drawing on recent research into the neural basis of visual awareness, we propose that spatial simulation and perceptual experience depend on shared representational geometries captured by higher-order indices of perceptual relations. We argue that recent developments in embodied AI support this claim, where rich perceptual details improve performance on physics-based world engagements.…
Taxonomy
Topics: Action Observation and Synchronization · Spatial Cognition and Navigation · Face Recognition and Perception
