A little less conversation, a little more action, please: Investigating the physical common-sense of LLMs in a 3D embodied environment
Matteo G. Mecattaf, Ben Slater, Marko Tešić, Jonathan Prunty, Konstantinos Voudouris, Lucy G. Cheke

TL;DR
This paper introduces a novel framework for evaluating large language models' physical common-sense reasoning within a 3D embodied environment, revealing current limitations compared to humans and animals.
Contribution
It presents the first embodied, cognitively meaningful assessment of LLMs in a 3D environment, enabling direct comparison with other embodied agents and humans.
Findings
LLMs can perform physical reasoning tasks without finetuning.
Humans outperform LLMs on physical reasoning in a 3D environment.
The framework enables ecologically valid experiments from cognitive science.
Abstract
As general-purpose tools, Large Language Models (LLMs) must often reason about everyday physical environments. In a question-and-answer capacity, understanding the interactions of physical objects may be necessary to give appropriate responses. Moreover, LLMs are increasingly used as reasoning engines in agentic systems, designing and controlling their action sequences. The vast majority of research has tackled this issue using static benchmarks, comprised of text or image-based questions about the physical world. However, these benchmarks do not capture the complexity and nuance of real-life physical processes. Here we advocate for a second, relatively unexplored, approach: 'embodying' the LLMs by granting them control of an agent within a 3D environment. We present the first embodied and cognitively meaningful evaluation of physical common-sense reasoning in LLMs. Our framework allows…
Taxonomy
Topics: Virtual Reality Applications and Impacts
