LLM world models are mental: Output layer evidence of brittle world model use in LLM mechanical reasoning
Cole Robertson, Philip Wolff

TL;DR
This study investigates whether large language models (LLMs) construct internal world models for mechanical reasoning. It finds evidence that they manipulate internal representations to some extent but may lack a detailed structural understanding of the systems they reason about.
Contribution
The paper adapts methods from cognitive science research on human mental models to evaluate LLMs' internal world models, revealing that the models make partial use of internal representations in mechanical reasoning tasks.
Findings
LLMs estimate mechanical advantage (MA) slightly, but significantly, above chance
Models can differentiate functional pulley systems from jumbled ones
Models struggle to identify systems with no force transfer, indicating limits in structural reasoning
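The force-transfer finding can be made concrete with a connectivity check. The sketch below is a hypothetical simplification, not the authors' method or stimulus format: it encodes a pulley diagram as an undirected graph of named components (hand, pulleys, weight) joined by rope or axle links, then uses breadth-first search to test whether the effort point is linked to the load at all. Connectivity is only a necessary condition for force transfer; anchoring and rope tension are ignored.

```python
from collections import deque


def transfers_force(edges, effort, load):
    """Return True if `effort` is linked to `load` through the given connections.

    `edges` is a list of (a, b) pairs naming components joined by a rope segment
    or axle. This is a hypothetical encoding chosen for illustration only.
    """
    # Build an undirected adjacency map from the link list.
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)

    # Breadth-first search outward from the effort point.
    seen = {effort}
    queue = deque([effort])
    while queue:
        node = queue.popleft()
        if node == load:
            return True
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


# A functional chain versus one with a missing rope segment (hypothetical examples).
connected = [("hand", "pulley_A"), ("pulley_A", "pulley_B"), ("pulley_B", "weight")]
broken = [("hand", "pulley_A"), ("pulley_B", "weight")]

print(transfers_force(connected, "hand", "weight"))
print(transfers_force(broken, "hand", "weight"))
```

A diagram can look busy and pulley-rich while failing this check, which is the kind of global structural property the findings suggest models handle poorly.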
Abstract
Do large language models (LLMs) construct and manipulate internal world models, or do they rely solely on statistical associations represented as output layer token probabilities? We adapt cognitive science methodologies from human mental models research to test LLMs on pulley system problems using TikZ-rendered stimuli. Study 1 examined whether LLMs can estimate mechanical advantage (MA). State-of-the-art models performed marginally but significantly above chance, and their estimates correlated significantly with ground-truth MA. Significant correlations between number of pulleys and model estimates suggest that models employed a pulley counting heuristic, without necessarily simulating pulley systems to derive precise values. Study 2 tested this by probing whether LLMs represent global features crucial to MA estimation. Models evaluated a functionally connected pulley system against a…
