Can Language Models Serve as Text-Based World Simulators?
Ruoyao Wang, Graham Todd, Ziang Xiao, Xingdi Yuan, Marc-Alexandre Côté, Peter Clark, Peter Jansen

TL;DR
This paper investigates whether large language models like GPT-4 can act as text-based world simulators by predicting state changes in text-based environments, introducing a new benchmark dataset for evaluation.
Contribution
It introduces ByteSized32-State-Prediction, a novel benchmark dataset for evaluating LLMs as text-based world simulators, and assesses GPT-4's performance on this task.
Findings
GPT-4 performs well but remains unreliable as a world simulator
The benchmark reveals current LLM limitations in state prediction accuracy
The benchmark provides a new tool for future research on LLM-based simulation
Abstract
Virtual environments play a key role in benchmarking advances in complex planning and decision-making tasks but are expensive and complicated to build by hand. Can current language models themselves serve as world simulators, correctly predicting how actions change different world states, thus bypassing the need for extensive manual coding? Our goal is to answer this question in the context of text-based simulators. Our approach is to build and use a new benchmark, called ByteSized32-State-Prediction, containing a dataset of text game state transitions and accompanying game tasks. We use this to directly quantify, for the first time, how well LLMs can serve as text-based world simulators. We test GPT-4 on this dataset and find that, despite its impressive performance, it is still an unreliable world simulator without further innovations. This work thus contributes both new insights into…
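As a rough illustration of the state-prediction task described in the abstract, the sketch below scores a predictor on (state, action, next state) triples by exact match. Everything here is an assumption made for illustration: the state layout, the `reference_simulator` rule, the `no_change_predictor` baseline, and the exact-match metric are hypothetical and are not taken from ByteSized32-State-Prediction or from the paper's evaluation code. In practice an LLM call would take the place of the `predict` function.

```python
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical layout: object name -> dictionary of properties.
State = Dict[str, Dict[str, Any]]
# One transition: (state before, action text, state after).
Transition = Tuple[State, str, State]
Predictor = Callable[[State, str], State]


def copy_state(state: State) -> State:
    """Copy a state one level deep so predictors never mutate their input."""
    return {name: dict(props) for name, props in state.items()}


def reference_simulator(state: State, action: str) -> State:
    """Toy hand-coded rule: 'open X' sets X's is_open property to True."""
    new_state = copy_state(state)
    verb, _, target = action.partition(" ")
    if verb == "open" and target in new_state:
        new_state[target]["is_open"] = True
    return new_state


def no_change_predictor(state: State, action: str) -> State:
    """Baseline that assumes no action ever changes the world."""
    return copy_state(state)


def transition_accuracy(predict: Predictor, transitions: List[Transition]) -> float:
    """Fraction of transitions whose predicted next state matches exactly."""
    if not transitions:
        return 0.0
    correct = sum(
        1 for before, action, after in transitions
        if predict(before, action) == after
    )
    return correct / len(transitions)


# Two made-up transitions: one that changes the state, one that does not.
transitions: List[Transition] = [
    ({"box": {"is_open": False}}, "open box", {"box": {"is_open": True}}),
    ({"box": {"is_open": True}}, "look", {"box": {"is_open": True}}),
]

reference_score = transition_accuracy(reference_simulator, transitions)
baseline_score = transition_accuracy(no_change_predictor, transitions)
print(reference_score, baseline_score)
```

The no-change baseline is included because a predictor can look accurate on transitions where nothing changes while still failing on the ones that matter, which is one reason to report the two kinds of transition separately.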
Taxonomy
Topics: Natural Language Processing Techniques · Multi-Agent Systems and Negotiation
Methods: Attention Is All You Need · Softmax · Layer Normalization · Linear Layer · Byte Pair Encoding · Label Smoothing · Adam · Residual Connection · Multi-Head Attention · Position-Wise Feed-Forward Layer
