Baba is LLM: Reasoning in a Game with Dynamic Rules
Fien van Wetten, Aske Plaat, Max van Duijn

TL;DR
This paper evaluates how well large language models reason about and solve puzzles in the game Baba is You, where players change the rules by rearranging text blocks, and examines how well the models understand the game's mechanics.
Contribution
It introduces a novel evaluation of LLMs on a complex reasoning task involving dynamic rules, highlighting the limitations and potential of current models.
Findings
Larger models, particularly GPT-4o, perform better at reasoning and puzzle solving than smaller models.
Finetuning improves level analysis but not solution formulation.
All models struggle with understanding dynamic rule changes.
Abstract
Large language models (LLMs) are known to perform well on language tasks, but struggle with reasoning tasks. This paper explores the ability of LLMs to play the 2D puzzle game Baba is You, in which players manipulate rules by rearranging text blocks that define object properties. Given that this rule-manipulation relies on language abilities and reasoning, it is a compelling challenge for LLMs. Six LLMs are evaluated using different prompt types, including (1) simple, (2) rule-extended and (3) action-extended prompts. In addition, two models (Mistral, OLMo) are finetuned using textual and structural data from the game. Results show that while larger models (particularly GPT-4o) perform better in reasoning and puzzle solving, smaller unadapted models struggle to recognize game mechanics or apply rule changes. Finetuning improves the ability to analyze the game levels, but does not…
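To make the rule-manipulation mechanic concrete, here is a minimal Python sketch of how active rules could be read off a level grid. It is not the paper's code or the game's implementation. The grid encoding, the small `NOUNS` and `PROPERTIES` vocabularies, and the function name `active_rules` are all assumptions made for illustration, and the sketch only handles the simplest rule shape (NOUN IS PROPERTY, read left to right or top to bottom).

```python
from typing import List, Set, Tuple

# Hypothetical, heavily reduced vocabulary (assumption for illustration).
NOUNS = {"BABA", "ROCK", "WALL", "FLAG"}
PROPERTIES = {"YOU", "WIN", "STOP", "PUSH"}

Grid = List[List[str]]  # each cell holds one text block, or "." for empty


def active_rules(grid: Grid) -> Set[Tuple[str, str]]:
    """Return (noun, property) pairs formed by NOUN IS PROPERTY triples.

    Triples are read left-to-right and top-to-bottom around each IS block.
    """
    rules: Set[Tuple[str, str]] = set()
    rows = len(grid)
    cols = len(grid[0]) if rows else 0
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != "IS":
                continue
            # Horizontal triple: the cells left and right of IS.
            if 0 < c < cols - 1:
                left, right = grid[r][c - 1], grid[r][c + 1]
                if left in NOUNS and right in PROPERTIES:
                    rules.add((left, right))
            # Vertical triple: the cells above and below IS.
            if 0 < r < rows - 1:
                up, down = grid[r - 1][c], grid[r + 1][c]
                if up in NOUNS and down in PROPERTIES:
                    rules.add((up, down))
    return rules


# Before the move, WIN is detached, so only BABA IS YOU is active.
before = [
    ["BABA", "IS", "YOU", ".", "."],
    [".", ".", ".", ".", "."],
    ["FLAG", "IS", ".", "WIN", "."],
]
# After pushing WIN one cell to the left, FLAG IS WIN becomes active too.
after = [
    ["BABA", "IS", "YOU", ".", "."],
    [".", ".", ".", ".", "."],
    ["FLAG", "IS", "WIN", ".", "."],
]
print(active_rules(before))
print(active_rules(after))
```

The point of the sketch is that a single block move changes the rule set itself, which is the kind of state change a model has to track when it plans a solution.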
Taxonomy
Topics: Logic, Reasoning, and Knowledge · Multi-Agent Systems and Negotiation
