Playing With AI: How Do State-Of-The-Art Large Language Models Perform in the 1977 Text-Based Adventure Game Zork?
Berry Gerrits

TL;DR
This study evaluates how well current large language models can solve the classic text-based game Zork, revealing significant limitations in their reasoning, problem-solving, and metacognitive skills despite their otherwise advanced capabilities.
Contribution
It provides a systematic assessment of leading LLMs in a complex, natural language environment, highlighting their shortcomings in reasoning and strategic persistence.
Findings
Models achieve less than 10% game completion on average.
Providing detailed instructions does not improve performance.
Models show fundamental reasoning limitations and an inability to learn from past attempts.
Abstract
In this position paper, we evaluate the problem-solving and reasoning capabilities of contemporary Large Language Models (LLMs) through their performance in Zork, the seminal text-based adventure game first released in 1977. The game's dialogue-based structure provides a controlled environment for assessing how LLM-based chatbots interpret natural language descriptions and generate appropriate action sequences to succeed in the game. We test the performance of leading proprietary models (ChatGPT, Claude, and Gemini) under both minimal and detailed instructions, measuring game progress through achieved scores as the primary metric. Our results reveal that all tested models achieve less than 10% completion on average, with even the best-performing model (Claude Opus 4.5) reaching only approximately 75 out of 350 possible points. Notably, providing detailed game instructions offers no…
Taxonomy
Topics: AI in Service Interactions · Artificial Intelligence in Games · Artificial Intelligence in Healthcare and Education
