TowerMind: A Tower Defence Game Learning Environment and Benchmark for LLM as Agents
Dawei Wang, Chengming Zhou, Di Zhao, Xinyuan Liu, Marci Chi Ma, Gary Ushaw, Richard Davison

TL;DR
TowerMind introduces a multimodal tower defense environment with low computational demands for evaluating large language models' strategic planning, decision-making, and tendency to hallucinate, filling a gap in RTS game-based benchmarking.
Contribution
The paper presents a novel, lightweight RTS environment that supports multimodal observations for LLM evaluation, along with benchmark levels and an analysis of LLM performance and limitations.
Findings
LLMs lag behind human experts in strategic planning and hallucination detection.
LLMs show limitations in decision-making diversity and action efficiency.
Classic reinforcement learning algorithms perform variably in the TowerMind environment.
Abstract
Recent breakthroughs in Large Language Models (LLMs) have positioned them as a promising paradigm for agents, with long-term planning and decision-making emerging as core general-purpose capabilities for adapting to diverse scenarios and tasks. Real-time strategy (RTS) games serve as an ideal testbed for evaluating these two capabilities, as their inherent gameplay requires both macro-level strategic planning and micro-level tactical adaptation and action execution. Existing RTS game-based environments either suffer from relatively high computational demands or lack support for textual observations, which has constrained the use of RTS games for LLM evaluation. Motivated by this, we present TowerMind, a novel environment grounded in the tower defense (TD) subgenre of RTS games. TowerMind preserves the key evaluation strengths of RTS games for assessing LLMs, while featuring low…
Taxonomy
Topics: Artificial Intelligence in Games · Explainable Artificial Intelligence (XAI) · Multimodal Machine Learning Applications
