Does Reasoning Help LLM Agents Play Dungeons and Dragons? A Prompt Engineering Experiment
Patricia Delafuente, Arya Honraopatil, Lara J. Martin

TL;DR
This study investigates whether reasoning enhances LLMs' ability to predict Dungeons & Dragons actions, finding that instruction quality significantly impacts performance and that instruct models suffice for command generation.
Contribution
The paper demonstrates that simple instruction prompts are effective for DnD action prediction and shows that reasoning capabilities are not necessarily required for this task.
Findings
Instruction prompts greatly influence model output quality
Instruct models perform comparably to reasoning models in command generation
Single sentence prompt changes can significantly alter results
Abstract
This paper explores the application of Large Language Models (LLMs) and reasoning to predict Dungeons & Dragons (DnD) player actions and format them as Avrae Discord bot commands. Using the FIREBALL dataset, we evaluated a reasoning model, DeepSeek-R1-Distill-LLaMA-8B, and an instruct model, LLaMA-3.1-8B-Instruct, for command generation. Our findings highlight the importance of providing specific instructions to models, that even single sentence changes in prompts can greatly affect the output of models, and that instruct models are sufficient for this task compared to reasoning models.
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
Taxonomy
TopicsTopic Modeling · Multimodal Machine Learning Applications · AI in Service Interactions
