Does Reasoning Help LLM Agents Play Dungeons and Dragons? A Prompt Engineering Experiment

Patricia Delafuente; Arya Honraopatil; Lara J. Martin

arXiv:2510.18112 · cs.CL · October 22, 2025

TL;DR

This study investigates whether reasoning enhances LLMs' ability to predict Dungeons & Dragons actions, finding that instruction quality significantly impacts performance and that instruct models suffice for command generation.

Contribution

The paper demonstrates that simple instruction prompts are effective for DnD action prediction and shows that reasoning capabilities are not necessarily required for this task.

Findings

01 Instruction prompts greatly influence model output quality

02 Instruct models perform comparably to reasoning models in command generation

03 Single-sentence prompt changes can significantly alter results

Abstract

This paper explores the application of Large Language Models (LLMs) and reasoning to predict Dungeons & Dragons (DnD) player actions and format them as Avrae Discord bot commands. Using the FIREBALL dataset, we evaluated a reasoning model, DeepSeek-R1-Distill-LLaMA-8B, and an instruct model, LLaMA-3.1-8B-Instruct, for command generation. Our findings highlight the importance of providing specific instructions to models, show that even single-sentence changes in prompts can greatly affect model output, and indicate that an instruct model is sufficient for this task without requiring a reasoning model.
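To make the single-sentence prompt comparison concrete, the following is a minimal Python sketch, not the authors' code. The prompt wording, the function names `build_prompt` and `looks_like_command`, and the sample game state are all hypothetical illustrations; the paper's actual prompts are not reproduced here. The format check assumes Avrae's default `!` command prefix.

```python
# Illustrative sketch only: the instruction text below is hypothetical,
# not the prompt used in the paper.
BASE_INSTRUCTION = (
    "You are assisting a Dungeons & Dragons player. "
    "Given the game state and the player's message, predict the player's next action."
)

# A hypothetical one-sentence addition that makes the instruction more specific.
FORMAT_SENTENCE = "Respond with a single Avrae command on one line and nothing else."


def build_prompt(game_state: str, utterance: str, add_format_sentence: bool) -> str:
    """Assemble a prompt; the two variants differ by exactly one sentence."""
    parts = [BASE_INSTRUCTION]
    if add_format_sentence:
        parts.append(FORMAT_SENTENCE)
    parts.append(f"Game state: {game_state}")
    parts.append(f"Player message: {utterance}")
    return "\n".join(parts)


def looks_like_command(output: str, prefix: str = "!") -> bool:
    """Crude format check: exactly one non-empty line that starts with the prefix.

    Assumes the default Avrae prefix "!"; servers can configure a different one.
    """
    lines = [line for line in output.strip().splitlines() if line.strip()]
    return len(lines) == 1 and lines[0].startswith(prefix)


if __name__ == "__main__":
    vague = build_prompt("Goblin at 7 HP", "I swing my sword", add_format_sentence=False)
    specific = build_prompt("Goblin at 7 HP", "I swing my sword", add_format_sentence=True)
    print(specific)
    # Hand-written sample outputs standing in for model generations.
    print(looks_like_command("!attack longsword -t goblin"))
    print(looks_like_command("The player attacks.\n!attack longsword"))
```

Sending both variants to the same model and scoring the outputs with a check like `looks_like_command` is one simple way to observe how a single added sentence shifts output format. A real evaluation would compare against the reference commands in FIREBALL, not only the prefix.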

Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · AI in Service Interactions