GameTalk: Training LLMs for Strategic Conversation
Victor Conchello Vendrell, Max Ruiz Luyten, Mihaela van der Schaar

TL;DR
GameTalk introduces a framework for training large language models to make strategic decisions through multi-turn conversations, optimizing long-term objectives in complex multi-agent environments.
Contribution
We develop a novel training approach that adapts fine-tuning methods to optimize global conversation-level objectives in multi-agent settings.
Findings
GameTalk significantly outperforms untrained models in complex strategic tasks.
Reward shaping enhances model performance in multi-turn interactions.
DPO yields the strongest improvements among tested methods.
Abstract
Strategic decision-making in multi-agent settings is a key challenge for large language models (LLMs), particularly when coordination and negotiation must unfold over extended conversations. While recent work has explored the use of LLMs in isolated decision tasks, little attention has been given to optimizing long-term objectives through dialogue. We introduce GameTalk, a framework for training LLMs to make strategic decisions via multi-turn interactions. Unlike prior work that focuses on single-turn objectives or static action prediction, we train LLMs to optimize a global objective across full conversations. We achieve this by adapting fine-tuning methods like GRPO, DPO, and STaR to incorporate reward signals that depend on the entire interaction. We evaluate this approach on a suite of increasingly complex games, designed to stress different aspects of reasoning,…
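To make the idea of a reward that depends on the entire interaction more concrete, here is a minimal sketch of one way a conversation-level reward could feed a GRPO-style update. It is an illustration under assumptions, not the paper's implementation: the function names, the example rewards, and the choice to give every turn of a conversation the same advantage are all hypothetical, and the population standard deviation is just one common normalization choice.

```python
from statistics import mean, pstdev


def group_relative_advantages(conversation_rewards, eps=1e-8):
    """Normalize rewards within one group of sampled conversations.

    Each reward is assumed to score a whole conversation (for example,
    the final game payoff), not an individual turn.
    """
    mu = mean(conversation_rewards)
    sigma = pstdev(conversation_rewards)  # assumption: population std
    return [(r - mu) / (sigma + eps) for r in conversation_rewards]


def broadcast_to_turns(advantages, turns_per_conversation):
    """Assign each conversation's advantage to every one of its turns.

    This is a hypothetical credit-assignment choice for illustration.
    """
    return [[a] * n for a, n in zip(advantages, turns_per_conversation)]


# Hypothetical group of four conversations sampled from the same game setup.
rewards = [1.0, 0.0, 0.5, 0.5]
turns = [3, 2, 4, 3]

adv = group_relative_advantages(rewards)
per_turn = broadcast_to_turns(adv, turns)
```

In this sketch, conversations that end with an above-average payoff receive a positive advantage on all of their turns, and those below average receive a negative one, so the signal reflects the outcome of the full dialogue rather than any single message.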
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Speech and dialogue systems · Topic Modeling
