GameEval: Evaluating LLMs on Conversational Games
Dan Qiao, Chenfei Wu, Yaobo Liang, Juntao Li, Nan Duan

TL;DR
GameEval introduces a goal-driven conversational game framework to evaluate large language models, addressing limitations of existing methods by providing a comprehensive, role-based assessment of LLM capabilities through diverse interactive scenarios.
Contribution
The paper presents a novel evaluation paradigm using conversational games with specific roles and objectives, offering a more holistic assessment of LLM performance.
Findings
Effectively differentiates capabilities of various LLMs
Provides comprehensive assessment of complex problem-solving abilities
Demonstrates robustness across multiple game scenarios
Abstract
The rapid advancements in large language models (LLMs) have presented challenges in evaluating those models. Existing evaluation methods are either reference-based or preference-based, which inevitably need human intervention or introduce test bias caused by evaluator models. In this paper, we propose GameEval, a novel approach to evaluating LLMs through goal-driven conversational games, overcoming the limitations of previous methods. GameEval treats LLMs as game players and assigns them distinct roles with specific goals achieved by launching conversations of various forms, including discussion, question answering, and voting. We design three unique games with cooperative or adversarial objectives, accompanied by corresponding evaluation metrics, to show how this new paradigm comprehensively evaluates model performance. Through extensive experiments, we show that GameEval can…
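The abstract describes players with assigned roles who converse and then vote, with the outcome scored by a game-specific metric. The following is a minimal sketch of that general shape, not the paper's implementation: the `Player` class, the `play_game` and `cooperators_win` functions, the role labels, and the scripted stand-ins for LLM calls are all hypothetical names chosen for illustration, and the win condition is an assumed example of an adversarial objective.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Player:
    """A game participant. `speak` and `vote` stand in for LLM calls (assumption)."""
    name: str
    role: str  # illustrative labels: "cooperator" or "adversary"
    speak: Callable[[List[str]], str]            # transcript -> utterance
    vote: Callable[[List[str], List[str]], str]  # (transcript, names) -> chosen name


def play_game(players: List[Player], rounds: int = 2) -> Tuple[List[str], str]:
    """Run discussion rounds, then one voting round; return transcript and the voted-out name."""
    transcript: List[str] = []
    for _ in range(rounds):
        for p in players:
            # Each player sees the conversation so far before speaking.
            transcript.append(f"{p.name}: {p.speak(transcript)}")
    names = [p.name for p in players]
    ballots = Counter(p.vote(transcript, names) for p in players)
    eliminated, _count = ballots.most_common(1)[0]
    return transcript, eliminated


def cooperators_win(players: List[Player], eliminated: str) -> bool:
    """Assumed metric: cooperators win if the voted-out player is the adversary."""
    roles = {p.name: p.role for p in players}
    return roles[eliminated] == "adversary"


def make_scripted_players() -> List[Player]:
    """Deterministic stand-ins so the sketch runs without any model."""
    return [
        Player("A", "cooperator", lambda t: "I suspect C.", lambda t, n: "C"),
        Player("B", "cooperator", lambda t: "C's story changed.", lambda t, n: "C"),
        Player("C", "adversary", lambda t: "A is lying.", lambda t, n: "A"),
    ]


players = make_scripted_players()
transcript, eliminated = play_game(players, rounds=2)
print(eliminated, cooperators_win(players, eliminated))
```

In a real evaluation, the scripted lambdas would be replaced by calls to the models under test, and the boolean outcome would be aggregated over many games into a win rate or a similar metric.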
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification
