Leveraging Word Guessing Games to Assess the Intelligence of Large Language Models
Tian Liang, Zhiwei He, Jen-tse Huang, Wenxuan Wang, Wenxiang Jiao, Rui Wang, Yujiu Yang, Zhaopeng Tu, Shuming Shi, and Xing Wang

TL;DR
This paper introduces a novel, game-based framework using word guessing games to evaluate large language models' intelligence, emphasizing strategic communication, adaptability, and multi-agent interaction, offering a cost-effective alternative to traditional datasets.
Contribution
It proposes DEEP and SpyGame frameworks that assess LLMs' expression, disguising, and strategic skills through interactive, multi-agent language games, advancing evaluation methods for AI intelligence.
Findings
DEEP effectively measures LLMs' descriptive and disguising abilities.
SpyGame captures LLMs' strategic thinking and adaptability.
Framework is easy to implement across multiple languages and domains.
Abstract
The automatic evaluation of LLM-based agent intelligence is critical in developing advanced LLM-based agents. Although considerable effort has been devoted to developing human-annotated evaluation datasets, such as AlpacaEval, existing techniques are costly, time-consuming, and lack adaptability. In this paper, inspired by the popular language game "Who is Spy", we propose to use the word guessing game to assess the intelligence performance of LLMs. Given a word, the LLM is asked to describe the word and determine its identity (spy or not) based on its and other players' descriptions. Ideally, an advanced agent should possess the ability to accurately describe a given word using an aggressive description while concurrently maximizing confusion in the conservative description, enhancing its participation in the game. To this end, we first develop DEEP to evaluate LLMs' expression and…
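The game loop described in the abstract (each player receives a word, describes it, then judges who the spy is) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the names `Player`, `assign_words`, and `play_round` are hypothetical, the `describe` and `vote` callbacks stand in for LLM calls, and the majority-vote elimination with "no elimination on a tie" is an assumption based on the usual rules of the party game.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Player:
    name: str
    word: str  # secret keyword handed to this player


def assign_words(names: List[str], common_word: str,
                 spy_word: str, spy_index: int) -> List[Player]:
    """Give every player the common word, except the spy, who gets a related one."""
    return [Player(n, spy_word if i == spy_index else common_word)
            for i, n in enumerate(names)]


def play_round(
    players: List[Player],
    describe: Callable[[Player, Dict[str, str]], str],
    vote: Callable[[Player, Dict[str, str]], str],
) -> Tuple[Dict[str, str], Optional[str]]:
    """Run one describe-then-vote round and return (descriptions, eliminated name)."""
    descriptions: Dict[str, str] = {}
    for p in players:
        # Each player sees the descriptions given so far before speaking.
        descriptions[p.name] = describe(p, dict(descriptions))

    # Every player names the participant it suspects of being the spy.
    tally = Counter(vote(p, descriptions) for p in players)
    ranked = tally.most_common()
    # Assumed rule: a tie for first place eliminates nobody.
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return descriptions, None
    return descriptions, ranked[0][0]


# Toy stand-ins for the LLM: a canned hint per word, and a voter that
# suspects the first other player whose hint differs from its own.
HINTS = {"apple": "a common fruit", "pear": "a bell-shaped fruit"}


def toy_describe(player: Player, seen: Dict[str, str]) -> str:
    return HINTS[player.word]


def toy_vote(player: Player, descriptions: Dict[str, str]) -> str:
    own = descriptions[player.name]
    for name, text in descriptions.items():
        if name != player.name and text != own:
            return name
    # Nothing looks suspicious: fall back to the first other player.
    return next(n for n in descriptions if n != player.name)
```

In a real evaluation the two callbacks would wrap prompts to the models under test; keeping them as plain functions makes the round logic easy to check in isolation.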
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification
