GuessArena: Guess Who I Am? A Self-Adaptive Framework for Evaluating LLMs in Domain-Specific Knowledge and Reasoning

Qingchen Yu; Zifan Zheng; Ding Chen; Simin Niu; Bo Tang; Feiyu Xiong; Zhiyu Li

arXiv:2505.22661 · cs.CL · May 29, 2025

TL;DR

GuessArena is an adaptive, game-inspired framework for evaluating large language models' domain-specific knowledge and reasoning abilities, addressing limitations of static benchmarks by offering dynamic, interpretable, and scalable assessments.

Contribution

It introduces a novel adversarial game-based evaluation framework that dynamically assesses LLMs' domain knowledge and reasoning, improving over static benchmarks.

Findings

01. Effectively distinguishes LLMs across five vertical domains (finance, healthcare, manufacturing, information technology, and education)

02. Enhances the interpretability and scalability of evaluations

03. Demonstrates superior scenario adaptability

Abstract

The evaluation of large language models (LLMs) has traditionally relied on static benchmarks, a paradigm that poses two major limitations: (1) predefined test sets lack adaptability to diverse application domains, and (2) standardized evaluation protocols often fail to capture fine-grained assessments of domain-specific knowledge and contextual reasoning abilities. To overcome these challenges, we propose GuessArena, an adaptive evaluation framework grounded in adversarial game-based interactions. Inspired by the interactive structure of the "Guess Who I Am?" game, our framework seamlessly integrates dynamic domain knowledge modeling with progressive reasoning assessment to improve evaluation fidelity. Empirical studies across five vertical domains (finance, healthcare, manufacturing, information technology, and education) demonstrate that GuessArena effectively distinguishes LLMs in terms…

Taxonomy

Topics: Semantic Web and Ontologies · Natural Language Processing Techniques · Artificial Intelligence in Law