Do Large Language Models have Problem-Solving Capability under Incomplete Information Scenarios?
Yuyan Chen, Tianhao Yu, Yueze Li, Songzhou Yan, Sijia Liu, Jiaqing Liang, Yanghua Xiao

TL;DR
This paper introduces BrainKing, a novel game designed to evaluate large language models' problem-solving abilities under incomplete information, highlighting their strengths and limitations through varying difficulty levels.
Contribution
The paper presents a new evaluation game, BrainKing, that better assesses LLMs' problem-solving skills in incomplete information scenarios involving misleading cues.
Findings
LLMs show varying performance across difficulty levels.
The evaluation reveals specific strengths and limitations of LLMs.
BrainKing provides a more realistic assessment of problem-solving capabilities.
Abstract
The evaluation of the problem-solving capability of Large Language Models (LLMs) under incomplete information scenarios is increasingly important, encompassing capabilities such as questioning, knowledge search, error detection, and path planning. Current research mainly focuses on LLMs' problem-solving capability in games such as "Twenty Questions". However, these kinds of games do not require recognizing misleading cues, which is necessary in incomplete information scenarios. Moreover, existing games such as "Who is undercover" are highly subjective, making evaluation challenging. Therefore, in this paper, we introduce a novel game named BrainKing, based on "Who is undercover" and "Twenty Questions", for evaluating LLM capabilities under incomplete information scenarios. It requires LLMs to identify target entities with limited yes-or-no questions and potential misleading…
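The setup described in the abstract (a hidden target entity, a limited budget of yes-or-no questions, and answers that may mislead) can be illustrated with a toy sketch. This is not the paper's implementation: the `YesNoGame` class, the `naive_solver` function, the `mislead_prob` parameter, and the four-entity `KNOWLEDGE` table are all hypothetical names and data chosen for illustration, and modelling a misleading cue as a flipped answer is an assumption.

```python
import random
from dataclasses import dataclass, field

# Hypothetical toy knowledge base: entity -> attributes that are true of it.
KNOWLEDGE = {
    "penguin": {"bird", "swims"},
    "eagle": {"bird", "flies"},
    "dolphin": {"mammal", "swims"},
    "bat": {"mammal", "flies"},
}


@dataclass
class YesNoGame:
    """Toy host for a yes/no guessing game (illustrative assumption only)."""
    target: str
    max_questions: int = 20
    mislead_prob: float = 0.0  # assumed chance that an answer is flipped
    rng: random.Random = field(default_factory=lambda: random.Random(0))
    asked: int = 0

    def ask(self, attribute: str) -> bool:
        """Answer 'does the target have this attribute?', possibly misleadingly."""
        if self.asked >= self.max_questions:
            raise RuntimeError("question budget exhausted")
        self.asked += 1
        truth = attribute in KNOWLEDGE[self.target]
        if self.rng.random() < self.mislead_prob:
            return not truth  # misleading cue modelled as a flipped answer
        return truth


def naive_solver(game: YesNoGame, knowledge: dict):
    """Filter candidates by trusting every answer; no error detection."""
    remaining = set(knowledge)
    attributes = sorted({a for attrs in knowledge.values() for a in attrs})
    for attribute in attributes:
        if len(remaining) <= 1 or game.asked >= game.max_questions:
            break
        answer = game.ask(attribute)
        remaining = {c for c in remaining if (attribute in knowledge[c]) == answer}
    return next(iter(remaining)) if len(remaining) == 1 else None


honest = YesNoGame(target="penguin")
print(naive_solver(honest, KNOWLEDGE))  # penguin

lying = YesNoGame(target="penguin", mislead_prob=1.0)
print(naive_solver(lying, KNOWLEDGE))  # bat
```

With truthful answers the naive solver finds the target in two questions. When every answer is flipped, it confidently settles on the wrong entity, which is the kind of failure that recognizing misleading cues is meant to prevent.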
Taxonomy
Topics: Topic Modeling
Methods: Focus
