Do Large Language Models have Problem-Solving Capability under Incomplete Information Scenarios?

Yuyan Chen; Tianhao Yu; Yueze Li; Songzhou Yan; Sijia Liu; Jiaqing Liang; Yanghua Xiao

arXiv:2409.14762·cs.CL·September 24, 2024

Open Access

TL;DR

This paper introduces BrainKing, a novel game designed to evaluate large language models' problem-solving abilities under incomplete information, highlighting their strengths and limitations through varying difficulty levels.

Contribution

The paper presents a new evaluation game, BrainKing, that better assesses LLMs' problem-solving skills in incomplete information scenarios involving misleading cues.

Findings

1. LLMs show varying performance across difficulty levels.

2. The evaluation reveals specific strengths and limitations of LLMs.

3. BrainKing provides a more realistic assessment of problem-solving capabilities.

Abstract

Evaluating the problem-solving capability of Large Language Models (LLMs) under incomplete information scenarios is increasingly important; it encompasses capabilities such as questioning, knowledge search, error detection, and path planning. Current research mainly focuses on LLMs' problem-solving capability in games such as "Twenty Questions". However, such games do not require recognizing misleading cues, which is necessary in incomplete information scenarios. Moreover, existing games such as "Who is undercover" are highly subjective, making evaluation challenging. Therefore, in this paper, we introduce a novel game named BrainKing, based on "Who is undercover" and "Twenty Questions", for evaluating LLM capabilities under incomplete information scenarios. It requires LLMs to identify target entities with limited yes-or-no questions and potential misleading…
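
To make the setting concrete, here is a minimal toy sketch of a guessing game with a limited budget of yes-or-no questions and possibly misleading answers. It is not the BrainKing protocol from the paper: the entity table, the attribute-style questions, the `lie_prob` parameter, and the agreement-scoring guesser are all assumptions introduced purely for illustration.

```python
import random
from typing import Callable, Dict, Optional, Set

# Hypothetical toy entities described by attributes (not taken from the paper).
ENTITIES: Dict[str, Set[str]] = {
    "eagle": {"animal", "flies"},
    "penguin": {"animal", "swims"},
    "airplane": {"flies", "machine"},
    "submarine": {"swims", "machine"},
}


def make_oracle(target: str, lie_prob: float, rng: random.Random) -> Callable[[str], bool]:
    """Build a yes/no answerer that flips the truthful answer with probability lie_prob."""
    def answer(attribute: str) -> bool:
        truth = attribute in ENTITIES[target]
        return (not truth) if rng.random() < lie_prob else truth
    return answer


def play(oracle: Callable[[str], bool], max_questions: int) -> Optional[str]:
    """Ask up to max_questions attribute questions and guess the best-matching entity.

    Candidates are scored by agreement with the answers instead of being
    eliminated outright, so one misleading answer cannot permanently rule
    out the true target. Returns None when the top score is tied.
    """
    scores = {name: 0 for name in ENTITIES}
    attributes = sorted({a for attrs in ENTITIES.values() for a in attrs})
    for attribute in attributes[:max_questions]:
        reply = oracle(attribute)
        for name, attrs in ENTITIES.items():
            if (attribute in attrs) == reply:
                scores[name] += 1
    best = max(scores.values())
    winners = [name for name, score in scores.items() if score == best]
    return winners[0] if len(winners) == 1 else None


if __name__ == "__main__":
    honest = make_oracle("penguin", lie_prob=0.0, rng=random.Random(0))
    print(play(honest, max_questions=4))
```

With a fully truthful answerer the guesser recovers the target; raising `lie_prob` shows how misleading cues can produce ties or wrong guesses, which is the kind of difficulty the abstract says "Twenty Questions" alone does not exercise.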

Taxonomy

Topics: Topic Modeling

Methods: Focus