Probing the Multi-turn Planning Capabilities of LLMs via 20 Question Games
Yizhe Zhang, Jiarui Lu, Navdeep Jaitly

TL;DR
This paper introduces an entity-deducing game to evaluate and improve the multi-turn reasoning and planning abilities of large language models, showing significant performance differences across models and room for improvement through further training.
Contribution
It proposes a novel evaluation framework for LLMs' conversational reasoning and planning, and explores methods to improve these capabilities through imitation and reinforcement learning.
Findings
Strong LLMs like GPT-4 outperform humans in the game.
Behavior Cloning enables weaker models to imitate stronger ones.
Reinforcement Learning significantly enhances model reasoning and planning.
Abstract
Large language models (LLMs) are effective at answering questions that are clearly asked. However, when faced with ambiguous queries they can act unpredictably and produce incorrect outputs. This underscores the need for the development of intelligent agents capable of asking clarification questions to resolve ambiguities effectively. This capability requires complex understanding, state tracking, reasoning and planning over multiple conversational turns. However, directly measuring this can be challenging. In this paper, we offer a surrogate problem which assesses an LLM's capability to deduce an entity unknown to itself, but revealed to a judge, by asking the judge a series of queries. This entity-deducing game can serve as an evaluation framework to probe the conversational reasoning and planning capabilities of language models. We systematically evaluate various LLMs and…
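The game loop described in the abstract can be illustrated with a toy sketch. This is not the paper's setup: both the guesser and the judge would be LLMs there, whereas here a hand-written greedy guesser and a lookup-table judge stand in for them. The knowledge base `KNOWLEDGE` and the helpers `make_judge`, `choose_question`, and `play` are hypothetical names introduced only for illustration; the 20-turn limit follows the game named in the title.

```python
from typing import Callable, List, Optional, Set, Tuple

# Hypothetical toy knowledge base: entity -> attributes that are true for it.
KNOWLEDGE = {
    "cat": {"animal", "pet", "has_fur"},
    "goldfish": {"animal", "pet", "lives_in_water"},
    "shark": {"animal", "lives_in_water"},
    "bicycle": {"vehicle"},
}
ALL_ATTRIBUTES = sorted(set().union(*KNOWLEDGE.values()))


def make_judge(secret: str) -> Callable[[str], str]:
    """Return a judge that knows the secret entity and answers yes/no queries."""
    def judge(attribute: str) -> str:
        return "yes" if attribute in KNOWLEDGE[secret] else "no"
    return judge


def choose_question(candidates: Set[str]) -> Optional[str]:
    """Pick the attribute that splits the remaining candidates most evenly.

    Returns None when no attribute can separate the candidates any further.
    """
    best, best_score = None, None
    for attr in ALL_ATTRIBUTES:
        yes_count = sum(attr in KNOWLEDGE[c] for c in candidates)
        if yes_count in (0, len(candidates)):
            continue  # uninformative: every candidate gives the same answer
        score = abs(2 * yes_count - len(candidates))  # 0 means a perfect split
        if best_score is None or score < best_score:
            best, best_score = attr, score
    return best


def play(secret: str, max_turns: int = 20) -> Tuple[str, List[Tuple[str, str]]]:
    """Run one game; return the final guess and the (question, answer) transcript."""
    judge = make_judge(secret)
    candidates = set(KNOWLEDGE)  # guesser state: entities still consistent
    transcript: List[Tuple[str, str]] = []
    while len(candidates) > 1 and len(transcript) < max_turns:
        question = choose_question(candidates)
        if question is None:
            break
        answer = judge(question)
        transcript.append((question, answer))
        # State tracking: keep only entities consistent with the answer.
        candidates = {
            c for c in candidates
            if (question in KNOWLEDGE[c]) == (answer == "yes")
        }
    return sorted(candidates)[0], transcript


if __name__ == "__main__":
    guess, transcript = play("shark")
    print(guess, transcript)
```

The sketch makes the two ingredients the abstract mentions concrete: state tracking (the shrinking `candidates` set) and planning (choosing the question expected to narrow the set most). An LLM guesser has to do both implicitly, in natural language, and without a closed list of entities.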
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Speech and dialogue systems
Methods: Multi-Head Attention · Attention Is All You Need · Dropout · Dense Connections · Linear Layer · Label Smoothing · Adam · Absolute Position Encodings · Residual Connection · Layer Normalization
