Probing the Multi-turn Planning Capabilities of LLMs via 20 Question Games

Yizhe Zhang; Jiarui Lu; Navdeep Jaitly

arXiv:2310.01468 · cs.CL · February 22, 2024 · 1 citation

TL;DR

This paper introduces an entity-deducing game to evaluate and improve the multi-turn reasoning and planning abilities of large language models, showing significant performance differences across models and room for improvement.

Contribution

It proposes a novel evaluation framework for LLMs' conversational reasoning and planning, and explores methods to improve these capabilities through imitation and reinforcement learning.

Findings

01. Strong LLMs like GPT-4 outperform humans in the game.

02. Behavior Cloning enables weaker models to imitate stronger ones.

03. Reinforcement Learning significantly enhances model reasoning and planning.

Abstract

Large language models (LLMs) are effective at answering questions that are clearly asked. However, when faced with ambiguous queries they can act unpredictably and produce incorrect outputs. This underscores the need for the development of intelligent agents capable of asking clarification questions to resolve ambiguities effectively. This capability requires complex understanding, state tracking, reasoning and planning over multiple conversational turns. However, directly measuring this can be challenging. In this paper, we offer a surrogate problem which assesses an LLM's capability to deduce an entity unknown to itself, but revealed to a judge, by asking the judge a series of queries. This entity-deducing game can serve as an evaluation framework to probe the conversational reasoning and planning capabilities of language models. We systematically evaluate various LLMs and…
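The game loop described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the toy `WORLD` table, the `ask:`/`guess:` query format, the "Yes"/"No"/"Bingo!" reply vocabulary, and the rule-based guesser that stands in for an LLM are all hypothetical, chosen only to show a guesser narrowing down an entity that is known to the judge but hidden from the guesser. The 20-turn budget follows the paper's title.

```python
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical toy world: each entity maps to the attributes that hold for it.
WORLD: Dict[str, Set[str]] = {
    "dog": {"alive", "has_fur"},
    "eagle": {"alive", "can_fly"},
    "airplane": {"can_fly", "man_made"},
    "hammer": {"man_made"},
}


def make_judge(secret: str) -> Callable[[str], str]:
    """Build a judge that knows the secret entity and answers queries about it."""
    def judge(query: str) -> str:
        kind, _, value = query.partition(":")
        if kind == "guess":
            return "Bingo!" if value == secret else "No"
        return "Yes" if value in WORLD[secret] else "No"
    return judge


def play(judge: Callable[[str], str],
         max_turns: int = 20) -> Tuple[bool, List[Tuple[str, str]]]:
    """Rule-based guesser (a stand-in for an LLM) that tracks remaining candidates."""
    candidates = set(WORLD)
    transcript: List[Tuple[str, str]] = []
    for _ in range(max_turns):
        # Attributes that are true for some, but not all, remaining candidates.
        counts = {
            a: sum(a in WORLD[e] for e in candidates)
            for a in sorted({a for e in candidates for a in WORLD[e]})
        }
        splitting = [a for a, c in counts.items() if 0 < c < len(candidates)]
        if len(candidates) == 1 or not splitting:
            query = "guess:" + sorted(candidates)[0]
        else:
            # Prefer the attribute that splits the candidates most evenly.
            best = min(splitting,
                       key=lambda a: abs(2 * counts[a] - len(candidates)))
            query = "ask:" + best
        reply = judge(query)
        transcript.append((query, reply))
        if reply == "Bingo!":
            return True, transcript
        kind, _, value = query.partition(":")
        if kind == "ask":
            # Keep only candidates consistent with the judge's answer.
            candidates = {e for e in candidates
                          if (value in WORLD[e]) == (reply == "Yes")}
        else:
            candidates.discard(value)
        if not candidates:
            break
    return False, transcript


if __name__ == "__main__":
    solved, transcript = play(make_judge("eagle"))
    for query, reply in transcript:
        print(f"{query} -> {reply}")
    print("solved:", solved)
```

The guesser here keeps an explicit candidate set, which makes the state tracking visible; an LLM playing the game has to maintain the equivalent state implicitly across conversational turns.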

Code & Models

Repositories

apple/ml-entity-deduction-arena
PyTorch · Official

Datasets

yizheapple/entity-deduction-arena
dataset · 96 dl

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Speech and dialogue systems

Methods: Multi-Head Attention · Attention Is All You Need · Dropout · Dense Connections · Linear Layer · Label Smoothing · Adam · Absolute Position Encodings · Residual Connection · Layer Normalization