CLAMBER: A Benchmark of Identifying and Clarifying Ambiguous Information Needs in Large Language Models

Tong Zhang; Peixin Qin; Yang Deng; Chen Huang; Wenqiang Lei; Junhong Liu; Dingnan Jin; Hongru Liang; Tat-Seng Chua

arXiv:2405.12063 · cs.CL · June 4, 2024 · 2 cites

TL;DR

CLAMBER introduces a comprehensive benchmark with a large dataset to evaluate and improve large language models' ability to identify and clarify ambiguous user queries, highlighting current limitations and guiding future research.

Contribution

This paper presents CLAMBER, a new benchmark and dataset for assessing LLMs' performance on ambiguity detection and clarification, addressing a gap in existing evaluation methods.

Findings

01. Current LLMs have limited ability to identify ambiguity in queries.

02. Chain-of-thought and few-shot prompting offer minimal improvements in ambiguity detection.

03. LLMs struggle to generate high-quality clarifying questions due to conflict resolution issues.

Abstract

Large language models (LLMs) are increasingly used to meet user information needs, but their effectiveness in dealing with user queries that contain various types of ambiguity remains unknown, ultimately risking user trust and satisfaction. To this end, we introduce CLAMBER, a benchmark for evaluating LLMs using a well-organized taxonomy. Building upon the taxonomy, we construct ~12K high-quality data to assess the strengths, weaknesses, and potential risks of various off-the-shelf LLMs. Our findings indicate the limited practical utility of current LLMs in identifying and clarifying ambiguous user queries, even when enhanced by chain-of-thought (CoT) and few-shot prompting. These techniques may result in overconfidence in LLMs and yield only marginal enhancements in identifying ambiguity. Furthermore, current LLMs fall short in generating high-quality clarifying questions due to a lack of…

Code & Models

Repositories

zt991211/clamber (Official)

Taxonomy

Topics: Topic Modeling · Data Quality and Management