ClarQ-LLM: A Benchmark for Models Clarifying and Requesting Information in Task-Oriented Dialog

Yujian Gan; Changling Li; Jinxia Xie; Luou Wen; Matthew Purver; Massimo Poesio

arXiv:2409.06097·cs.CL·September 17, 2024

TL;DR

ClarQ-LLM is a bilingual (English-Chinese) benchmark for evaluating how well task-oriented dialogue agents ask clarification questions. It covers diverse scenarios and includes a provider agent that simulates real interactions, and it remains challenging for current models.

Contribution

The paper introduces ClarQ-LLM, a bilingual benchmark with diverse scenarios and a provider agent, for evaluating and advancing models' ability to ask clarification questions in dialogue.

Findings

01 A LLAMA3.1 405B seeker agent achieved a 60.05% success rate.

02 The benchmark covers 31 task types, each with 10 dialogue scenarios.

03 ClarQ-LLM poses a significant challenge for current dialogue models.

Abstract

We introduce ClarQ-LLM, an evaluation framework consisting of bilingual English-Chinese conversation tasks, conversational agents and evaluation metrics, designed to serve as a strong benchmark for assessing agents' ability to ask clarification questions in task-oriented dialogues. The benchmark includes 31 different task types, each with 10 unique dialogue scenarios between information seeker and provider agents. The scenarios require the seeker to ask questions to resolve uncertainty and gather necessary information to complete tasks. Unlike traditional benchmarks that evaluate agents based on fixed dialogue content, ClarQ-LLM includes a provider conversational agent to replicate the original human provider in the benchmark. This allows both current and future seeker agents to test their ability to complete information gathering tasks through dialogue by directly interacting with our…
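The seeker-provider setup described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `Scenario` class, the keyword-matching `provider_reply` function, the turn limit, and the example task are hypothetical stand-ins, not the benchmark's actual agents, data format, or metrics. It only shows the general idea that a dialogue counts as successful when the seeker's questions elicit every piece of information the task requires, and that a success rate is the share of such dialogues.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Scenario:
    """Hypothetical scenario: facts the seeker must obtain, keyed by a topic word."""
    task_type: str
    hidden_facts: Dict[str, str]


def provider_reply(scenario: Scenario, question: str) -> str:
    """Toy provider (assumption): reveals a fact only if the question names its topic."""
    for topic, fact in scenario.hidden_facts.items():
        if topic in question.lower():
            return fact
    return "Could you be more specific?"


def run_dialogue(scenario: Scenario, questions: List[str], max_turns: int = 5) -> bool:
    """Return True if the seeker gathered every hidden fact within max_turns."""
    gathered = set()
    for question in questions[:max_turns]:
        reply = provider_reply(scenario, question)
        for topic, fact in scenario.hidden_facts.items():
            if reply == fact:
                gathered.add(topic)
    return gathered == set(scenario.hidden_facts)


def success_rate(outcomes: List[bool]) -> float:
    """Percentage of dialogues in which the task was completed."""
    return 100.0 * sum(outcomes) / len(outcomes)


# Made-up example task with two facts the seeker has to ask about.
scenario = Scenario("book_delivery", {"address": "12 Elm Road", "time": "after 5 pm"})
outcomes = [
    run_dialogue(scenario, ["What is the address?", "What time suits you?"]),
    run_dialogue(scenario, ["What is the address?"]),  # never asks about the time
]
print(success_rate(outcomes))
```

Because the provider here is a function rather than a fixed transcript, any seeker strategy can be run against the same scenario, which is the property the abstract contrasts with benchmarks built on fixed dialogue content.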

Code & Models

Repositories

ygan/clarq-llm · jax · Official

Taxonomy

Topics: Speech and dialogue systems · Natural Language Processing Techniques · Semantic Web and Ontologies