ClarQ-LLM: A Benchmark for Models Clarifying and Requesting Information in Task-Oriented Dialog
Yujian Gan, Changling Li, Jinxia Xie, Luou Wen, Matthew Purver, Massimo Poesio

TL;DR
ClarQ-LLM is a bilingual (English-Chinese) benchmark that evaluates how well task-oriented dialogue agents ask clarification questions. It covers diverse scenarios and includes a provider agent that simulates real interactions, and it remains challenging for current models.
Contribution
The paper introduces ClarQ-LLM, a novel bilingual benchmark with diverse scenarios and a provider agent, for evaluating and improving how well models ask clarification questions in dialogue.
Findings
The LLAMA3.1 405B seeker achieved a 60.05% success rate.
The benchmark covers 31 task types with 10 scenarios each.
ClarQ-LLM poses a significant challenge for current dialogue models.
Abstract
We introduce ClarQ-LLM, an evaluation framework consisting of bilingual English-Chinese conversation tasks, conversational agents and evaluation metrics, designed to serve as a strong benchmark for assessing agents' ability to ask clarification questions in task-oriented dialogues. The benchmark includes 31 different task types, each with 10 unique dialogue scenarios between information seeker and provider agents. The scenarios require the seeker to ask questions to resolve uncertainty and gather necessary information to complete tasks. Unlike traditional benchmarks that evaluate agents based on fixed dialogue content, ClarQ-LLM includes a provider conversational agent to replicate the original human provider in the benchmark. This allows both current and future seeker agents to test their ability to complete information gathering tasks through dialogue by directly interacting with our…
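The seeker-provider setup described in the abstract can be pictured with a minimal sketch. Everything below is a hypothetical illustration, not the benchmark's actual code or API: the `Scenario` class, the `run_dialogue` and `success_rate` functions, the toy seekers, and the rule that a dialogue succeeds only when every required fact has been gathered are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# Hypothetical seeker interface: given the task text and the facts gathered
# so far, return the next question to ask, or None to stop asking.
Seeker = Callable[[str, Dict[str, str]], Optional[str]]


@dataclass
class Scenario:
    """Hypothetical scenario: a task plus the facts only the provider knows."""
    task: str
    required_facts: Dict[str, str]


def run_dialogue(seeker: Seeker, scenario: Scenario, max_turns: int = 5) -> bool:
    """Let the seeker question a toy provider; succeed if all facts are gathered."""
    gathered: Dict[str, str] = {}
    for _ in range(max_turns):
        question = seeker(scenario.task, gathered)
        if question is None:
            break
        # Toy provider: answers only questions that match a fact it holds.
        reply = scenario.required_facts.get(question)
        if reply is not None:
            gathered[question] = reply
    return gathered == scenario.required_facts


def success_rate(seeker: Seeker, scenarios: List[Scenario]) -> float:
    """Fraction of scenarios in which the seeker gathered every required fact."""
    outcomes = [run_dialogue(seeker, s) for s in scenarios]
    return sum(outcomes) / len(outcomes)


def thorough_seeker(task: str, gathered: Dict[str, str]) -> Optional[str]:
    """Asks about every detail it does not yet know."""
    for question in ("date", "budget"):
        if question not in gathered:
            return question
    return None


def lazy_seeker(task: str, gathered: Dict[str, str]) -> Optional[str]:
    """Asks about the date only, then stops."""
    return "date" if "date" not in gathered else None


scenarios = [
    Scenario("Book a venue", {"date": "Friday", "budget": "200"}),
    Scenario("Schedule a call", {"date": "Monday"}),
]
```

In this toy setting a seeker that stops asking too early fails any scenario that needs more information than it collected, which is the kind of behaviour a success-rate metric over interactive dialogues is meant to expose.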
Taxonomy
Topics: Speech and dialogue systems · Natural Language Processing Techniques · Semantic Web and Ontologies
