The Task-oriented Queries Benchmark (ToQB)
Keun Soo Yim

TL;DR
The paper introduces ToQB, a benchmark of task-oriented queries for evaluating virtual assistants and chatbots. The benchmark is generated from existing task-oriented dialogue datasets using an LLM service.
Contribution
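It presents a largely automated approach for generating a task-oriented queries benchmark from existing dialogue datasets using an LLM service. This addresses a gap in NLP evaluation, where existing benchmarks have focused on task-oriented dialogues rather than one-shot queries.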
Findings
Successfully generated ToQB dataset across three domains.
Demonstrated customization of LLM prompts for different domains.
Provided a framework for community-driven expansion of ToQB.
Abstract
Task-oriented queries (e.g., one-shot queries to play videos, order food, or call a taxi) are crucial for assessing the quality of virtual assistants, chatbots, and other large language model (LLM)-based services. However, a standard benchmark for task-oriented queries is not yet available, as existing benchmarks in the relevant NLP (Natural Language Processing) fields have primarily focused on task-oriented dialogues. Thus, we present a new methodology for efficiently generating the Task-oriented Queries Benchmark (ToQB) using existing task-oriented dialogue datasets and an LLM service. Our methodology involves formulating the underlying NLP task to summarize the original intent of a speaker in each dialogue, detailing the key steps to perform the devised NLP task using an LLM service, and outlining a framework for automating a major part of the benchmark generation process. Through a…
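The core step described in the abstract, summarizing the speaker's original intent in each dialogue as a single query, can be sketched roughly as follows. This is a minimal illustration and not the paper's implementation: the prompt wording, the `DOMAIN_HINTS` table, and the functions `build_prompt` and `generate_queries` are all hypothetical, and the LLM service is represented by a plain callable so that no particular vendor API is assumed.

```python
from typing import Callable, Dict, List, Tuple

Turn = Tuple[str, str]  # (speaker, utterance)

# Hypothetical per-domain hints; the paper's actual prompts are not reproduced here.
DOMAIN_HINTS: Dict[str, str] = {
    "food": "Mention the dish or restaurant if the user names one.",
    "taxi": "Mention the pickup point and destination if the user names them.",
}


def build_prompt(dialogue: List[Turn], domain: str) -> str:
    """Build a prompt asking an LLM to condense a dialogue into one query."""
    hint = DOMAIN_HINTS.get(domain, "")  # empty hint for unknown domains
    transcript = "\n".join(f"{speaker}: {text}" for speaker, text in dialogue)
    return (
        "Summarize the user's original intent in the dialogue below "
        "as a single one-shot query addressed to a virtual assistant.\n"
        f"{hint}\n"
        f"Dialogue:\n{transcript}\n"
        "Query:"
    )


def generate_queries(
    dialogues: List[List[Turn]],
    domain: str,
    llm: Callable[[str], str],
) -> List[str]:
    """Run every dialogue through the LLM callable and collect the queries."""
    return [llm(build_prompt(dialogue, domain)).strip() for dialogue in dialogues]
```

In practice `llm` would wrap whichever LLM service is in use; passing it as a parameter keeps the sketch testable with a stub and makes the per-domain prompt customization explicit.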
Taxonomy
Topics: Advanced Database Systems and Queries
