Large Language Models as User-Agents for Evaluating Task-Oriented-Dialogue Systems

Taaha Kazi; Ruiliang Lyu; Sizhe Zhou; Dilek Hakkani-Tur; Gokhan Tur

arXiv:2411.09972 · cs.CL · November 18, 2024

TL;DR

This paper uses large language models as context-aware user-agents to evaluate task-oriented dialogue (TOD) systems, as an alternative to static offline datasets, and reports that better prompts improve the user-agents' diversity and task completion metrics.

Contribution

The paper introduces a framework that uses LLM-based user-agents to evaluate TOD systems dynamically, and proposes methodologies for automatic assessment within that framework.

Findings

01. Enhanced diversity in user-agent interactions

02. Improved task completion metrics with better prompts

03. Proposed automatic evaluation methodologies

Abstract

Traditionally, offline datasets have been used to evaluate task-oriented dialogue (TOD) models. These datasets lack context awareness, making them suboptimal benchmarks for conversational systems. In contrast, user-agents, which are context-aware, can simulate the variability and unpredictability of human conversations, making them better alternatives as evaluators. Prior research has utilized large language models (LLMs) to develop user-agents. Our work builds upon this by using LLMs to create user-agents for the evaluation of TOD systems. This involves prompting an LLM, using in-context examples as guidance, and tracking the user-goal state. Our evaluation of diversity and task completion metrics for the user-agents shows improved performance with the use of better prompts. Additionally, we propose methodologies for the automatic evaluation of TOD models within this dynamic framework.
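
The abstract names three ingredients for the user-agent: prompting an LLM, supplying in-context examples, and tracking the user-goal state. The sketch below illustrates how these could fit together; it is not the authors' implementation. The `UserGoalState` class, the `build_user_prompt` function, the inform/request slot layout, and the prompt wording are all hypothetical, and the call to an actual LLM is left out.

```python
from dataclasses import dataclass, field


@dataclass
class UserGoalState:
    """Hypothetical tracker for how much of a user goal has been covered."""
    inform: dict[str, str]   # slots the user must tell the system
    request: list[str]       # slots the user wants answered by the system
    informed: set[str] = field(default_factory=set)
    answered: dict[str, str] = field(default_factory=dict)

    def mark_informed(self, slot: str) -> None:
        # Only slots that belong to the goal are tracked; others are ignored.
        if slot in self.inform:
            self.informed.add(slot)

    def record_answer(self, slot: str, value: str) -> None:
        if slot in self.request:
            self.answered[slot] = value

    def pending_inform(self) -> dict[str, str]:
        return {s: v for s, v in self.inform.items() if s not in self.informed}

    def pending_request(self) -> list[str]:
        return [s for s in self.request if s not in self.answered]

    def is_complete(self) -> bool:
        return not self.pending_inform() and not self.pending_request()


def build_user_prompt(
    state: UserGoalState,
    examples: list[str],
    history: list[tuple[str, str]],
) -> str:
    """Assemble a prompt for the next user turn (assumed wording)."""
    lines = ["You are a user talking to a task-oriented dialogue system."]
    lines.append("Example user turns:")
    lines += [f"- {ex}" for ex in examples]
    lines.append(f"Still to tell the system: {state.pending_inform()}")
    lines.append(f"Still to ask for: {state.pending_request()}")
    if state.is_complete():
        lines.append("Your goal is complete; end the conversation politely.")
    lines.append("Conversation so far:")
    lines += [f"{speaker}: {text}" for speaker, text in history]
    lines.append("USER:")
    return "\n".join(lines)


if __name__ == "__main__":
    goal = UserGoalState(inform={"area": "north", "food": "thai"}, request=["phone"])
    goal.mark_informed("area")
    print(build_user_prompt(goal, ["I need a cheap hotel."], [("SYSTEM", "How can I help?")]))
```

Rebuilding the prompt from the tracked state on every turn keeps the remaining goal explicit, so the simulated user has less room to drift from it or to end the conversation before every slot is covered.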

Taxonomy

Topics: Speech and dialogue systems · Multi-Agent Systems and Negotiation · Topic Modeling