Towards Automatic Evaluation of Task-Oriented Dialogue Flows

Mehrnoosh Mirtaheri; Nikhil Varghese; Chandra Khatri; Amol Kelkar

arXiv:2411.10416·cs.CL·November 18, 2024

TL;DR

This paper introduces FuDGE (Fuzzy Dialogue-Graph Edit Distance), a metric that evaluates task-oriented dialogue flows by their structural complexity and by how well they cover recorded conversation data, with the aim of standardizing and automating dialogue flow evaluation.

Contribution

The paper presents FuDGE, the first metric to evaluate dialogue flows by structural complexity and conversation coverage, facilitating better design and automation.

Findings

1. FuDGE effectively measures alignment of conversations with dialogue flows.

2. Experiments show FuDGE's robustness across manual and automated flows.

3. Using FuDGE improves dialogue flow quality and system efficiency.

Abstract

Task-oriented dialogue systems rely on predefined conversation schemes (dialogue flows) often represented as directed acyclic graphs. These flows can be manually designed or automatically generated from previously recorded conversations. Due to variations in domain expertise or reliance on different sets of prior conversations, these dialogue flows can manifest in significantly different graph structures. Despite their importance, there is no standard method for evaluating the quality of dialogue flows. We introduce FuDGE (Fuzzy Dialogue-Graph Edit Distance), a novel metric that evaluates dialogue flows by assessing their structural complexity and representational coverage of the conversation data. FuDGE measures how well individual conversations align with a flow and, consequently, how well a set of conversations is represented by the flow overall. Through extensive experiments on…
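The idea of scoring how well a conversation aligns with a flow can be illustrated with a small sketch. This is not the authors' FuDGE algorithm: it swaps in a plain Levenshtein distance over exact intent labels in place of the fuzzy matching the metric's name implies, and it ignores the structural-complexity component. The flow representation (a dict from node label to successor labels), the intent labels, and all function names are hypothetical and chosen only for illustration.

```python
from typing import Dict, List

# Hypothetical representation: each node label maps to its successor labels.
Flow = Dict[str, List[str]]


def all_paths(flow: Flow, start: str) -> List[List[str]]:
    """Enumerate every path from `start` to a leaf of an acyclic flow."""
    successors = flow.get(start, [])
    if not successors:
        return [[start]]
    return [[start] + rest for nxt in successors for rest in all_paths(flow, nxt)]


def edit_distance(a: List[str], b: List[str]) -> int:
    """Classic Levenshtein distance over two sequences of labels."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,                   # delete x from the conversation
                curr[j - 1] + 1,               # insert y from the flow path
                prev[j - 1] + int(x != y),     # match (cost 0) or substitute (cost 1)
            ))
        prev = curr
    return prev[-1]


def conversation_to_flow_distance(conv: List[str], flow: Flow, root: str) -> int:
    """Distance from a conversation to its best-matching root-to-leaf path."""
    return min(edit_distance(conv, path) for path in all_paths(flow, root))


def mean_distance(convs: List[List[str]], flow: Flow, root: str) -> float:
    """Average alignment cost of a set of conversations against one flow."""
    return sum(conversation_to_flow_distance(c, flow, root) for c in convs) / len(convs)


# Toy flow with two branches that rejoin at "confirm" (assumed example data).
flow = {
    "greet": ["ask_order", "ask_refund"],
    "ask_order": ["confirm"],
    "ask_refund": ["confirm"],
    "confirm": [],
}
conversations = [
    ["greet", "ask_order", "confirm"],             # follows a path exactly
    ["greet", "ask_refund", "thanks", "confirm"],  # one extra turn
    ["ask_order"],                                 # missing greeting and confirmation
]
print(mean_distance(conversations, flow, "greet"))
```

Under these assumptions, a lower mean distance indicates that the flow represents the conversation set more closely. Enumerating all paths is exponential in the worst case, so this sketch only suits small flows.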

Taxonomy

Topics: Speech and dialogue systems · Topic Modeling · Multi-Agent Systems and Negotiation

Methods: Sparse Evolutionary Training · ALIGN