Towards Automatic Evaluation of Task-Oriented Dialogue Flows
Mehrnoosh Mirtaheri, Nikhil Varghese, Chandra Khatri, Amol Kelkar

TL;DR
This paper introduces FuDGE, a new metric for evaluating task-oriented dialogue flows based on their structure and data coverage, improving standardization and automation in dialogue system design.
Contribution
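The paper presents FuDGE (Fuzzy Dialogue-Graph Edit Distance), the first metric to score a dialogue flow on both its structural complexity and how well it covers recorded conversations, giving manually designed and automatically generated flows a common basis for comparison.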
Findings
FuDGE effectively measures alignment of conversations with dialogue flows.
Experiments show FuDGE's robustness across manual and automated flows.
Using FuDGE improves dialogue flow quality and system efficiency.
Abstract
Task-oriented dialogue systems rely on predefined conversation schemes (dialogue flows) often represented as directed acyclic graphs. These flows can be manually designed or automatically generated from previously recorded conversations. Due to variations in domain expertise or reliance on different sets of prior conversations, these dialogue flows can manifest in significantly different graph structures. Despite their importance, there is no standard method for evaluating the quality of dialogue flows. We introduce FuDGE (Fuzzy Dialogue-Graph Edit Distance), a novel metric that evaluates dialogue flows by assessing their structural complexity and representational coverage of the conversation data. FuDGE measures how well individual conversations align with a flow and, consequently, how well a set of conversations is represented by the flow overall. Through extensive experiments on…
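To make the idea of aligning a conversation with a flow concrete, here is a minimal sketch. It is not the paper's FuDGE formulation: it swaps in plain unit-cost Levenshtein distance with exact label matching, ignores the fuzzy matching and structural-complexity components, and assumes each conversation has already been reduced to a sequence of utterance labels. The flow, the labels, and all function names are hypothetical.

```python
from typing import Dict, List, Sequence

# Assumption: a flow is a DAG given as an adjacency dict {node: [children]},
# and each conversation is a list of utterance labels (hypothetical names).
Flow = Dict[str, List[str]]


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Unit-cost Levenshtein distance between two label sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def flow_paths(flow: Flow, node: str) -> List[List[str]]:
    """Enumerate every path from `node` down to a leaf of the DAG."""
    children = flow.get(node, [])
    if not children:
        return [[node]]
    return [[node] + rest
            for child in children
            for rest in flow_paths(flow, child)]


def conversation_distance(conv: Sequence[str], flow: Flow, root: str) -> int:
    """Distance from a conversation to its best-matching root-to-leaf path."""
    return min(edit_distance(conv, path) for path in flow_paths(flow, root))


def mean_distance(convs: List[Sequence[str]], flow: Flow, root: str) -> float:
    """Average per-conversation distance: a crude coverage score (lower is better)."""
    return sum(conversation_distance(c, flow, root) for c in convs) / len(convs)


# Hypothetical toy flow and conversations.
flow: Flow = {
    "greet": ["ask_order", "ask_refund"],
    "ask_order": ["confirm"],
    "ask_refund": ["confirm"],
    "confirm": [],
}
conversations = [
    ["greet", "ask_order", "confirm"],             # follows a path exactly
    ["greet", "ask_refund", "thanks", "confirm"],  # one extra utterance
]
score = mean_distance(conversations, flow, "greet")
```

Enumerating all root-to-leaf paths is exponential in the worst case, so this only suits small flows; it also rewards ever-larger flows that memorize every conversation, which is the kind of trade-off a structural-complexity term is meant to counter.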
Taxonomy
Topics: Speech and dialogue systems · Topic Modeling · Multi-Agent Systems and Negotiation
Methods: Sparse Evolutionary Training · ALIGN
