Improving Dialogue Management: Quality Datasets vs Models

Miguel Ángel Medina-Ramírez; Cayetano Guerra-Artal; Mario Hernández-Tejera

arXiv:2310.01339·cs.CL·October 24, 2023

TL;DR

This paper argues that the quality of datasets significantly impacts dialogue management performance, demonstrating that dataset errors are a major source of failure in task-oriented dialogue systems.

Contribution

The study introduces a synthetic dialogue generator to analyze how dataset errors affect dialogue management, highlighting dataset quality as a critical factor.

Findings

01 Dataset errors proportionally reduce model performance

02 Errors in popular datasets like MultiWOZ 2.1 and SGD are significant

03 Synthetic data experiments confirm dataset quality impacts outcomes
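To make the first finding concrete, here is a minimal sketch of how label errors cap measured performance. It is not the authors' synthetic dialogue generator: the action inventory, the one-action-per-state toy data, and every function name below are hypothetical and chosen only for illustration. The sketch corrupts a fixed fraction of action labels and then scores a policy that always picks the truly correct action against the corrupted labels.

```python
import random

# Hypothetical action inventory; not taken from MultiWOZ, SGD, or the paper.
ACTIONS = ["request_area", "request_food", "offer_restaurant",
           "confirm_booking", "goodbye"]


def make_clean_turns(n_turns, rng):
    """Build toy (state, action) pairs; each state has exactly one correct action."""
    turns = []
    for _ in range(n_turns):
        state = rng.randrange(len(ACTIONS))
        turns.append((state, ACTIONS[state]))
    return turns


def inject_label_errors(turns, error_rate, rng):
    """Relabel a fixed fraction of turns with a different (wrong) action."""
    n_bad = round(error_rate * len(turns))
    bad_idx = set(rng.sample(range(len(turns)), n_bad))
    noisy = []
    for i, (state, action) in enumerate(turns):
        if i in bad_idx:
            action = rng.choice([a for a in ACTIONS if a != action])
        noisy.append((state, action))
    return noisy


def oracle_policy(state):
    """A policy that always returns the truly correct action for a state."""
    return ACTIONS[state]


def accuracy(policy, turns):
    """Fraction of turns where the policy agrees with the dataset label."""
    return sum(policy(s) == a for s, a in turns) / len(turns)


rng = random.Random(0)
clean = make_clean_turns(1000, rng)
for rate in (0.0, 0.1, 0.3):
    noisy = inject_label_errors(clean, rate, rng)
    print(f"error rate {rate:.1f}: measured accuracy {accuracy(oracle_policy, noisy):.3f}")
```

Under these assumptions, even a perfect policy scores only one minus the error rate when evaluated on mislabeled data, so the measured drop tracks the share of corrupted labels. A learned model trained on the noisy labels would typically fall at or below that ceiling.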

Abstract

Task-oriented dialogue systems (TODS) have become crucial for users to interact with machines and computers using natural language. One of their key components is the dialogue manager (DM), which guides the conversation towards a good goal for the user by providing the best possible response. Previous works have proposed rule-based systems (RBS), reinforcement learning (RL), and supervised learning (SL) as solutions for correct dialogue management; in other words, for selecting the best response given the user's input. However, this work argues that the leading cause of DMs not achieving maximum performance lies in the quality of the datasets rather than in the models employed thus far; that is, dataset errors such as mislabeling cause a large percentage of failures in dialogue management. We studied the main errors in the most widely used datasets, MultiWOZ 2.1 and SGD, to demonstrate…

Code & Models

Repositories

miguel-kjh/Improving-Dialogue-Management (PyTorch, official)

Taxonomy

Topics: Speech and dialogue systems · Topic Modeling · Intelligent Tutoring Systems and Adaptive Learning

Methods: Linear Layer · Stochastic Gradient Descent · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Residual Connection · Label Smoothing · Dropout · Byte Pair Encoding · Adam · Dense Connections