Reranking Overgenerated Responses for End-to-End Task-Oriented Dialogue Systems
Songbo Hu, Ivan Vulić, Fangyu Liu, Anna Korhonen

TL;DR
This paper introduces a method for reranking overgenerated responses in end-to-end task-oriented dialogue systems. At inference, it selects responses that are closer to the gold response without having access to that gold response.
Contribution
The work proposes a reranking approach that uses sequence-level similarity scoring to select high-quality responses from lists of overgenerated candidates, rather than relying on the greedily generated response.
Findings
Improved BLEU, ROUGE, and METEOR scores on the MultiWOZ dataset
Enhanced response quality in human evaluations
Demonstrated robustness across multiple datasets
Abstract
End-to-end (E2E) task-oriented dialogue (ToD) systems are prone to fall into the so-called "likelihood trap", resulting in generated responses which are dull, repetitive, and often inconsistent with dialogue history. Comparing ranked lists of multiple generated responses against the "gold response" (from evaluation data) reveals a wide diversity in response quality, with many good responses placed lower in the ranked list. The main challenge, addressed in this work, is then how to reach beyond greedily generated system responses, that is, how to obtain and select such high-quality responses from the list of overgenerated responses at inference without availability of the gold response. To this end, we propose a simple yet effective reranking method which aims to select high-quality items from the lists of responses initially overgenerated by the system. The idea is to use any…
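The overgenerate-then-rerank idea in the abstract can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: `token_f1` is a simple token-overlap stand-in for a sequence-level similarity score, `oracle_rank` mimics the analysis step that compares candidates against the gold response, and `rerank` accepts any placeholder scorer that sees only the dialogue history and a candidate. None of this is the authors' implementation, and the toy scorer below is not their trained model.

```python
from collections import Counter
from typing import Callable, List, Sequence


def token_f1(candidate: str, reference: str) -> float:
    """Token-overlap F1: an assumed stand-in for a sequence-level similarity score."""
    cand, ref = candidate.lower().split(), reference.lower().split()
    if not cand or not ref:
        return 0.0
    overlap = sum((Counter(cand) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(cand), overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def oracle_rank(candidates: Sequence[str], gold: str) -> List[int]:
    """Candidate indices sorted by similarity to the gold response.

    Only possible when the gold response is available, i.e. for analysis.
    """
    return sorted(
        range(len(candidates)),
        key=lambda i: token_f1(candidates[i], gold),
        reverse=True,
    )


def rerank(
    history: str,
    candidates: Sequence[str],
    scorer: Callable[[str, str], float],
) -> str:
    """Select the candidate the scorer prefers, using only history and candidate."""
    return max(candidates, key=lambda c: scorer(history, c))


# Hypothetical overgenerated list; index 0 plays the role of the greedy response.
candidates = [
    "i can help with that .",
    "the hotel is in the north and costs 50 pounds .",
    "there is a hotel in the north .",
]
gold = "the hotel is in the north and costs 50 pounds per night ."
history = "i need a hotel in the north"


def toy_scorer(hist: str, cand: str) -> float:
    """Placeholder scorer: word overlap between history and candidate."""
    return float(len(set(hist.split()) & set(cand.split())))


print(oracle_rank(candidates, gold))
print(rerank(history, candidates, toy_scorer))
```

In this toy example the oracle ranking places the greedy candidate last, which mirrors the abstract's observation that good responses can sit lower in the generated list. The `rerank` function shows the inference-time constraint: the scorer never sees the gold response.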
Taxonomy
Topics: Speech and dialogue systems · Topic Modeling · AI in Service Interactions
