Dialogue Evaluation with Offline Reinforcement Learning
Nurul Lubis, Christian Geishauser, Hsien-Chin Lin, Carel van Niekerk, Michael Heck, Shutong Feng, Milica Gašić

TL;DR
This paper introduces an offline reinforcement learning-based critic for evaluating task-oriented dialogue systems, which correlates well with human judgments and enables fair comparison across different systems using static corpora.
Contribution
It presents a novel offline RL critic for dialogue evaluation that is corpus- and model-independent, improving correlation with human judgments.
Findings
Offline RL critics correlate strongly with human judgments.
The method enables comparison across various dialogue systems.
The approach is corpus- and model-independent.
Abstract
Task-oriented dialogue systems aim to fulfill user goals through natural language interactions. They are ideally evaluated with human users, which however is unattainable to do at every iteration of the development phase. Simulated users could be an alternative, however their development is nontrivial. Therefore, researchers resort to offline metrics on existing human-human corpora, which are more practical and easily reproducible. They are unfortunately limited in reflecting real performance of dialogue systems. BLEU for instance is poorly correlated with human judgment, and existing corpus-based metrics such as success rate overlook dialogue context mismatches. There is still a need for a reliable metric for task-oriented systems with good generalization and strong correlation with human judgements. In this paper, we propose the use of offline reinforcement learning for dialogue…
Taxonomy
Topics: Speech and dialogue systems · Topic Modeling · Intelligent Tutoring Systems and Adaptive Learning
