User Response and Sentiment Prediction for Automatic Dialogue Evaluation
Sarik Ghazarian, Behnam Hedayatnia, Alexandros Papangelis, Yang Liu, Dilek Hakkani-Tur

TL;DR
This paper proposes evaluating open-domain dialogue systems automatically by using the sentiment of the next user utterance as a quality signal at the turn or dialogue level. The resulting metrics outperform existing automatic evaluation metrics on both written and spoken datasets.
Contribution
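It proposes three methods for sentiment-based dialogue evaluation: one predicts the sentiment of the next user utterance directly, and two first predict the next user utterance (with an utterance generator or a feedback generator) and then classify its sentiment. These are positioned as alternatives to word-overlap metrics such as BLEU and ROUGE, which correlate poorly with human judgments.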
Findings
Sentiment-based evaluation correlates better with human judgments.
Proposed models outperform traditional metrics on dialogue datasets.
Effective in both written and spoken dialogue scenarios.
Abstract
Automatic evaluation is beneficial for open-domain dialog system development. However, standard word-overlap metrics (BLEU, ROUGE) do not correlate well with human judgements of open-domain dialog systems. In this work we propose to use the sentiment of the next user utterance for turn or dialog level evaluation. Specifically we propose three methods: one that predicts the next sentiment directly, and two others that predict the next user utterance using an utterance or a feedback generator model and then classify its sentiment. Experiments show our model outperforming existing automatic evaluation metrics on both written and spoken open-domain dialogue datasets.
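The core idea in the abstract, scoring a system turn by the sentiment of the user utterance that follows it, can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration and not the authors' implementation: `toy_sentiment` is a hypothetical word-list scorer standing in for a learned sentiment classifier, the dialogue format and the mean aggregation are invented, and the sketch reads the observed next user utterance where the paper's methods would predict its sentiment or text.

```python
from statistics import mean

# Hypothetical word lists: a stand-in for a learned sentiment classifier.
POSITIVE = {"great", "thanks", "love", "cool", "nice"}
NEGATIVE = {"no", "wrong", "stop", "boring", "what"}


def toy_sentiment(utterance: str) -> float:
    """Return a score in [-1, 1] based on counts of listed words."""
    words = [w.strip(".,!?").lower() for w in utterance.split()]
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    total = pos + neg
    return 0.0 if total == 0 else (pos - neg) / total


def turn_scores(dialog):
    """Score each system turn by the sentiment of the user turn after it."""
    scores = []
    for i, (speaker, _) in enumerate(dialog[:-1]):
        next_speaker, next_text = dialog[i + 1]
        if speaker == "system" and next_speaker == "user":
            scores.append(toy_sentiment(next_text))
    return scores


def dialog_score(dialog):
    """Aggregate turn scores into one dialogue-level score (mean is an assumption)."""
    scores = turn_scores(dialog)
    return mean(scores) if scores else 0.0


example_dialog = [
    ("user", "Tell me about jazz."),
    ("system", "Jazz began in New Orleans."),
    ("user", "Cool, thanks!"),
    ("system", "I like turtles."),
    ("user", "What? Stop."),
]
```

Calling `turn_scores(example_dialog)` gives one score per system turn, and `dialog_score(example_dialog)` averages them. In a real setting the toy scorer would be replaced by a trained model, and the next user utterance or its sentiment would be predicted from the dialogue context.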
Taxonomy
Topics: Topic Modeling · Speech and dialogue systems · Natural Language Processing Techniques
