Speaker Sensitive Response Evaluation Model
JinYeong Bak, Alice Oh

TL;DR
This paper introduces a speaker-sensitive automatic evaluation model for dialogue responses that considers conversational context and speaker roles, outperforming existing metrics and generalizing across domains.
Contribution
It proposes an evaluation approach that incorporates speaker information and context similarity and is trained on unlabeled data, improving correlation with human judgments.
Findings
Correlates more strongly with human scores than existing evaluation metrics.
Remains effective across domains such as Twitter and movie dialogues.
Learns its parameters from unlabeled conversation data.
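The findings above are stated in terms of correlation with human scores. As a minimal sketch of how such a comparison is often made, the snippet below computes a Spearman rank correlation between a metric's scores and human ratings. The choice of Spearman is an assumption (this summary does not say which coefficient the paper reports), and the function names and sample scores are hypothetical, for illustration only.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


def spearman(metric_scores: Sequence[float], human_scores: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation computed on the ranks."""
    if len(metric_scores) != len(human_scores):
        raise ValueError("score lists must have the same length")
    return pearson(average_ranks(metric_scores), average_ranks(human_scores))


# Hypothetical scores for four generated responses (not from the paper).
metric_scores = [0.10, 0.40, 0.35, 0.80]
human_scores = [1, 3, 2, 5]
rho = spearman(metric_scores, human_scores)
```

A value of `rho` near 1 means the metric ranks responses in nearly the same order as the human raters; a value near 0 means the two rankings are unrelated.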
Abstract
Automatic evaluation of open-domain dialogue response generation is very challenging because there are many appropriate responses for a given context. Existing evaluation models merely compare the generated response with the ground truth response and rate many of the appropriate responses as inappropriate if they deviate from the ground truth. One approach to resolve this problem is to consider the similarity of the generated response with the conversational context. In this paper, we propose an automatic evaluation model based on that idea and learn the model parameters from an unlabeled conversation corpus. Our approach considers the speakers in defining the different levels of similar context. We use a Twitter conversation corpus that contains many speakers and conversations to test our evaluation model. Experiments show that our model outperforms the other existing evaluation…
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Speech and dialogue systems
