Psychological Metrics for Dialog System Evaluation
Salvatore Giorgi, Shreya Havaldar, Farhan Ahmed, Zuhaib Akhtar, Shalaka Vaidya, Gary Pan, Lyle H. Ungar, H. Andrew Schwartz, Joao Sedoc

TL;DR
This paper introduces psychologically grounded metrics for evaluating dialog systems. The metrics capture emotional states and personality traits, complement traditional metrics, and improve prediction of human judgments.
Contribution
The paper presents five new interpretable psychological metrics for dialog evaluation, validated against traditional metrics and shown to improve prediction of human preferences.
Findings
The psychological metrics are uncorrelated with traditional metrics.
They provide additional, meaningful information about dialog quality.
Adding them to traditional metrics improves accuracy in predicting crowd-sourced judgments.
Abstract
We present metrics for evaluating dialog systems through a psychologically-grounded "human" lens in which conversational agents express a diversity of both states (e.g., emotion) and traits (e.g., personality), just as people do. We present five interpretable metrics from established psychology that are fundamental to human communication and relationships: emotional entropy, linguistic style and emotion matching, agreeableness, and empathy. These metrics can be applied (1) across dialogs and (2) on turns within dialogs. The psychological metrics are compared against seven state-of-the-art traditional metrics (e.g., BARTScore and BLEURT) on seven standard dialog system data sets. We also introduce a novel data set, the Three Bot Dialog Evaluation Corpus, which consists of annotated conversations from ChatGPT, GPT-3, and BlenderBot. We demonstrate that our proposed metrics offer novel…
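As a rough illustration of the first metric named above, emotional entropy can be read as the Shannon entropy of the emotions an agent expresses over a conversation. The sketch below is an assumption, not the authors' implementation: the abstract does not give the exact definition, and the function name `emotional_entropy`, the per-turn emotion labels, and the two example agents are all hypothetical.

```python
import math
from collections import Counter


def emotional_entropy(emotion_labels):
    """Shannon entropy (in bits) of the empirical emotion-label distribution.

    Assumption: each dialog turn has already been assigned one emotion label
    by some upstream classifier (not shown here).
    """
    if not emotion_labels:
        return 0.0
    counts = Counter(emotion_labels)
    total = len(emotion_labels)
    # sum over labels of p * log2(1 / p), with p = count / total
    return sum((c / total) * math.log2(total / c) for c in counts.values())


# Hypothetical per-turn labels for two agents
flat_agent = ["joy", "joy", "joy", "joy"]
varied_agent = ["joy", "sadness", "anger", "surprise"]

print(emotional_entropy(flat_agent))    # one emotion only: zero entropy
print(emotional_entropy(varied_agent))  # four equally likely emotions: 2 bits
```

Under this reading, an agent that always expresses the same emotion scores 0, and an agent that spreads its turns evenly over more emotions scores higher.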
Taxonomy
Topics: Topic Modeling · Sentiment Analysis and Opinion Mining · Speech and dialogue systems
Methods: Multi-Head Attention · Attention Is All You Need · Cosine Annealing · Softmax · Layer Normalization · Byte Pair Encoding · Dropout
