C-PMI: Conditional Pointwise Mutual Information for Turn-level Dialogue Evaluation

Liliang Ren, Mankeerat Sidhu, Qi Zeng, Revanth Gangi Reddy, Heng Ji, ChengXiang Zhai

arXiv:2306.15245 · cs.CL · September 4, 2023

TL;DR

This paper introduces C-PMI, a reference-free, turn-level dialogue evaluation metric. It uses conditional pointwise mutual information to measure the interaction between the user and the system, and it correlates better with human judgment than existing metrics.

Contribution

The paper proposes a model-agnostic C-PMI scorer that can replace a negative log-likelihood-based scorer and significantly improves correlation with human evaluations.

Findings

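01 Replacing the negative log-likelihood-based scorer with the C-PMI scorer yields a relative 62.6% higher Spearman correlation on average for the FED evaluation metric.

02 On the FED dialogue evaluation dataset, the approach correlates significantly better with human judgment than existing evaluation systems.

03 The code is publicly available at https://github.com/renll/C-PMI.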

Abstract

Existing reference-free turn-level evaluation metrics for chatbots inadequately capture the interaction between the user and the system. Consequently, they often correlate poorly with human evaluations. To address this issue, we propose a novel model-agnostic approach that leverages Conditional Pointwise Mutual Information (C-PMI) to measure the turn-level interaction between the system and the user based on a given evaluation dimension. Experimental results on the widely used FED dialogue evaluation dataset demonstrate that our approach significantly improves the correlation with human judgment compared with existing evaluation systems. By replacing the negative log-likelihood-based scorer with our proposed C-PMI scorer, we achieve a relative 62.6% higher Spearman correlation on average for the FED evaluation metric. Our code is publicly available at https://github.com/renll/C-PMI.
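
To make the idea of scoring with conditional pointwise mutual information concrete, here is a minimal sketch of the textbook definition, PMI(x; y | z) = log p(x | y, z) - log p(x | z). It is not taken from the paper or its repository. The roles given to x, y, and z (a candidate utterance, the system response, and the dialogue context), the function name conditional_pmi, and the probabilities are all assumptions chosen for illustration. In practice the two log-probabilities would come from a language model.

```python
import math


def conditional_pmi(logp_x_given_yz: float, logp_x_given_z: float) -> float:
    """Textbook conditional PMI: log p(x | y, z) - log p(x | z).

    A positive value means that observing y makes x more likely
    than it would be given z alone.
    """
    return logp_x_given_yz - logp_x_given_z


# Illustrative (made-up) probabilities, not results from the paper:
# x = a candidate utterance, y = the system response, z = the dialogue context.
logp_with_response = math.log(0.20)     # log p(x | y, z)
logp_without_response = math.log(0.05)  # log p(x | z)

score = conditional_pmi(logp_with_response, logp_without_response)
print(f"conditional PMI = {score:.4f}")
```

Unlike a plain negative log-likelihood, this difference cancels out how likely x is from the context alone, so the score reflects only what the response contributes.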

Code & Models

Repositories

renll/c-pmi (Official)

Taxonomy

Topics: Topic Modeling · Speech and dialogue systems · AI in Service Interactions