PairEval: Open-domain Dialogue Evaluation with Pairwise Comparison

ChaeHun Park, Minseok Choi, Dohyun Lee, and Jaegul Choo

arXiv:2404.01015 · cs.CL · July 19, 2024 · 1 citation

TL;DR

PairEval is an open-domain dialogue evaluation metric that scores a response by comparing it pairwise against responses from other conversations, which brings its scores closer to human judgments and helps it detect common dialogue system failures.

Contribution

The paper introduces a pairwise comparison approach to dialogue response evaluation that correlates more closely with human judgments and is more robust than existing metrics.

Findings

01 Higher correlation with human judgments than baseline metrics
02 More robust at detecting repetition and speaker insensitivity
03 Effective across multiple benchmark datasets

Abstract

Building a reliable and automated evaluation metric is a necessary but challenging problem for open-domain dialogue systems. Recent studies proposed evaluation metrics that assess generated responses by considering their relevance to previous dialogue histories. Although effective, these metrics evaluate individual responses directly rather than considering their relative quality compared to other responses. To handle this, we propose PairEval, a novel dialogue evaluation metric for assessing responses by comparing their quality against responses in different conversations. PairEval is built on top of open-sourced and moderate-size language models, and we make them specialized in pairwise comparison between dialogue responses. Extensive experiments on multiple benchmarks demonstrate that our metric exhibits a higher correlation with human judgments than baseline metrics. We also find…
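The idea of scoring a response relative to responses from other conversations can be illustrated with a minimal sketch. Everything named here is an assumption for illustration: the `prefer` comparator signature, the `toy_prefer` word-overlap heuristic (a stand-in for the fine-tuned language model the abstract describes), and averaging the pairwise preferences into one score. The abstract does not state how PairEval aggregates comparisons, so this is not the authors' implementation.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical comparator: returns a value in [0, 1], read as the probability
# that (context_a, response_a) is a better exchange than (context_b, response_b).
Comparator = Callable[[str, str, str, str], float]


def pairwise_score(
    context: str,
    response: str,
    comparisons: Sequence[Tuple[str, str]],
    prefer: Comparator,
) -> float:
    """Average preference for `response` over responses from other dialogues.

    Mean aggregation is an assumption made for this sketch.
    """
    if not comparisons:
        raise ValueError("need at least one comparison example")
    probs = [prefer(context, response, c_ctx, c_resp) for c_ctx, c_resp in comparisons]
    return sum(probs) / len(probs)


def _overlap(context: str, response: str) -> float:
    """Fraction of distinct response words that also appear in the context."""
    ctx_words = set(context.lower().split())
    resp_words = set(response.lower().split())
    return len(ctx_words & resp_words) / len(resp_words) if resp_words else 0.0


def toy_prefer(context_a: str, response_a: str, context_b: str, response_b: str) -> float:
    """Toy stand-in for a learned comparator; not a real quality model."""
    return 0.5 + 0.5 * (_overlap(context_a, response_a) - _overlap(context_b, response_b))


# Comparison examples drawn from *other* conversations (invented for this sketch).
comparisons = [
    ("how was your trip to paris", "the trip to paris was great"),
    ("do you like jazz music", "i have no idea"),
]

context = "what is your favorite food"
good = pairwise_score(context, "my favorite food is pizza", comparisons, toy_prefer)
bad = pairwise_score(context, "i went swimming yesterday", comparisons, toy_prefer)
print(f"on-topic: {good:.3f}  off-topic: {bad:.3f}")
```

In practice the comparator would be replaced by a model call. The point of the sketch is only that the final score depends on how a response fares against other responses rather than on a direct absolute rating.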

Code & Models

Repositories

ddehun/paireval (PyTorch, official)

Taxonomy

Topics: Speech and dialogue systems · Natural Language Processing Techniques · Topic Modeling