GRADE: Automatic Graph-Enhanced Coherence Metric for Evaluating Open-Domain Dialogue Systems
Lishan Huang, Zheng Ye, Jinghui Qin, Liang Lin, Xiaodan Liang

TL;DR
GRADE is a graph-based evaluation metric that assesses dialogue coherence by modeling topic transitions and reasoning over dialogue graphs; it correlates more closely with human judgments than existing metrics.
Contribution
The paper introduces GRADE, a new metric combining utterance and topic-level graph representations for better dialogue coherence evaluation.
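To make the idea of a topic-level dialogue graph concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the paper's construction: the stopword-based `extract_topics` helper, the `build_topic_graph` function, and the rule of linking every context topic to every response topic are all hypothetical choices made for this example.

```python
from itertools import product

# Hypothetical stopword list; a real system would use a proper keyword extractor.
STOPWORDS = {"i", "in", "the", "do", "you", "which", "a", "an",
             "to", "of", "and", "is", "it"}

def extract_topics(utterance):
    """Return lowercase non-stopword tokens, in order, without duplicates."""
    topics = []
    for token in utterance.split():
        word = token.strip(".,!?").lower()
        if word and word not in STOPWORDS and word not in topics:
            topics.append(word)
    return topics

def build_topic_graph(context, response):
    """Build a toy graph whose nodes are topic words.

    Assumption for illustration: each context topic is linked to each
    response topic, so edges represent candidate topic transitions.
    """
    context_topics = []
    for utterance in context:
        for topic in extract_topics(utterance):
            if topic not in context_topics:
                context_topics.append(topic)
    response_topics = extract_topics(response)
    edges = {(c, r) for c, r in product(context_topics, response_topics)
             if c != r}
    return {"context": context_topics,
            "response": response_topics,
            "edges": edges}

graph = build_topic_graph(
    ["I love hiking in the mountains."],
    "Which mountains do you like?",
)
print(graph["context"], graph["response"], sorted(graph["edges"]))
```

A graph like this could then be passed to a graph encoder and combined with an utterance-level sentence representation; how GRADE does that is described in the paper itself, not in this sketch.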
Findings
GRADE outperforms state-of-the-art metrics in correlation with human judgments.
Incorporates topic transition dynamics via dialogue graphs.
Provides a large-scale human evaluation benchmark.
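Correlation with human judgments is typically measured by comparing a metric's scores against human ratings for the same responses. The sketch below computes Pearson and Spearman correlation using only the standard library. The score lists are invented example values, not results from the paper, and the helper names are chosen for this illustration.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(ranks(xs), ranks(ys))

# Invented example data: metric scores and human coherence ratings (1-5).
metric_scores = [0.91, 0.35, 0.62, 0.10, 0.78]
human_ratings = [5, 2, 4, 1, 3]

print(f"Pearson:  {pearson(metric_scores, human_ratings):.3f}")
print(f"Spearman: {spearman(metric_scores, human_ratings):.3f}")
```

A metric that tracks human judgments more closely yields coefficients nearer to 1; comparing these coefficients across metrics on the same annotated set is the usual basis for a claim such as "outperforms".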
Abstract
Automatically evaluating dialogue coherence is a challenging but high-demand ability for developing high-quality open-domain dialogue systems. However, current evaluation metrics consider only surface features or utterance-level semantics, without explicitly considering the fine-grained topic transition dynamics of dialogue flows. Here, we first consider that the graph structure constituted with topics in a dialogue can accurately depict the underlying communication logic, which is a more natural way to produce persuasive metrics. Capitalizing on the topic-level dialogue graph, we propose a new evaluation metric GRADE, which stands for Graph-enhanced Representations for Automatic Dialogue Evaluation. Specifically, GRADE incorporates both coarse-grained utterance-level contextualized representations and fine-grained topic-level graph representations to evaluate dialogue coherence. The…
Taxonomy
Topics: Topic Modeling · Multimodal Machine Learning Applications · Speech and dialogue systems
