TL;DR
DEAM introduces an AMR-based semantic manipulation approach to evaluate dialogue coherence, producing more natural negative examples and achieving higher correlation with human judgments than existing metrics.
Contribution
The paper presents DEAM, a novel dialogue coherence evaluation metric that uses AMR-based semantic manipulations for generating incoherent samples, improving correlation with human assessments.
Findings
DEAM outperforms baseline metrics in correlating with human judgments.
DEAM effectively distinguishes between coherent and incoherent dialogues.
AMR-based manipulations generate more natural negative examples.
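The findings above are stated in terms of correlation with human judgments. As a rough illustration of how such a comparison is commonly made, the sketch below computes a rank correlation between metric scores and human ratings. The summary does not say which correlation coefficient the paper uses, so the choice of Spearman here is an assumption, and the numbers are invented toy values, not results from the paper.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Assign 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(average_ranks(x), average_ranks(y))


# Toy values for illustration only (hypothetical, not from the paper).
metric_scores = [0.9, 0.2, 0.7, 0.4]  # one automatic score per dialogue
human_ratings = [5, 1, 4, 2]          # one human coherence rating per dialogue
rho = spearman(metric_scores, human_ratings)
```

A value of `rho` near 1 means the metric orders dialogues the same way the human raters do; a value near 0 means the two orderings are unrelated.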
Abstract
Automatic evaluation metrics are essential for the rapid development of open-domain dialogue systems as they facilitate hyper-parameter tuning and comparison between models. Although recently proposed trainable conversation-level metrics have shown encouraging results, the quality of the metrics is strongly dependent on the quality of training data. Prior works mainly resort to heuristic text-level manipulations (e.g., utterance shuffling) to bootstrap incoherent conversations (negative examples) from coherent dialogues (positive examples). Such approaches are insufficient to appropriately reflect the incoherence that occurs in interactions between advanced dialogue models and humans. To tackle this problem, we propose DEAM, a Dialogue coherence Evaluation metric that relies on Abstract Meaning Representation (AMR) to apply semantic-level Manipulations for incoherent (negative) data…
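For context, the text-level baseline that the abstract contrasts DEAM against, utterance shuffling, can be sketched in a few lines. This is a minimal illustration of the heuristic named in the abstract, not code from the paper: the function name, the seed parameter, and the sample dialogue are all hypothetical. It does not implement DEAM's AMR-based manipulations, which operate on semantic graphs rather than on surface text.

```python
import random
from typing import List, Optional


def shuffle_utterances(dialogue: List[str], seed: Optional[int] = None) -> List[str]:
    """Build a negative example by reordering the utterances of a coherent dialogue.

    Hypothetical helper for illustration; the order is guaranteed to differ
    from the input, while the set of utterances stays the same.
    """
    if len(set(dialogue)) < 2:
        raise ValueError("need at least two distinct utterances to reorder")
    rng = random.Random(seed)
    shuffled = list(dialogue)
    # Reshuffle until the order actually changes.
    while shuffled == dialogue:
        rng.shuffle(shuffled)
    return shuffled


# Made-up positive example.
dialogue = [
    "Hi, how was your weekend?",
    "Great, I went hiking with my sister.",
    "Nice! Where did you go?",
    "A trail just outside the city.",
]
negative = shuffle_utterances(dialogue, seed=0)
```

Because the utterances themselves are untouched, a negative built this way differs from the positive only in ordering, which is the kind of surface-level incoherence the abstract argues is insufficient.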
