Investigating the Impact of Pre-trained Language Models on Dialog Evaluation

Chen Zhang; Luis Fernando D'Haro; Yiming Chen; Thomas Friedrichs; Haizhou Li

arXiv:2110.01895·cs.CL·November 3, 2021·1 cites

Open Access

TL;DR

This paper systematically evaluates how various pre-trained language models influence the effectiveness of automatic metrics in open-domain dialog evaluation across multiple benchmarks.

Contribution

It provides the first comprehensive analysis of the impact of different Pr-LMs on dialog evaluation metrics, considering factors like pre-training objectives and model size.

Findings

01 Pr-LM choice significantly affects metric performance

02 Model size and pre-training objectives influence evaluation robustness

03 Cross-dataset performance varies with different Pr-LMs

Abstract

Recently, there has been a surge of interest in applying pre-trained language models (Pr-LMs) to automatic open-domain dialog evaluation. Pr-LMs offer a promising direction for addressing the multi-domain evaluation challenge. Yet the impact of different Pr-LMs on the performance of automatic metrics is not well understood. This paper examines 8 different Pr-LMs and studies their impact on three typical automatic dialog evaluation metrics across three different dialog evaluation benchmarks. Specifically, we analyze how the choice of Pr-LM affects the performance of automatic metrics. Extensive correlation analyses on each of the metrics are performed to assess the effects of different Pr-LMs along various axes, including pre-training objectives, dialog evaluation criteria, model size, and cross-dataset robustness. This study serves as the first comprehensive assessment of the effects of…
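The correlation analyses mentioned in the abstract typically compare a metric's scores against human judgments on the same responses. The sketch below is only an illustration of that general idea, not the authors' code: it computes Spearman rank correlation in plain Python, and the backbone names (`plm_a`, `plm_b`) and all scores are made-up placeholders, not results from the paper.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def average_ranks(values):
    """Assign 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical human ratings for five responses (placeholder data).
human_ratings = [1, 2, 3, 4, 5]

# Hypothetical scores from the same metric built on two different Pr-LMs.
metric_scores = {
    "plm_a": [0.1, 0.2, 0.3, 0.4, 0.5],
    "plm_b": [0.5, 0.1, 0.4, 0.2, 0.3],
}

correlations = {
    name: spearman(human_ratings, scores)
    for name, scores in metric_scores.items()
}

for name, rho in correlations.items():
    print(f"{name}: Spearman rho = {rho:.3f}")
```

Under this framing, a metric variant whose scores rank responses more like the human raters do receives a higher correlation, which is one way to compare Pr-LM backbones on a given benchmark.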

Taxonomy

Topics: Topic Modeling · Speech and dialogue systems · Natural Language Processing Techniques