MENLI: Robust Evaluation Metrics from Natural Language Inference

Yanran Chen; Steffen Eger

arXiv:2208.07316 · cs.CL · December 27, 2023 · 1 citation

Open Access · 1 Repo

TL;DR

This paper proposes NLI-based evaluation metrics for text generation that are more robust to adversarial attacks than existing BERT-based metrics, improving reliability and combining well with current metrics.

Contribution

It introduces NLI-based evaluation metrics for text generation, demonstrating enhanced robustness and improved performance when combined with existing metrics.

Findings

01

NLI-based metrics are much more robust to adversarial attacks than recent BERT-based metrics.

02

Combining NLI metrics with existing metrics improves overall evaluation quality.

03

On standard benchmarks, NLI metrics outperform existing summarization metrics but perform below state-of-the-art (SOTA) machine translation (MT) metrics.
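
Finding 02 concerns combining an NLI-based score with an existing metric. The listing does not say how the two are combined, so the following is only a minimal sketch under stated assumptions: scores are min-max normalized per batch and then mixed by a weighted average. The function names, the `nli_weight` parameter, and the normalization step are illustrative and hypothetical, not taken from the paper or its repository.

```python
from typing import List, Sequence


def min_max_normalize(scores: Sequence[float]) -> List[float]:
    """Rescale scores to [0, 1]; a constant batch maps to all zeros."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def combine_scores(
    nli_scores: Sequence[float],
    base_scores: Sequence[float],
    nli_weight: float = 0.5,  # assumed mixing weight, not a value from the paper
) -> List[float]:
    """Weighted average of normalized NLI scores and base-metric scores."""
    if len(nli_scores) != len(base_scores):
        raise ValueError("score lists must have the same length")
    if not 0.0 <= nli_weight <= 1.0:
        raise ValueError("nli_weight must lie in [0, 1]")
    nli_norm = min_max_normalize(nli_scores)
    base_norm = min_max_normalize(base_scores)
    return [
        nli_weight * n + (1.0 - nli_weight) * b
        for n, b in zip(nli_norm, base_norm)
    ]
```

Normalizing first keeps one metric from dominating only because its raw scale is larger; the weight then controls how much the combined score leans on the NLI signal.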

Abstract

Recently proposed BERT-based evaluation metrics for text generation perform well on standard benchmarks but are vulnerable to adversarial attacks, e.g., relating to information correctness. We argue that this stems (in part) from the fact that they are models of semantic similarity. In contrast, we develop evaluation metrics based on Natural Language Inference (NLI), which we deem a more appropriate modeling. We design a preference-based adversarial attack framework and show that our NLI-based metrics are much more robust to the attacks than the recent BERT-based metrics. On standard benchmarks, our NLI-based metrics outperform existing summarization metrics, but perform below SOTA MT metrics. However, when combining existing metrics with our NLI metrics, we obtain both higher adversarial robustness (15%-30%) and higher quality metrics as measured on standard benchmarks (+5% to 30%).
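
The abstract mentions a preference-based adversarial attack framework without spelling out its mechanics. One plausible reading, sketched below as an assumption, is that a robust metric should score a meaning-preserving paraphrase of the reference higher than a candidate with an injected error. The `PreferenceCase` type, the `preference_accuracy` function, the toy `token_overlap` metric, and the example sentences are all hypothetical illustrations, not the authors' implementation.

```python
from dataclasses import dataclass
from typing import Callable, Iterable

Metric = Callable[[str, str], float]  # (reference, candidate) -> score


@dataclass(frozen=True)
class PreferenceCase:
    reference: str
    paraphrase: str   # meaning-preserving candidate
    adversarial: str  # candidate with an injected error


def preference_accuracy(metric: Metric, cases: Iterable[PreferenceCase]) -> float:
    """Fraction of cases where the paraphrase outscores the adversarial text."""
    case_list = list(cases)
    if not case_list:
        raise ValueError("at least one case is required")
    wins = sum(
        metric(c.reference, c.paraphrase) > metric(c.reference, c.adversarial)
        for c in case_list
    )
    return wins / len(case_list)


def token_overlap(reference: str, candidate: str) -> float:
    """Toy surface-similarity metric: Jaccard overlap of lowercased tokens."""
    ref, cand = set(reference.lower().split()), set(candidate.lower().split())
    if not ref or not cand:
        return 0.0
    return len(ref & cand) / len(ref | cand)


# Made-up example: the adversarial candidate changes a single number.
example = PreferenceCase(
    reference="The company hired 50 people in 2020.",
    paraphrase="In 2020, the company took on 50 new staff.",
    adversarial="The company hired 500 people in 2020.",
)
```

On this example the surface-overlap metric prefers the adversarial candidate, since it differs from the reference by one token, so `preference_accuracy(token_overlap, [example])` is 0.0. This is the kind of failure on information correctness that the abstract attributes to similarity-based metrics.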

Code & Models

Repositories

cyr19/menli
PyTorch · Official

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques