Global Explainability of BERT-Based Evaluation Metrics by Disentangling along Linguistic Factors
Marvin Kaster, Wei Zhao, Steffen Eger

TL;DR
This paper investigates which linguistic aspects BERT-based evaluation metrics for text generation actually capture. It finds that they are substantially sensitive to lexical overlap and have limitations in assessing semantic and syntactic quality.
Contribution
It applies a simple regression-based global explainability technique to BERT-based metrics, disentangling their scores along linguistic factors (semantics, syntax, morphology, and lexical overlap) to reveal their sensitivities and limitations.
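As a rough illustration of what regressing metric scores on linguistic factors can look like, the sketch below fits an ordinary least-squares model on synthetic data. Everything in it is an assumption made for illustration: the function name `fit_factor_weights`, the synthetic factor scores, and the chosen weights are hypothetical and are not taken from the paper, whose actual factor definitions and regression setup are not given here.

```python
import numpy as np

# Factor names follow the abstract; how each factor is scored is NOT
# specified here, so the data below is purely synthetic (assumption).
FACTORS = ["semantics", "syntax", "morphology", "lexical_overlap"]


def fit_factor_weights(factor_scores, metric_scores):
    """Least-squares fit of metric scores on per-pair factor scores.

    Hypothetical helper: returns one standardized weight per factor
    plus the R^2 of the fit.
    """
    X = np.asarray(factor_scores, dtype=float)
    y = np.asarray(metric_scores, dtype=float)

    # Standardize inputs and target so the weights are comparable.
    X = (X - X.mean(axis=0)) / X.std(axis=0)
    y = (y - y.mean()) / y.std()

    # Append an intercept column and solve the least-squares problem.
    X1 = np.column_stack([X, np.ones(len(X))])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)

    pred = X1 @ coef
    r2 = 1.0 - np.sum((y - pred) ** 2) / np.sum((y - y.mean()) ** 2)
    return dict(zip(FACTORS, coef[:-1])), r2


# Synthetic example: a made-up metric dominated by lexical overlap.
rng = np.random.default_rng(0)
factors = rng.uniform(0.0, 1.0, size=(500, len(FACTORS)))
assumed_weights = np.array([0.3, 0.1, 0.1, 0.5])  # illustrative only
metric = factors @ assumed_weights + rng.normal(0.0, 0.01, size=500)

weights, r2 = fit_factor_weights(factors, metric)
for name, w in weights.items():
    print(f"{name:16s} {w:+.3f}")
print(f"R^2 = {r2:.3f}")
```

Read this way, a larger standardized weight means the metric's score moves more with that factor, and R^2 indicates how much of the metric's variation the factors jointly explain.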
Findings
Metrics capture all linguistic aspects to some degree
All metrics are highly sensitive to lexical overlap
Limitations are demonstrated through adversarial tests
Abstract
Evaluation metrics are a key ingredient for progress of text generation systems. In recent years, several BERT-based evaluation metrics have been proposed (including BERTScore, MoverScore, BLEURT, etc.) which correlate much better with human assessment of text generation quality than BLEU or ROUGE, invented two decades ago. However, little is known about what these metrics, which are based on black-box language model representations, actually capture (it is typically assumed they model semantic similarity). In this work, we use a simple regression-based global explainability technique to disentangle metric scores along linguistic factors, including semantics, syntax, morphology, and lexical overlap. We show that the different metrics capture all aspects to some degree, but that they are all substantially sensitive to lexical overlap, just like BLEU and ROUGE. This exposes limitations of these…
Taxonomy
Topics: Topic Modeling · Software Engineering Research · Natural Language Processing Techniques
Methods: Test
