BMX: Boosting Natural Language Generation Metrics with Explainability
Christoph Leiter, Hoa Nguyen, Steffen Eger

TL;DR
BMX enhances natural language generation metrics by incorporating explainability, using feature importance scores to improve correlation with human judgments, especially in summarization tasks.
Contribution
This paper introduces BMX, a novel method that leverages explanations to boost the performance of NLG evaluation metrics, demonstrating significant improvements in summarization.
Findings
Improves correlation with human judgments in summarization.
Small improvements observed in machine translation.
BMX with LIME explainer achieves notable Spearman correlation gains.
Abstract
State-of-the-art natural language generation evaluation metrics are based on black-box language models. Hence, recent works consider their explainability with the goals of better understandability for humans and better metric analysis, including failure cases. In contrast, our proposed method BMX: Boosting Natural Language Generation Metrics with Explainability explicitly leverages explanations to boost the metrics' performance. In particular, we perceive feature importance explanations as word-level scores, which we convert, via power means, into a segment-level score. We then combine this segment-level score with the original metric to obtain a better metric. Our tests show improvements for multiple metrics across machine translation (MT) and summarization datasets. While improvements in machine translation are small, they are strong for summarization. Notably, BMX with the LIME explainer and preselected…
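The aggregation step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the linear interpolation with weight `w`, and the restriction to strictly positive word-level scores are all assumptions made for illustration. The abstract only states that word-level importance scores are converted into a segment-level score via power means and then combined with the original metric.

```python
import math
from typing import Sequence


def power_mean(values: Sequence[float], p: float) -> float:
    """Generalized (power) mean with exponent p.

    Assumption for this sketch: all values are strictly positive, so the
    mean is defined for every finite p. Real feature-importance scores can
    be negative and would need extra handling.
    """
    if not values:
        raise ValueError("need at least one value")
    if any(v <= 0 for v in values):
        raise ValueError("this sketch assumes strictly positive scores")
    n = len(values)
    if p == 0:
        # The limit p -> 0 of the power mean is the geometric mean.
        return math.exp(sum(math.log(v) for v in values) / n)
    return (sum(v ** p for v in values) / n) ** (1.0 / p)


def boosted_score(metric_score: float,
                  word_scores: Sequence[float],
                  p: float = 1.0,
                  w: float = 0.5) -> float:
    """Combine the original metric with the aggregated explanation score.

    Hypothetical combination: a linear interpolation controlled by w,
    where w = 1 returns the original metric unchanged.
    """
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must lie in [0, 1]")
    segment_score = power_mean(word_scores, p)
    return w * metric_score + (1.0 - w) * segment_score


if __name__ == "__main__":
    # Toy word-level importance scores for one hypothesis sentence.
    scores = [0.2, 0.4, 0.9]
    for p in (-1.0, 0.0, 1.0, 2.0):
        print(p, boosted_score(0.8, scores, p=p, w=0.5))
```

Smaller values of `p` pull the segment-level score toward the lowest word-level scores, while larger values emphasize the highest ones, which is why the exponent acts as a tunable parameter in this kind of aggregation.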
Taxonomy
Topics: Semantic Web and Ontologies · Natural Language Processing Techniques
