Hacking Neural Evaluation Metrics with Single Hub Text
Hiroyuki Deguchi, Katsuki Chousa, Yusuke Sakai

TL;DR
This paper introduces a method to identify a single adversarial text that consistently receives high-quality evaluations across various test cases, revealing vulnerabilities in neural evaluation metrics like COMET.
Contribution
It proposes a novel approach to find a single hub text that exposes weaknesses in neural evaluation metrics, demonstrating its effectiveness across multiple language pairs.
Findings
The hub text achieved 79.1 COMET% and 67.8 COMET% on the WMT'24 En--Ja and En--De translation tasks, respectively.
The hub text generalizes across multiple language pairs such as Ja--En and De--En.
The method reveals vulnerabilities in neural evaluation metrics such as COMET: a single text can be scored as high-quality regardless of the test case.
Abstract
Strongly human-correlated evaluation metrics serve as an essential compass for the development and improvement of generation models and must be highly reliable and robust. Recent embedding-based neural text evaluation metrics, such as COMET for translation tasks, are widely used in both research and development fields. However, there is no guarantee that they yield reliable evaluation results due to the black-box nature of neural networks. To raise concerns about the reliability and safety of such metrics, we propose a method for finding a single adversarial text in the discrete space that is consistently evaluated as high-quality, regardless of the test cases, to identify the vulnerabilities in evaluation metrics. The single hub text found with our method achieved 79.1 COMET% and 67.8 COMET% in the WMT'24 English-to-Japanese (En--Ja) and English-to-German (En--De) translation tasks,…
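The core idea of a "hub" text is one fixed hypothesis whose metric score, averaged over every test case, is high even though it ignores the input. The sketch below illustrates that objective under loudly stated assumptions. `toy_metric` is a hypothetical token-overlap stand-in for COMET, and `greedy_hub_search` is a simple greedy coordinate search over a small discrete vocabulary. Neither is the authors' method; both are illustrative only.

```python
from statistics import mean
from typing import Callable, Sequence

# A test case is (source, reference); a metric scores (source, hypothesis, reference).
TestCase = tuple[str, str]
Metric = Callable[[str, str, str], float]


def toy_metric(source: str, hypothesis: str, reference: str) -> float:
    """Hypothetical stand-in for a neural metric such as COMET.

    Returns the fraction of hypothesis tokens that also occur in the
    reference. The source is ignored in this toy version.
    """
    hyp = hypothesis.split()
    if not hyp:
        return 0.0
    ref = set(reference.split())
    return sum(tok in ref for tok in hyp) / len(hyp)


def corpus_score(metric: Metric, hypothesis: str, cases: Sequence[TestCase]) -> float:
    """Average score of ONE fixed hypothesis across all test cases."""
    return mean(metric(src, hypothesis, ref) for src, ref in cases)


def greedy_hub_search(
    metric: Metric,
    cases: Sequence[TestCase],
    vocab: Sequence[str],
    length: int,
    iterations: int = 3,
) -> tuple[str, float]:
    """Greedy coordinate search over discrete tokens (hypothetical illustration).

    At each position, try every vocabulary token and keep the one that
    strictly increases the corpus-level score of the single shared text.
    """
    tokens = [vocab[0]] * length
    best = corpus_score(metric, " ".join(tokens), cases)
    for _ in range(iterations):
        for i in range(length):
            for tok in vocab:
                candidate = tokens[:i] + [tok] + tokens[i + 1:]
                score = corpus_score(metric, " ".join(candidate), cases)
                if score > best:
                    tokens, best = candidate, score
    return " ".join(tokens), best


if __name__ == "__main__":
    cases = [("s1", "the cat sat"), ("s2", "the dog ran"), ("s3", "the bird flew")]
    hub, best = greedy_hub_search(toy_metric, cases, ["cat", "the", "ran", "flew"], length=2)
    print(hub, best)
```

With these toy test cases, the search settles on `"the the"`, which scores 1.0 on every case because "the" occurs in every reference. This is a miniature version of a single text being rated highly regardless of the test case. A real attack on COMET would replace `toy_metric` with the actual model and would need a far larger search space.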
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Topic Modeling · Hate Speech and Cyberbullying Detection
