Hacking Neural Evaluation Metrics with Single Hub Text

Hiroyuki Deguchi; Katsuki Chousa; Yusuke Sakai

arXiv:2512.16323 · cs.CL · January 14, 2026

TL;DR

This paper introduces a method for finding a single adversarial text that is consistently scored as high-quality across test cases. The result exposes vulnerabilities in neural evaluation metrics such as COMET.

Contribution

The paper proposes a method for finding a single "hub text" in the discrete space of texts that exposes weaknesses in neural evaluation metrics. It shows that the method is effective across multiple language pairs.
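The source gives no formula for the search target, so what follows is our own notation and only one way to state it. Assume test cases $\{(s_i, r_i)\}_{i=1}^{N}$ made of sources and references, and a reference-based metric $M(s, h, r)$ such as COMET. The hub text is then a single string $h$, drawn from the discrete space $\mathcal{V}^{*}$ of token sequences, that maximizes the average score:

```latex
\hat{h} = \operatorname*{arg\,max}_{h \in \mathcal{V}^{*}} \; \frac{1}{N} \sum_{i=1}^{N} M(s_i, h, r_i)
```

The key point is that $h$ carries no index $i$: the same text is submitted as the output for every source. Using the mean is an assumption. A worst-case objective, with $\min_i$ in place of the average, would be another reasonable reading of "consistently evaluated as high-quality".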

Findings

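1. The hub text achieved 79.1 COMET% on the WMT'24 English-to-Japanese (En--Ja) translation task and 67.8 COMET% on the WMT'24 English-to-German (En--De) translation task.
2. The hub text generalizes to other language pairs, such as Ja--En and De--En.
3. The method exposes vulnerabilities in neural evaluation metrics: a single fixed text can be scored as high-quality regardless of the test case.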
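This toy example suggests how one fixed output could score well everywhere. It is a continuous analogue only, not the authors' method. Suppose, hypothetically, that a metric were simply the cosine similarity between hypothesis and reference embeddings. COMET is a trained model and is more complex than this. Under that assumption, the unit vector along the sum of the normalized reference embeddings maximizes the average score over all test cases at once. The sketch below checks that this single vector scores at least as well on average as any reference reused as the hypothesis for every case. The helper names (`toy_metric`, `centroid_hub`) and the synthetic random embeddings are illustrative assumptions.

```python
import math
import random


def normalize(v):
    # Scale a vector to unit length (assumes v is not the zero vector).
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def toy_metric(hyp_emb, ref_emb):
    # Hypothetical stand-in for an embedding-based metric:
    # cosine similarity between hypothesis and reference embeddings.
    a, b = normalize(hyp_emb), normalize(ref_emb)
    return sum(x * y for x, y in zip(a, b))


def mean_score(hyp_emb, ref_embs):
    # The same hypothesis is scored against every test case, then averaged.
    return sum(toy_metric(hyp_emb, r) for r in ref_embs) / len(ref_embs)


def centroid_hub(ref_embs):
    # Unit vector along the sum of the normalized references. Among all unit
    # vectors h, it maximizes (1/N) * sum_i <h, r_i / |r_i|>.
    total = [0.0] * len(ref_embs[0])
    for r in ref_embs:
        for j, x in enumerate(normalize(r)):
            total[j] += x
    return normalize(total)


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic "reference embeddings" that share a common direction plus noise.
    shared = [rng.gauss(0, 1) for _ in range(16)]
    refs = [[s + 0.8 * rng.gauss(0, 1) for s in shared] for _ in range(50)]
    hub = centroid_hub(refs)
    best_single_ref = max(mean_score(r, refs) for r in refs)
    print(f"hub mean cosine: {mean_score(hub, refs):.3f}")
    print(f"best reused reference mean cosine: {best_single_ref:.3f}")
```

The paper searches in the discrete space of texts, where this closed-form solution does not directly apply. The toy only illustrates why averaging a similarity-style score over many test cases can reward a single "central" output.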

Abstract

Strongly human-correlated evaluation metrics serve as an essential compass for the development and improvement of generation models and must be highly reliable and robust. Recent embedding-based neural text evaluation metrics, such as COMET for translation tasks, are widely used in both research and development fields. However, there is no guarantee that they yield reliable evaluation results due to the black-box nature of neural networks. To raise concerns about the reliability and safety of such metrics, we propose a method for finding a single adversarial text in the discrete space that is consistently evaluated as high-quality, regardless of the test cases, to identify the vulnerabilities in evaluation metrics. The single hub text found with our method achieved 79.1 COMET% and 67.8 COMET% in the WMT'24 English-to-Japanese (En--Ja) and English-to-German (En--De) translation tasks,…
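The abstract does not describe the search procedure. What follows is therefore a generic sketch of a discrete search for one high-scoring text, not the authors' algorithm. It uses greedy coordinate ascent over a fixed-length token sequence: any single-token substitution that raises the score, averaged over all test cases, is kept. The metric is a hypothetical unigram-F1 stand-in (`toy_unigram_f1`) that ignores the source. Attacking a real metric would mean passing a wrapper around a trained model such as COMET as `metric`, which is not shown here. All function names and the tiny test set are illustrative.

```python
from collections import Counter
from typing import Callable, List, Sequence, Tuple

TestCase = Tuple[str, str]  # (source, reference)
Metric = Callable[[str, str, str], float]  # (source, hypothesis, reference) -> score


def toy_unigram_f1(source: str, hyp: str, ref: str) -> float:
    """Hypothetical stand-in metric: bag-of-words F1 between hypothesis and reference."""
    h, r = Counter(hyp.split()), Counter(ref.split())
    overlap = sum((h & r).values())  # multiset intersection size
    if overlap == 0:
        return 0.0
    precision = overlap / sum(h.values())
    recall = overlap / sum(r.values())
    return 2 * precision * recall / (precision + recall)


def mean_score(metric: Metric, hyp: str, cases: Sequence[TestCase]) -> float:
    # One fixed hypothesis is scored against every test case.
    return sum(metric(src, hyp, ref) for src, ref in cases) / len(cases)


def greedy_hub_search(metric: Metric, cases: Sequence[TestCase], vocab: Sequence[str],
                      length: int, max_passes: int = 10) -> Tuple[str, float]:
    """Greedy single-token substitution; returns a local optimum (text, mean score)."""
    tokens: List[str] = [vocab[0]] * length
    best = mean_score(metric, " ".join(tokens), cases)
    for _ in range(max_passes):
        improved = False
        for pos in range(length):
            for tok in vocab:
                if tok == tokens[pos]:
                    continue
                candidate = tokens[:pos] + [tok] + tokens[pos + 1:]
                score = mean_score(metric, " ".join(candidate), cases)
                if score > best:  # keep only strict improvements
                    tokens, best = candidate, score
                    improved = True
        if not improved:  # no substitution helps: local optimum reached
            break
    return " ".join(tokens), best


if __name__ == "__main__":
    cases = [("s1", "the cat sat on the mat"),
             ("s2", "the dog is on the sofa"),
             ("s3", "a bird is in the tree")]
    vocab = sorted({w for _, ref in cases for w in ref.split()})
    hub, score = greedy_hub_search(toy_unigram_f1, cases, vocab, length=3)
    print(repr(hub), round(score, 3))
```

Greedy substitution is one of the simplest discrete optimizers. It only reaches a local optimum, and we chose it here for brevity rather than because it matches the paper.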

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Topic Modeling · Hate Speech and Cyberbullying Detection