Penalizing Length: Uncovering Systematic Bias in Quality Estimation Metrics

Yilin Zhang; Wenda Xu; Zhongtao Liu; Tetsuji Nakagawa; Markus Freitag

arXiv:2510.22028·cs.CL·April 3, 2026

TL;DR

This paper uncovers systematic length biases in quality estimation metrics for machine translation, demonstrating how they over-predict errors in longer texts and prefer shorter translations, and proposes length normalization as a solution.

Contribution

The study identifies two length biases in QE metrics, analyzes their root causes, and introduces length normalization during training to improve reliability across translation lengths.

Findings

01 QE metrics over-predict errors as translation length increases, even for error-free texts

02 QE metrics favor shorter translations when candidates for the same source are of comparable quality

03 Length normalization during training reduces length bias
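
To make the idea of length normalization concrete, here is a minimal sketch of one possible reading: rescale an error-penalty training target by the translation length so that the target expresses errors per fixed number of tokens instead of a raw total. The page does not spell out the authors' formulation, so the function name `length_normalized_penalty`, the whitespace tokenization, and the `per_tokens` scale are all assumptions made for illustration.

```python
def length_normalized_penalty(error_penalty: float,
                              translation: str,
                              per_tokens: int = 100) -> float:
    """Rescale a raw error penalty to "penalty per `per_tokens` tokens".

    Hypothetical helper: the whitespace tokenization and the default
    scale of 100 tokens are illustrative assumptions, not the paper's
    stated method.
    """
    # Guard against empty strings so the division is always defined.
    n_tokens = max(len(translation.split()), 1)
    return error_penalty * per_tokens / n_tokens


# Two translations with the same raw penalty but different lengths
# receive different normalized targets.
short_target = length_normalized_penalty(5.0, "a b c d e f g h i j")
long_target = length_normalized_penalty(5.0, " ".join(["tok"] * 100))
print(short_target, long_target)
```

Under this assumption, a long translation with a few errors is no longer treated the same as a short translation with the same number of errors, which is the intuition behind reducing the length penalty.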

Abstract

Quality Estimation (QE) metrics are vital in machine translation for reference-free evaluation and increasingly serve as selection criteria in data filtering and candidate reranking. However, the prevalence and impact of length bias in QE metrics have been underexplored. Through a systematic study of top-performing learned and LLM-as-a-Judge QE metrics across 10 diverse language pairs, we reveal two critical length biases: First, QE metrics consistently over-predict errors with increasing translation length, even for high-quality, error-free texts. Second, they exhibit a systematic preference for shorter translations when multiple candidates of comparable quality are available for the same source text. These biases risk unfairly penalizing longer, correct translations and can propagate into downstream pipelines that rely on QE signals for data selection or system optimization. We trace…
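
The two biases described in the abstract can be probed with simple diagnostics. The sketch below is not the authors' evaluation code; the function names, the input layout, and the choice of a least-squares slope are assumptions. The first function fits the slope of predicted error against length on translations known to be error-free (a slope near zero would suggest no length effect). The second computes how often a metric scores the shorter of two comparable-quality candidates higher.

```python
from statistics import mean


def length_slope(lengths, predicted_errors):
    """Least-squares slope of predicted error against translation length.

    Intended for error-free translations, where any positive slope
    reflects errors predicted purely as a function of length.
    """
    mx, my = mean(lengths), mean(predicted_errors)
    var = sum((x - mx) ** 2 for x in lengths)
    if var == 0:
        raise ValueError("all lengths are identical; slope is undefined")
    cov = sum((x - mx) * (y - my)
              for x, y in zip(lengths, predicted_errors))
    return cov / var


def shorter_preference_rate(pairs):
    """Fraction of candidate pairs where the shorter one scores higher.

    `pairs` holds ((len_a, score_a), (len_b, score_b)) tuples for two
    candidates of comparable quality; a higher score means "better".
    Pairs tied in length or in score are skipped.
    """
    wins = total = 0
    for (len_a, score_a), (len_b, score_b) in pairs:
        if len_a == len_b or score_a == score_b:
            continue
        total += 1
        if len_a < len_b:
            shorter, longer = score_a, score_b
        else:
            shorter, longer = score_b, score_a
        if shorter > longer:
            wins += 1
    return wins / total if total else float("nan")
```

With an unbiased metric and candidates of truly comparable quality, the preference rate would be expected to sit near 0.5; values well above that point toward a preference for shorter output.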
