Beyond Scalar Scores: Reinforcement Learning for Error-Aware Quality Estimation of Machine Translation

Archchana Sindhujan; Girish A. Koushik; Shenbin Qian; Diptesh Kanojia; Constantin Orăsan

arXiv:2602.08600 · cs.CL · February 10, 2026

Open Access

TL;DR

This paper introduces a reinforcement learning framework that improves machine translation quality estimation by incorporating error-aware rewards. The approach is especially effective for low-resource language pairs such as English to Malayalam.

Contribution

It presents the first segment-level QE dataset for English-Malayalam and a novel ALOPE-RL framework that improves QE performance using error-aware, policy-based learning with limited data.

Findings

01. ALOPE-RL outperforms larger LLM baselines and encoder-based models.

02. Error-aware rewards improve translation quality reasoning.

03. Effective QE achieved with small-scale datasets and compact models.

Abstract

Quality Estimation (QE) aims to assess the quality of machine translation (MT) outputs without relying on reference translations, making it essential for real-world, large-scale MT evaluation. Large Language Models (LLMs) have shown significant promise in advancing QE for machine translation. However, most QE approaches rely solely on scalar quality scores, offering no explicit information about the translation errors that should drive these judgments. Moreover, for low-resource languages where annotated QE data is limited, existing approaches struggle to achieve reliable performance. To address these challenges, we introduce the first segment-level QE dataset for English to Malayalam, a severely resource-scarce language pair in the QE domain, comprising human-annotated Direct Assessment (DA) scores and Translation Quality Remarks (TQR), which are…

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Explainable Artificial Intelligence (XAI)