Minimum Bayes Risk Decoding for Error Span Detection in Reference-Free Automatic Machine Translation Evaluation

Boxuan Lyu; Haiyue Song; Hidetaka Kamigaito; Chenchen Ding; Hideki Tanaka; Masao Utiyama; Kotaro Funakoshi; Manabu Okumura

arXiv:2512.07540·cs.CL·January 1, 2026

Open Access

TL;DR

This paper applies Minimum Bayes Risk (MBR) decoding to error span detection (ESD) in machine translation evaluation. MBR decoding outperforms the usual Maximum a Posteriori (MAP) decoding, and model distillation reduces its computational cost.

Contribution

The paper proposes MBR decoding for ESD in MT evaluation, shows that it is more effective than MAP decoding, and presents a method for reducing inference latency.

Findings

1. MBR decoding significantly improves span-level performance.
2. MBR generally matches or outperforms MAP at the system and sentence levels.
3. The distilled model reduces inference latency while maintaining accuracy.

Abstract

Error Span Detection (ESD) extends automatic machine translation (MT) evaluation by localizing translation errors and labeling their severity. Current generative ESD methods typically use Maximum a Posteriori (MAP) decoding, assuming that the model-estimated probabilities are perfectly correlated with similarity to the human annotation, but we often observe higher likelihood assigned to an incorrect annotation than to the human one. We instead apply Minimum Bayes Risk (MBR) decoding to generative ESD. We use a sentence- or span-level similarity function for MBR decoding, which selects candidate hypotheses based on their approximate similarity to the human annotation. Experimental results on the WMT24 Metrics Shared Task show that MBR decoding significantly improves span-level performance and generally matches or outperforms MAP at the system and sentence levels. To reduce the…
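The selection step described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the annotation format (`Span`), the character-overlap F1 in `span_f1`, and the function `mbr_decode` are all hypothetical names chosen for this example, and the similarity functions actually used in the paper are not given on this page. The sketch uses the common sample-based approximation in which the sampled candidates also serve as pseudo-references, standing in for the unavailable human annotation.

```python
from typing import FrozenSet, List, Set, Tuple

# Hypothetical annotation format: a set of (start, end_exclusive, severity) spans.
Span = Tuple[int, int, str]
Annotation = FrozenSet[Span]


def _labeled_chars(annotation: Annotation) -> Set[Tuple[int, str]]:
    """Expand spans into (character index, severity) pairs."""
    return {(i, sev) for (start, end, sev) in annotation for i in range(start, end)}


def span_f1(hyp: Annotation, ref: Annotation) -> float:
    """Assumed similarity function: character-level F1 between two annotations."""
    h, r = _labeled_chars(hyp), _labeled_chars(ref)
    if not h and not r:
        return 1.0  # both annotations say "no errors"
    overlap = len(h & r)
    if overlap == 0:
        return 0.0
    precision = overlap / len(h)
    recall = overlap / len(r)
    return 2 * precision * recall / (precision + recall)


def mbr_decode(candidates: List[Annotation]) -> Annotation:
    """Return the candidate with the highest average similarity to all candidates.

    Each candidate is scored against every sampled candidate (itself included),
    which approximates its expected similarity to the unknown human annotation.
    """
    if not candidates:
        raise ValueError("mbr_decode needs at least one candidate")

    def expected_utility(hyp: Annotation) -> float:
        return sum(span_f1(hyp, ref) for ref in candidates) / len(candidates)

    # Ties are resolved in favor of the earliest candidate in the list.
    return max(candidates, key=expected_utility)
```

In this sketch an outlier annotation scores poorly even if the model assigned it a high likelihood, because it agrees with few of the other samples; a candidate close to the consensus of the samples is selected instead. The cost grows quadratically with the number of candidates, since every pair is compared.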

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification