Should We be Pedantic About Reasoning Errors in Machine Translation?

Calvin Bao; Marine Carpuat

arXiv:2604.09890·cs.CL·April 14, 2026

TL;DR

This paper investigates reasoning errors in machine translation from English into seven target languages. It shows that such errors can be detected with high precision for some languages (Urdu) and lower precision for others (Spanish), and that small corrections to the reasoning have limited impact on translation quality.

Contribution

It applies an automated annotation protocol to detect reasoning errors in MT and evaluates how interventions of varying strength on the reasoning traces affect translation quality.

Findings

01 High-precision detection of reasoning errors in Urdu

02 Lower detection precision in Spanish

03 Interventions often do not significantly improve translation quality

Abstract

Across multiple language pairings (English → {Spanish, French, German, Mandarin, Japanese, Urdu, Cantonese}), we find reasoning errors in translation. To quantify how often these reasoning errors occur, we leverage an automated annotation protocol for reasoning evaluation, wherein the goal is to detect whether a reasoning step falls into any of three error categories: (1) source sentence-misaligned, (2) model hypothesis-misaligned, or (3) reasoning trace-misaligned. We probe the reasoning model with perturbed traces that correct for these identified reasoning errors, using an array of weak-to-strong interventions: hedging, removal, re-reasoning after removal, hindsight, and oracle interventions. Experimenting with interventions on the reasoning traces suggests that small corrections to the reasoning have little impact on translation quality, but stronger interventions yield the highest resolution…
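To make the three error categories and the two weakest interventions concrete, here is a minimal Python sketch. It is not the authors' implementation: the names `ErrorCategory`, `ReasoningStep`, `hedge_flagged`, and `remove_flagged` are hypothetical, and the hedging prefix is an assumption, since the abstract does not say how hedging is worded. The stronger interventions (re-reasoning after removal, hindsight, oracle) require calling the reasoning model or a reference and are left out.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class ErrorCategory(Enum):
    """The three error categories named in the abstract."""
    SOURCE_MISALIGNED = "source sentence-misaligned"
    HYPOTHESIS_MISALIGNED = "model hypothesis-misaligned"
    TRACE_MISALIGNED = "reasoning trace-misaligned"


@dataclass
class ReasoningStep:
    """One step of a reasoning trace (hypothetical representation)."""
    text: str
    error: Optional[ErrorCategory] = None  # None: no error was flagged


def remove_flagged(steps: List[ReasoningStep]) -> List[ReasoningStep]:
    """Removal intervention: drop every step flagged with an error."""
    return [s for s in steps if s.error is None]


def hedge_flagged(
    steps: List[ReasoningStep],
    prefix: str = "I am not sure about this, but ",  # assumed wording
) -> List[ReasoningStep]:
    """Hedging intervention: keep flagged steps but mark them as uncertain."""
    return [
        ReasoningStep(prefix + s.text, s.error) if s.error is not None else s
        for s in steps
    ]


# Illustrative trace; the step texts are made up for this example.
trace = [
    ReasoningStep("The source sentence is in the past tense."),
    ReasoningStep("The subject is plural.", ErrorCategory.SOURCE_MISALIGNED),
]
removed = remove_flagged(trace)
hedged = hedge_flagged(trace)
```

In a setup like this, the perturbed trace (`removed` or `hedged`) would be fed back to the reasoning model in place of the original trace, and the resulting translation compared against the unperturbed one.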
