Should We be Pedantic About Reasoning Errors in Machine Translation?
Calvin Bao, Marine Carpuat

TL;DR
This paper investigates reasoning errors in machine translation across multiple languages, showing that such errors can be detected with high precision in some languages (Urdu, less so Spanish) and that small corrections to the reasoning have limited impact on translation quality.
Contribution
It introduces an automated protocol for detecting reasoning errors in MT and evaluates the effect of various interventions on translation quality.
Findings
High-precision detection of reasoning errors in Urdu
Lower detection precision in Spanish
Interventions often do not significantly improve translation quality
Abstract
Across multiple language pairings (English paired with Spanish, French, German, Mandarin, Japanese, Urdu, and Cantonese), we find reasoning errors in translation. To quantify how often these reasoning errors occur, we leverage an automated annotation protocol for reasoning evaluation wherein the goal is to detect whether a reasoning step falls into any of three error categories: (1) source sentence-misaligned, (2) model hypothesis-misaligned, or (3) reasoning trace-misaligned. We probe the reasoning model with perturbed traces correcting for these identified reasoning errors using an array of weak-to-strong interventions: hedging, removal, re-reasoning after removal, hindsight, and oracle interventions. Experimenting with interventions on the reasoning traces suggests that small corrections to the reasoning have little impact on translation quality, but stronger interventions yield the highest resolution…
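To make the three error categories and the two weakest interventions concrete, here is a minimal Python sketch. It is an illustration only, not the authors' implementation: the names `ErrorCategory`, `ReasoningStep`, `remove_flagged`, and `hedge_flagged` are hypothetical, and the assumption that hedging means prefixing a flagged step with an uncertainty marker is ours, since the abstract does not define it.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class ErrorCategory(Enum):
    # The three categories named in the abstract.
    SOURCE_MISALIGNED = "source sentence-misaligned"
    HYPOTHESIS_MISALIGNED = "model hypothesis-misaligned"
    TRACE_MISALIGNED = "reasoning trace-misaligned"


@dataclass
class ReasoningStep:
    # One step of a reasoning trace; `error` is None when no error was flagged.
    text: str
    error: Optional[ErrorCategory] = None


def remove_flagged(trace: List[ReasoningStep]) -> List[ReasoningStep]:
    """Removal intervention (assumed form): drop every flagged step."""
    return [step for step in trace if step.error is None]


def hedge_flagged(
    trace: List[ReasoningStep],
    prefix: str = "(This step may be incorrect.) ",
) -> List[ReasoningStep]:
    """Hedging intervention (assumed form): keep flagged steps but mark them as uncertain."""
    return [
        ReasoningStep(prefix + step.text, step.error)
        if step.error is not None
        else step
        for step in trace
    ]
```

Under these assumptions, the perturbed trace returned by either function would be fed back to the reasoning model in place of the original trace. The stronger interventions (re-reasoning after removal, hindsight, oracle) need model calls or reference translations and are not sketched here.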
