The Translation Tax Is Not a Scalar: A Counterfactual Audit of English-Source Cue Inheritance in Chinese Multilingual Benchmarks

Zezheng Lin, Fengming Liu, and Handi Li

arXiv:2605.07093 · cs.CL · May 11, 2026

TL;DR

This paper audits the assumption that translated benchmarks uniformly inflate scores by preserving English-source cues, and finds that in an English-to-Chinese setting the effect depends on the estimator and on the item.

Contribution

A counterfactual audit showing that the Translation Tax varies by estimator and by item, which challenges its treatment as a scalar in multilingual benchmark evaluation.

Findings

01

Back-translation gaps are small and parser-fragile.

02

Cue-score calibration does not predict item-level gains.

03

High-residue items benefit from naturalization; low-residue items do not.

Abstract

The Translation Tax is often treated as a scalar: translated benchmarks are assumed to inflate scores by preserving English-source cues. We audit this claim in an English-to-Chinese setting. Three proxy estimators disagree: back-translation gaps are small and parser-fragile; cue-score calibration does not predict item-level gains; and a six-model native-control comparison shows model-family effects rather than a uniform benchmark effect. We add a same-item LLM-naturalization stress test that holds answer, options, and content fixed while rewriting the Chinese surface form. After correcting a prompt-construction bug, this contrast no longer supports a model-family interaction, but it preserves a residue dose-response: high-residue items benefit while low-residue items do not. The result is not a single Translation Tax, but a set of estimator- and item-dependent validity risks. We release per-cell…
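As a rough illustration of what a residue dose-response check could look like, the sketch below bins items by a residue score and compares accuracy before and after naturalization. It is not the authors' code: the function name `gain_by_residue_bin`, the tuple layout, the `threshold` cut-off, and the toy data are all assumptions made for this example.

```python
from collections import defaultdict
from statistics import mean


def gain_by_residue_bin(items, threshold=0.5):
    """Mean accuracy gain (naturalized minus translated) per residue bin.

    items: iterable of (residue_score, correct_translated, correct_naturalized).
    threshold: hypothetical cut-off separating low- from high-residue items.
    """
    gains = defaultdict(list)
    for residue, correct_translated, correct_naturalized in items:
        bin_name = "high" if residue >= threshold else "low"
        # +1 if naturalization fixed the item, -1 if it broke it, 0 if unchanged.
        gains[bin_name].append(int(correct_naturalized) - int(correct_translated))
    return {name: mean(values) for name, values in gains.items()}


# Made-up toy records, for illustration only (not data from the paper).
toy_items = [
    (0.9, False, True),
    (0.8, True, True),
    (0.7, False, True),
    (0.2, True, True),
    (0.1, True, False),
    (0.3, False, True),
]
result = gain_by_residue_bin(toy_items)
print(result)
```

On this toy data the high-residue bin shows a positive mean gain and the low-residue bin shows none, which is the qualitative pattern the abstract describes. A real analysis would also need uncertainty estimates and a principled residue measure.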
