Translation Entropy: A Statistical Framework for Evaluating Translation Systems
Ronit D. Gross, Yanir Harel, Ido Kanter

TL;DR
This paper introduces a novel statistical framework to quantify translation entropy, enabling objective evaluation and comparison of translation systems by analyzing translation degeneracy and stability across token variations.
Contribution
It presents the first method to estimate translation entropy, providing a quantitative benchmark for assessing and ranking translation systems based on their translation degeneracy.
Findings
Translation entropy can be estimated by analyzing translation stability across token variations.
Translation degeneracy increases multiplicatively when multiple tokens are replaced.
The method ranks different translation models, such as MarianMT, T5-Base, and NLLB-200.
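The multiplicative finding has a simple arithmetic reading, sketched below under two assumptions that this summary does not state: that the degeneracy of a token position is the number of tokens that can occupy it without changing the translation, and that positions contribute independently. The function names and the numbers are hypothetical and are not taken from the paper.

```python
import math

def joint_degeneracy(per_token_degeneracy):
    """Combined degeneracy when several tokens are replaced at once.

    Assumption (not from the paper): each position contributes
    independently, so the counts multiply.
    """
    return math.prod(per_token_degeneracy)

def entropy_bits(degeneracy):
    """Entropy in bits of a uniform choice among `degeneracy` options."""
    return math.log2(degeneracy)

# Illustrative numbers only, not results from the paper.
degeneracies = [4, 8]
combined = joint_degeneracy(degeneracies)  # product of the per-token counts
summed_bits = sum(entropy_bits(d) for d in degeneracies)
```

Under these assumptions a multiplicative degeneracy corresponds to an additive entropy, because the logarithm of a product is the sum of the logarithms.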
Abstract
The translation of written language has been known since the 3rd century BC; however, the need for it has grown steadily in the information age. Today, many translators exist, based on deep encoder-decoder architectures; nevertheless, no quantitative, objective methods are available to assess their performance, likely because the entropy of even a single language remains unknown. This study presents a quantitative method for estimating translation entropy, with the following key finding: given a translator, several sentences that differ from a given pivot sentence by only one selected token yield identical translations. Analyzing the statistics of this phenomenon across an ensemble of such sentences, each containing the selected pivot token, yields the probabilities of replacing this specific token with others while preserving the translation. These probabilities constitute the…
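The procedure described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract is truncated before it states how the probabilities are turned into an entropy, so the Shannon entropy over normalized counts used here is an assumption. The toy word-for-word translator, the lexicon, and all function names are hypothetical stand-ins for a real model.

```python
import math
from collections import Counter

def preserving_counts(translate, sentences, pivot, candidates):
    """Count, per candidate token, how many pivot sentences keep the same
    translation when the first occurrence of `pivot` is replaced by it."""
    counts = Counter()
    for tokens in sentences:
        idx = tokens.index(pivot)
        reference = translate(tokens)
        for cand in candidates:
            variant = tokens[:idx] + [cand] + tokens[idx + 1:]
            if translate(variant) == reference:
                counts[cand] += 1
    return counts

def shannon_entropy_bits(counts):
    """Shannon entropy (bits) of the counts normalized to probabilities.
    Assumption: the paper's exact estimator may differ."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total)
                for c in counts.values() if c > 0)

# Hypothetical toy translator: a word-for-word lookup, not a real model.
TOY_LEXICON = {"big": "grand", "large": "grand", "huge": "grand", "small": "petit"}

def toy_translate(tokens):
    return " ".join(TOY_LEXICON.get(t, t) for t in tokens)

sentences = [["a", "big", "dog"], ["the", "big", "house"]]
counts = preserving_counts(toy_translate, sentences, "big",
                           ["large", "huge", "small"])
entropy = shannon_entropy_bits(counts)
```

In the toy setup, "large" and "huge" preserve the translation in both pivot sentences while "small" never does, so the two surviving replacements are equally likely. A real experiment would replace `toy_translate` with a call to the translation system being evaluated.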
Taxonomy
Topics: Natural Language Processing Techniques · Authorship Attribution and Profiling · Text Readability and Simplification
