Decoding Machine Translationese in English-Chinese News: LLMs vs. NMTs

Delu Kong; Lieve Macken

arXiv:2506.22050 · cs.CL · June 30, 2025

Open Access

TL;DR

This paper investigates the linguistic features of machine translation outputs in English-Chinese news texts, revealing distinct patterns and differences between neural machine translation systems and large language models.

Contribution

It introduces a large dataset and comprehensive feature analysis to identify and compare translationese in LLMs and NMTs for English-Chinese news translation.

Findings

01

MTese is detectable in both LLMs and NMTs.

02

Original Chinese texts are nearly perfectly distinguishable from machine outputs.

03

LLMs show greater lexical diversity than NMTs.

Abstract

This study explores Machine Translationese (MTese) -- the linguistic peculiarities of machine translation outputs -- focusing on the under-researched English-to-Chinese language pair in news texts. We construct a large dataset consisting of 4 sub-corpora and employ a comprehensive five-layer feature set. Then, a chi-square ranking algorithm is applied for feature selection in both classification and clustering tasks. Our findings confirm the presence of MTese in both Neural Machine Translation systems (NMTs) and Large Language Models (LLMs). Original Chinese texts are nearly perfectly distinguishable from both LLM and NMT outputs. Notable linguistic patterns in MT outputs are shorter sentence lengths and increased use of adversative conjunctions. Comparing LLMs and NMTs, we achieve approximately 70% classification accuracy, with LLMs exhibiting greater lexical diversity and NMTs using…
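
The chi-square ranking step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say which chi-square formulation or toolkit the authors used, so the code below assumes one common variant for non-negative features: per-class feature totals are compared with the totals expected if the feature were independent of the class. The feature names and the toy values are invented for illustration and are not taken from the paper's dataset.

```python
def chi_square_scores(X, y):
    """Return one chi-square score per feature column.

    X is a list of rows of non-negative feature values, y the class labels.
    Assumed formulation: observed = sum of the feature within each class,
    expected = class proportion * overall sum of the feature.
    """
    n_samples = len(X)
    n_features = len(X[0])
    classes = sorted(set(y))
    class_prob = {c: sum(1 for label in y if label == c) / n_samples
                  for c in classes}

    scores = []
    for j in range(n_features):
        total = sum(row[j] for row in X)
        score = 0.0
        for c in classes:
            observed = sum(row[j] for row, label in zip(X, y) if label == c)
            expected = class_prob[c] * total
            if expected > 0:  # skip empty cells to avoid division by zero
                score += (observed - expected) ** 2 / expected
        scores.append(score)
    return scores


def rank_features(X, y, names):
    """Pair each feature name with its score, highest score first."""
    scores = chi_square_scores(X, y)
    return sorted(zip(names, scores), key=lambda pair: pair[1], reverse=True)


# Hypothetical feature names and toy counts (NOT data from the paper).
names = ["sentence_count", "adversative_conj_count", "punctuation_count"]
X = [
    [3, 0, 5],  # original Chinese text
    [4, 0, 5],  # original Chinese text
    [3, 4, 5],  # machine-translated text
    [4, 5, 5],  # machine-translated text
]
y = ["original", "original", "mt", "mt"]

ranked = rank_features(X, y, names)
for name, score in ranked:
    print(f"{name}: {score:.2f}")
```

In this toy setup only the adversative conjunction count differs between the two classes, so it ranks first while the other two features score zero. A selection step would then keep the top-ranked features before training a classifier or running a clustering algorithm.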

Taxonomy

Topics: Natural Language Processing Techniques · Translation Studies and Practices · Topic Modeling