Distinguishing Translations by Human, NMT, and ChatGPT: A Linguistic and Statistical Approach
Zhaokun Jiang, Qianxi Lv, Ziyin Zhang, Lei Lei

TL;DR
This study uses linguistic and statistical methods to distinguish human, NMT, and ChatGPT translations. Supervised classifiers separate the three types with high accuracy, and ChatGPT translations prove more similar to NMT than to human translations.
Contribution
The paper combines linguistic feature analysis with machine learning to distinguish translation types, and it examines how the types relate to one another, in particular the similarity between ChatGPT and NMT.
Findings
Supervised classifiers accurately distinguish translation types.
ChatGPT translations are more similar to NMT than to human translations.
Unsupervised clustering does not effectively separate the translation types.
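The two supervised-side findings above (classification accuracy and relative similarity) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the three-dimensional feature vectors are synthetic, the class centers are invented so that ChatGPT lies close to NMT, and the nearest-centroid classifier is a simple stand-in rather than the classifiers used in the paper. It shows the shape of the analysis, not the paper's data or results.

```python
import math
import random

def make_samples(center, n, spread, rng):
    """Draw n synthetic feature vectors scattered around a class center."""
    return [[rng.gauss(c, spread) for c in center] for _ in range(n)]

def centroid(rows):
    """Mean feature vector of a list of rows."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def predict(centroids, row):
    """Return the label whose centroid is closest to the row."""
    return min(centroids, key=lambda label: euclidean(centroids[label], row))

rng = random.Random(0)

# ASSUMPTION: invented class centers standing in for per-text linguistic
# features. ChatGPT is deliberately placed near NMT and far from HT, so the
# similarity pattern below is built into the data, not discovered.
centers = {
    "HT": [0.0, 0.0, 0.0],
    "NMT": [3.0, 3.0, 0.0],
    "ChatGPT": [3.0, 3.0, 1.5],
}

# 200 synthetic texts per translation type: 150 for training, 50 held out.
train, test = {}, {}
for label, center in centers.items():
    rows = make_samples(center, 200, 0.5, rng)
    train[label], test[label] = rows[:150], rows[150:]

# "Training" a nearest-centroid classifier is just computing class means.
centroids = {label: centroid(rows) for label, rows in train.items()}

# Held-out accuracy of the three-way classification.
correct = sum(
    predict(centroids, row) == label
    for label, rows in test.items()
    for row in rows
)
total = sum(len(rows) for rows in test.values())
accuracy = correct / total

# Similarity between translation types, measured as centroid distance.
dist_gpt_nmt = euclidean(centroids["ChatGPT"], centroids["NMT"])
dist_gpt_ht = euclidean(centroids["ChatGPT"], centroids["HT"])

print(f"held-out accuracy: {accuracy:.3f}")
print(f"ChatGPT-NMT distance: {dist_gpt_nmt:.3f}")
print(f"ChatGPT-HT distance:  {dist_gpt_ht:.3f}")
```

A smaller centroid distance is one simple way to operationalize "more similar"; with real data, the feature vectors would come from the linguistic features extracted from each translation rather than from a random generator.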
Abstract
The growing popularity of neural machine translation (NMT) and LLMs represented by ChatGPT underscores the need for a deeper understanding of their distinct characteristics and relationships. Such understanding is crucial for language professionals and researchers to make informed decisions and tactful use of these cutting-edge translation technologies, but remains underexplored. This study aims to fill this gap by investigating three key questions: (1) the distinguishability of ChatGPT-generated translations from NMT and human translation (HT), (2) the linguistic characteristics of each translation type, and (3) the degree of resemblance between ChatGPT-produced translations and HT or NMT. To achieve these objectives, we employ statistical testing, machine learning algorithms, and multidimensional analysis (MDA) to analyze Spokesperson's Remarks and their translations. After extracting a…
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification
