Automatic Classification of Human Translation and Machine Translation: A Study from the Perspective of Lexical Diversity
Yingxue Fu, Mark-Jan Nederhof

TL;DR
This study shows that machine translation and human translation can be classified above chance level with a trigram model and a fine-tuned BERT model, that differences in lexical diversity may explain the result, and that this has implications for automatic translation evaluation.
Contribution
The paper classifies human and machine translations with a trigram model and a fine-tuned BERT sequence classifier, relates the classification results to differences in lexical diversity, and examines what this means for automatic evaluation metrics.
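The trigram side of such a classifier can be illustrated with a minimal sketch. It assumes word-level trigrams, whitespace tokenization, add-one smoothing, and one language model per class, with a sentence assigned to whichever model gives it the higher log-probability. None of these choices is confirmed by this summary, and the names `TrigramModel` and `classify` are hypothetical.

```python
import math
from collections import Counter


def trigrams(tokens):
    # Pad so that the first and last words also appear in full trigrams.
    padded = ["<s>", "<s>"] + tokens + ["</s>"]
    return [tuple(padded[i:i + 3]) for i in range(len(padded) - 2)]


class TrigramModel:
    """Word-trigram language model with add-one smoothing (illustrative only)."""

    def __init__(self, sentences):
        self.tri = Counter()   # counts of (w1, w2, w3)
        self.bi = Counter()    # counts of the (w1, w2) contexts
        self.vocab = {"</s>"}
        for sentence in sentences:
            tokens = sentence.lower().split()
            self.vocab.update(tokens)
            for t in trigrams(tokens):
                self.tri[t] += 1
                self.bi[t[:2]] += 1

    def log_prob(self, sentence, vocab_size):
        tokens = sentence.lower().split()
        total = 0.0
        for t in trigrams(tokens):
            # Add-one smoothing keeps unseen trigrams from having zero probability.
            total += math.log((self.tri[t] + 1) / (self.bi[t[:2]] + vocab_size))
        return total


def classify(sentence, mt_model, ht_model):
    # Shared vocabulary size, plus one slot for unseen words (an assumption).
    vocab_size = len(mt_model.vocab | ht_model.vocab) + 1
    mt_score = mt_model.log_prob(sentence, vocab_size)
    ht_score = ht_model.log_prob(sentence, vocab_size)
    return "MT" if mt_score > ht_score else "HT"
```

Training one model on machine-translated sentences and another on human-translated sentences, then calling `classify` on held-out sentences, gives a per-class accuracy of the kind the summary refers to.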
Findings
Machine translation is classified much more accurately than human translation.
Differences in lexical diversity between the two may explain this gap in classification accuracy.
Results from two types of automatic metrics correlate with the classification results.
Abstract
By using a trigram model and fine-tuning a pretrained BERT model for sequence classification, we show that machine translation and human translation can be classified with an accuracy above chance level, which suggests that machine translation and human translation are different in a systematic way. The classification accuracy of machine translation is much higher than that of human translation. We show that this may be explained by the difference in lexical diversity between machine translation and human translation. If machine translation has patterns independent of human translation, automatic metrics which measure the deviation of machine translation from human translation may conflate difference with quality. Our experiment with two different types of automatic metrics shows correlation with the result of the classification task. Therefore, we suggest the difference in lexical…
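Lexical diversity can be quantified in several ways, and this summary does not say which measure the paper uses. As one common illustration, the type-token ratio (TTR) divides the number of distinct word forms by the total number of tokens. The sketch below assumes lowercased, whitespace-split tokens, and the function name `type_token_ratio` is hypothetical.

```python
def type_token_ratio(text):
    """Return distinct tokens / total tokens for a whitespace-tokenized text."""
    tokens = text.lower().split()
    if not tokens:
        # Define the ratio as 0.0 for empty input to avoid division by zero.
        return 0.0
    return len(set(tokens)) / len(tokens)


# Toy comparison: the second sentence repeats fewer words, so its ratio is higher.
repetitive = "the report said the report was late"
varied = "the memo claimed that document arrived late"
print(type_token_ratio(repetitive), type_token_ratio(varied))
```

TTR is sensitive to text length, so comparisons between machine and human translations are only meaningful on samples of similar size.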
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Translation Studies and Practices
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Layer Normalization · Softmax · Linear Warmup With Linear Decay · Residual Connection · WordPiece · Dense Connections
