Detecting Machine-Translated Text using Back Translation
Hoang-Quoc Nguyen-Son, Tran Phuong Thao, Seira Hidano, Shinsaku Kiyomoto

TL;DR
This paper introduces a method for detecting machine-translated text by measuring the similarity between a text and its back-translation. On French sentences it reaches 75.0% accuracy, compared with 62.8% for the best existing method.
Contribution
The paper presents a feature extraction approach based on the similarity between a text and its back-translation, distinguishing machine-translated from human-written text without language-specific information.
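The back-translation idea can be illustrated with a minimal sketch. It is not the authors' implementation: the similarity measures used here (a character-level ratio from `difflib` and word n-gram Jaccard overlap) are stand-ins chosen for simplicity, and `to_pivot` / `from_pivot` are hypothetical callables that a real system would replace with calls to a machine translation service.

```python
from difflib import SequenceMatcher
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical translator type: any function mapping a string to a string.
Translator = Callable[[str], str]


def ngram_set(tokens: List[str], n: int) -> Set[Tuple[str, ...]]:
    """Return the set of word n-grams in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def back_translation_features(
    text: str, to_pivot: Translator, from_pivot: Translator
) -> Dict[str, float]:
    """Translate to a pivot language and back, then score the similarity
    between the input and its back-translation (assumed measures)."""
    back = from_pivot(to_pivot(text))
    src_tokens = text.lower().split()
    back_tokens = back.lower().split()

    features = {
        # Character-level similarity in [0, 1].
        "char_ratio": SequenceMatcher(None, text.lower(), back.lower()).ratio()
    }
    for n in (1, 2):
        a, b = ngram_set(src_tokens, n), ngram_set(back_tokens, n)
        union = a | b
        # Jaccard overlap of word n-grams; defined as 1.0 when both sets are empty.
        features[f"jaccard_{n}gram"] = len(a & b) / len(union) if union else 1.0
    return features


# Toy stand-ins for a translator pair (assumptions, not real MT systems).
def lossless_round_trip(s: str) -> str:
    return s


def lossy_round_trip(s: str) -> str:
    # Simulates a round trip that swaps one word for a synonym.
    return s.replace("quick", "fast")


stable = back_translation_features(
    "the quick brown fox", lossless_round_trip, lossless_round_trip
)
shifted = back_translation_features(
    "the quick brown fox", lossy_round_trip, lossless_round_trip
)
print(stable)
print(shifted)
```

In a full pipeline, feature dictionaries like these would be computed for labeled machine-translated and human-written sentences and passed to a standard classifier. The toy translators only show how the feature values shift when the round trip alters the text.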
Findings
Achieves 75.0% in both accuracy and F-score when detecting translated French sentences
Outperforms existing methods, whose best accuracy is 62.8% and best F-score is 62.7%
Reaches 83.4% accuracy when detecting back-translated text
Effective across multiple languages, including Japanese
Abstract
Machine-translated text plays a crucial role in communication between people who use different languages. However, adversaries can use such text for malicious purposes such as plagiarism and fake reviews. Existing methods detect machine-translated text using only the text's intrinsic content, but they are unsuitable for distinguishing machine-translated from human-written texts that have the same meaning. We propose a method that extracts features for distinguishing machine and human text based on the similarity between the intrinsic text and its back-translation. An evaluation on detecting translated sentences with French shows that our method achieves 75.0% in both accuracy and F-score. It outperforms the existing methods, whose best accuracy is 62.8% and best F-score is 62.7%. The proposed method detects back-translated text even more efficiently, with 83.4% accuracy, which…
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Authorship Attribution and Profiling
