Detecting Machine-Translated Text using Back Translation

Hoang-Quoc Nguyen-Son; Tran Phuong Thao; Seira Hidano; Shinsaku Kiyomoto

arXiv:1910.06558·cs.CL·October 16, 2019·1 cites

Open Access

TL;DR

This paper introduces a novel method for detecting machine-translated text by analyzing the similarity between original and back-translated versions, outperforming existing techniques across multiple languages.

Contribution

The paper presents a new feature extraction approach based on back-translation similarity, which distinguishes machine-translated from human-written text without language-specific information.

Findings

01 Achieves 75.0% in both accuracy and F-score when detecting translated sentences with French

02 Detects back-translated text with up to 83.4% accuracy, outperforming previous methods

03 Effective across multiple languages, including Japanese

Abstract

Machine-translated text plays a crucial role in communication between people who use different languages. However, adversaries can use such text for malicious purposes such as plagiarism and fake reviews. Existing methods detect machine-translated text using only the text's intrinsic content, which makes them unsuitable for distinguishing machine-translated from human-written texts that have the same meaning. We propose a method that extracts features for distinguishing machine from human text based on the similarity between a text and its back-translation. In an evaluation on detecting translated sentences with French, our method achieves 75.0% in both accuracy and F-score. It outperforms existing methods, whose best accuracy is 62.8% and best F-score is 62.7%. The proposed method is even more efficient at detecting back-translated text, with 83.4% accuracy, which…
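The core idea in the abstract, comparing a text with its back-translation, can be illustrated with a minimal sketch. Everything concrete here is an assumption for illustration only: the excerpt does not say which similarity measures or translation system the authors use, so the sketch uses a word-level sequence ratio and Jaccard overlap from the Python standard library, and `toy_translate` is a hypothetical dictionary-based stand-in for a real machine translation service.

```python
from difflib import SequenceMatcher
from typing import Callable, Dict

# Hypothetical translator signature: (text, source_lang, target_lang) -> text.
Translator = Callable[[str, str, str], str]

# Toy lookup tables standing in for a real MT system (assumption, not the paper's setup).
EN_TO_FR = {"the": "la", "big": "grande", "large": "grande", "house": "maison"}
FR_TO_EN = {"la": "the", "grande": "big", "maison": "house"}


def toy_translate(text: str, src: str, tgt: str) -> str:
    """Word-by-word dictionary lookup; unknown words pass through unchanged."""
    table = EN_TO_FR if (src, tgt) == ("en", "fr") else FR_TO_EN
    return " ".join(table.get(word, word) for word in text.lower().split())


def back_translate(text: str, translate: Translator,
                   src: str = "en", pivot: str = "fr") -> str:
    """Translate src -> pivot -> src and return the round-tripped text."""
    return translate(translate(text, src, pivot), pivot, src)


def similarity_features(text: str, back: str) -> Dict[str, float]:
    """Word-level similarity between a text and its back-translation."""
    a, b = text.lower().split(), back.lower().split()
    seq_ratio = SequenceMatcher(None, a, b).ratio()
    union = set(a) | set(b)
    jaccard = len(set(a) & set(b)) / len(union) if union else 1.0
    return {"seq_ratio": seq_ratio, "jaccard": jaccard}


if __name__ == "__main__":
    for sentence in ("the big house", "the large house"):
        round_trip = back_translate(sentence, toy_translate)
        print(sentence, "->", round_trip, similarity_features(sentence, round_trip))
```

In a full pipeline, feature dictionaries like these would be computed for labeled machine-translated and human-written sentences and passed to a standard classifier. The choice of classifier and of similarity measures is left open here because the excerpt does not specify them.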

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Authorship Attribution and Profiling