# Detecting Machine-Translated Paragraphs by Matching Similar Words

**Authors:** Hoang-Quoc Nguyen-Son, Tran Phuong Thao, Seira Hidano, and Shinsaku Kiyomoto

arXiv: 1904.10641 · 2019-04-25

## TL;DR

This paper introduces a paragraph-level coherence method that detects machine-translated text by matching similar words throughout a paragraph. It reaches 87.0% accuracy on English, against 72.4% for the best previous method, and 89.2% and 97.9% accuracy on Dutch and Japanese.

## Contribution

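The paper presents a coherence-based approach for identifying machine-translated paragraphs. It matches similar noncontinuous words throughout the paragraph, both within individual sentences and across sentences, whereas an earlier matching method ignored such words inside a single sentence.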
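The matching idea can be illustrated with a minimal sketch. It is not the authors' implementation: the toy vectors in `TOY_VECTORS`, the cosine similarity measure, and the `mean`/`min` summary features are all assumptions chosen for illustration, since this summary does not specify how similarity or coherence is computed. The sketch pairs every word with its most similar other word anywhere in the paragraph, regardless of sentence boundaries, and summarizes those best-match scores.

```python
import math

# Hypothetical toy embeddings (assumption); a real system would load
# pretrained word vectors instead of this hand-written table.
TOY_VECTORS = {
    "cat": (0.9, 0.1, 0.0),
    "kitten": (0.8, 0.2, 0.1),
    "dog": (0.7, 0.3, 0.0),
    "car": (0.0, 0.1, 0.9),
    "engine": (0.1, 0.0, 0.8),
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def best_match_similarities(paragraph):
    """For each known word, find its highest similarity to any other word
    position in the paragraph, within or across sentences.

    `paragraph` is a list of sentences, each a list of word strings.
    """
    words = [w for sentence in paragraph for w in sentence if w in TOY_VECTORS]
    scores = []
    for i, word in enumerate(words):
        others = [
            cosine(TOY_VECTORS[word], TOY_VECTORS[other])
            for j, other in enumerate(words)
            if j != i
        ]
        if others:
            scores.append(max(others))
    return scores


def coherence_features(paragraph):
    """Summarize best-match scores as illustrative paragraph-level features."""
    scores = best_match_similarities(paragraph)
    if not scores:
        return {"mean": 0.0, "min": 0.0}
    return {"mean": sum(scores) / len(scores), "min": min(scores)}


related = [["cat", "kitten"], ["dog"]]
mixed = [["cat", "car"], ["engine"]]
print(coherence_features(related))
print(coherence_features(mixed))
```

In this toy setup the paragraph about related animals scores a higher mean than the one that mixes topics. Features of this kind could be fed to any standard classifier; which features and classifier the paper actually uses is described in the full text.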

## Key findings

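- Achieved 87.0% accuracy (equal error rate 13.0%) on 2000 human-generated English paragraphs and 2000 English paragraphs machine-translated from German
- Outperformed previous methods, whose best result was 72.4% accuracy (equal error rate 29.7%)
- Reached 89.2% accuracy on Dutch and 97.9% on Japanese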

## Abstract

Machine-translated text plays an important role in modern life by smoothing communication between communities that use different languages. However, unnatural translation may lead to misunderstanding, so a detector is needed to avoid such mistakes. One previous method measured the naturalness of continuous words using an N-gram language model; another matched noncontinuous words across sentences but ignored such words within an individual sentence. We have developed a method that matches similar words throughout the paragraph and estimates paragraph-level coherence in order to identify machine-translated text. An experiment on 2000 human-generated English paragraphs and 2000 English paragraphs machine-translated from German shows that the coherence-based method achieves high performance (accuracy = 87.0%; equal error rate = 13.0%). It is considerably better than previous methods (best accuracy = 72.4%; equal error rate = 29.7%). Similar experiments on Dutch and Japanese obtain 89.2% and 97.9% accuracy, respectively. The results demonstrate that the proposed method remains effective across languages with different resource levels.
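For readers unfamiliar with the equal error rate (EER) quoted alongside accuracy, the sketch below shows one common way to estimate it from detector scores. The function name `equal_error_rate`, the score convention (higher means more likely machine-translated), and the simple threshold sweep are assumptions for illustration; the abstract does not describe how the authors computed the metric.

```python
def equal_error_rate(scores, labels):
    """Estimate the equal error rate by sweeping thresholds over the scores.

    scores: detector outputs, higher = more likely machine-translated.
    labels: 1 for machine-translated, 0 for human-generated.
    Returns the average of the false negative and false positive rates at
    the threshold where the two rates are closest.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one example of each class")

    best_gap, eer = None, None
    for threshold in sorted(set(scores)):
        # A paragraph is flagged as machine-translated if score >= threshold.
        fnr = sum(s < threshold for s in positives) / len(positives)
        fpr = sum(s >= threshold for s in negatives) / len(negatives)
        gap = abs(fnr - fpr)
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (fnr + fpr) / 2
    return eer


# Made-up scores for illustration only, not data from the paper.
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2]
labels = [1, 1, 1, 0, 0, 0]
print(equal_error_rate(scores, labels))
```

With balanced classes, accuracy at the EER threshold equals one minus the EER, which is consistent with the reported 87.0% accuracy and 13.0% EER, although the abstract does not state which threshold the accuracy figure refers to.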

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1904.10641/full.md

## Figures

7 figures with captions in the complete paper: https://tomesphere.com/paper/1904.10641/full.md

## References

12 references — full list in the complete paper: https://tomesphere.com/paper/1904.10641/full.md

---
Source: https://tomesphere.com/paper/1904.10641