# Using Word Embedding for Cross-Language Plagiarism Detection

**Authors:** J. Ferrero, F. Agnes, L. Besacier, D. Schwab

arXiv: 1702.03082 · 2017-02-13

## TL;DR

This paper introduces cross-language similarity detection methods based on word embeddings and, by combining them, reports an overall F1 score of 89.15% at chunk level (88.5% at sentence level) for English-French similarity detection.

## Contribution

The paper presents new cross-language similarity detection methods based on distributed representations of words, then combines them to check whether they are complementary.

## Key findings

- Overall F1 score of 89.15% for English-French similarity detection at chunk level
- Overall F1 score of 88.5% at sentence level
- Both scores are obtained by combining the proposed methods, on a corpus the authors describe as very challenging
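
For readers unfamiliar with the metric, F1 is the harmonic mean of precision and recall. The sketch below shows only the standard definition; the function name `f1_score` and the counts in the usage lines are hypothetical illustrations and are not taken from the paper's evaluation.

```python
def f1_score(true_pos: int, false_pos: int, false_neg: int) -> float:
    """Standard F1: harmonic mean of precision and recall.

    The counts passed in are assumed to come from a binary
    similar / not-similar decision over text pairs.
    """
    predicted = true_pos + false_pos   # pairs flagged as similar
    actual = true_pos + false_neg      # pairs that really are similar
    precision = true_pos / predicted if predicted else 0.0
    recall = true_pos / actual if actual else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts, for illustration only.
example = f1_score(true_pos=8, false_pos=2, false_neg=2)
print(f"F1 = {example:.2%}")
```

Because it is a harmonic mean, F1 stays low when either precision or recall is low, so a high F1 requires both to be high.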

## Abstract

This paper proposes to use distributed representation of words (word embeddings) in cross-language textual similarity detection. The main contributions of this paper are the following: (a) we introduce new cross-language similarity detection methods based on distributed representation of words; (b) we combine the different methods proposed to verify their complementarity and finally obtain an overall F1 score of 89.15% for English-French similarity detection at chunk level (88.5% at sentence level) on a very challenging corpus.
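
This summary does not spell out the individual methods, so the following is only a generic sketch of how word embeddings can support cross-language similarity, not the authors' algorithm. It assumes a shared English-French vector space, averages the word vectors of each text, and compares the averages with cosine similarity. The vectors in `TOY_EMBEDDINGS` and all function names are hypothetical.

```python
import math

# Hypothetical toy vectors standing in for a shared English-French
# embedding space; real embeddings have hundreds of dimensions.
TOY_EMBEDDINGS = {
    "cat": [0.9, 0.1, 0.0],
    "chat": [0.88, 0.12, 0.02],
    "sleeps": [0.1, 0.9, 0.1],
    "dort": [0.12, 0.85, 0.1],
    "car": [0.0, 0.1, 0.95],
    "voiture": [0.02, 0.08, 0.9],
}


def sentence_vector(tokens, embeddings):
    """Average the vectors of the tokens found in the vocabulary."""
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return None  # no known word, so no representation
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def cross_language_similarity(src_tokens, tgt_tokens, embeddings=TOY_EMBEDDINGS):
    """Score two tokenized texts; returns 0.0 if either has no known word."""
    u = sentence_vector(src_tokens, embeddings)
    v = sentence_vector(tgt_tokens, embeddings)
    if u is None or v is None:
        return 0.0
    return cosine(u, v)


related = cross_language_similarity(["cat", "sleeps"], ["chat", "dort"])
unrelated = cross_language_similarity(["cat", "sleeps"], ["voiture"])
print(related > unrelated)
```

A detector built on such a score would flag a pair as similar when the score exceeds a threshold tuned on held-out data; the threshold choice is likewise an assumption of this sketch.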

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1702.03082/full.md

## Figures

3 figures with captions in the complete paper: https://tomesphere.com/paper/1702.03082/full.md

## References

25 references — full list in the complete paper: https://tomesphere.com/paper/1702.03082/full.md

---
Source: https://tomesphere.com/paper/1702.03082