# Deep Investigation of Cross-Language Plagiarism Detection Methods

**Authors:** Jeremy Ferrero, Laurent Besacier, Didier Schwab, Frederic Agnes

arXiv: 1705.08828 · 2017-05-25

## TL;DR

This paper evaluates cross-language plagiarism detection methods on a recently introduced open dataset, covering 6 language pairs and 2 granularities of text units, to identify the most effective methods and to analyze how their performance correlates across document styles and languages.

## Contribution

It evaluates cross-language plagiarism detection methods on a recently introduced open dataset of parallel and comparable documents that vary in genre, language, and text size, and it analyzes correlations across languages and document styles to determine which methods perform best.

## Key findings

- Compares detection methods across 6 language pairs and 2 granularities of text units to identify the most effective ones.
- Analyzes how detection performance correlates across document styles and languages.
- Provides reference results for cross-language plagiarism detection on a recently introduced open dataset of parallel and comparable documents.

## Abstract

This paper is a deep investigation of cross-language plagiarism detection methods on a new recently introduced open dataset, which contains parallel and comparable collections of documents with multiple characteristics (different genres, languages and sizes of texts). We investigate cross-language plagiarism detection methods for 6 language pairs on 2 granularities of text units in order to draw robust conclusions on the best methods while deeply analyzing correlations across document styles and languages.

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1705.08828/full.md

## Figures

17 figures with captions in the complete paper: https://tomesphere.com/paper/1705.08828/full.md

## References

22 references, with the full list in the complete paper: https://tomesphere.com/paper/1705.08828/full.md

---
Source: https://tomesphere.com/paper/1705.08828