Semantically-informed distance and similarity measures for paraphrase plagiarism identification

Miguel A. Álvarez-Carmona; Marc Franco-Salvador; Esaú Villatoro-Tello; Manuel Montes-y-Gómez; Paolo Rosso; Luis Villaseñor-Pineda

arXiv:1805.11611 · cs.CL · May 31, 2018

TL;DR

This paper introduces two semantically-informed measures for detecting paraphrase plagiarism, leveraging external resources or word representations, and demonstrates their effectiveness and simplicity compared to existing methods.

Contribution

The paper proposes novel semantically-informed similarity and edit distance measures that improve paraphrase plagiarism detection.
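As a rough illustration of what "semantically-informed" similarity can mean, here is a minimal sketch assuming word relatedness comes from cosine similarity of word vectors (one of the two semantic sources the paper uses) and a soft Dice-style overlap in place of exact word matching. `TOY_VECTORS`, `word_relatedness`, `soft_similarity`, and the `threshold` value are hypothetical illustrations, not the authors' formulation.

```python
import math

# Hypothetical toy word vectors standing in for a distributed word
# representation; a real system would load pretrained embeddings instead.
TOY_VECTORS = {
    "car": [0.9, 0.1, 0.0],
    "automobile": [0.85, 0.15, 0.05],
    "fast": [0.1, 0.9, 0.1],
    "quick": [0.12, 0.88, 0.15],
    "red": [0.0, 0.1, 0.95],
}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def word_relatedness(w1, w2, vectors=TOY_VECTORS):
    """1.0 for identical tokens, clipped cosine for known words, else 0.0."""
    if w1 == w2:
        return 1.0
    if w1 in vectors and w2 in vectors:
        return max(0.0, cosine(vectors[w1], vectors[w2]))
    return 0.0


def soft_similarity(text_a, text_b, threshold=0.8):
    """Dice-style overlap in which a word counts as shared when its best
    match in the other text has relatedness >= threshold (hypothetical)."""
    a = set(text_a.lower().split())
    b = set(text_b.lower().split())
    if not a or not b:
        return 0.0
    matched_a = sum(1 for w in a if max(word_relatedness(w, x) for x in b) >= threshold)
    matched_b = sum(1 for w in b if max(word_relatedness(w, x) for x in a) >= threshold)
    return (matched_a + matched_b) / (len(a) + len(b))


print(soft_similarity("fast car", "quick automobile"))                 # 1.0
print(soft_similarity("fast car", "quick automobile", threshold=1.0))  # 0.0
```

With `threshold=1.0` only identical words match, so the two phrases share nothing; with the softer threshold, near-synonyms count as shared words and the rewording no longer hides the overlap.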

Findings

1. Measures effectively detect various paraphrase types
2. Results are competitive with state-of-the-art methods
3. Proposed metrics are simple yet effective

Abstract

Paraphrase plagiarism identification is a very complex task because plagiarized texts are intentionally modified through several rewording techniques. Accordingly, this paper introduces two new measures for evaluating the relatedness of two given texts: a semantically-informed similarity measure and a semantically-informed edit distance. Both measures can extract semantic information from either an external resource or a distributed representation of words, yielding informative features for training a supervised classifier to detect paraphrase plagiarism. The results indicate that the proposed metrics are consistently good at detecting different types of paraphrase plagiarism. In addition, the results are very competitive with state-of-the-art methods, with the advantage of being a much simpler but equally effective solution.
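A semantically-informed edit distance can be illustrated, under assumptions, as a token-level Levenshtein distance whose substitution cost shrinks when two words are related. This minimal sketch assumes cosine similarity over word vectors as the relatedness source, unit insertion and deletion costs, and a two-number feature vector for the classifier. `TOY_VECTORS`, `substitution_cost`, `semantic_edit_distance`, and `pair_features` are hypothetical names; the paper's actual cost scheme and feature set may differ.

```python
import math

# Hypothetical toy word vectors standing in for a distributed representation.
TOY_VECTORS = {
    "car": [0.9, 0.1, 0.0],
    "automobile": [0.85, 0.15, 0.05],
    "red": [0.0, 0.1, 0.95],
}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def substitution_cost(w1, w2, vectors=TOY_VECTORS):
    """0 for identical tokens, 1 - clipped cosine for known words, else 1."""
    if w1 == w2:
        return 0.0
    if w1 in vectors and w2 in vectors:
        return 1.0 - max(0.0, cosine(vectors[w1], vectors[w2]))
    return 1.0


def semantic_edit_distance(tokens_a, tokens_b):
    """Token-level Levenshtein distance with unit insert/delete costs and a
    relatedness-weighted substitution cost (two-row dynamic programming)."""
    m = len(tokens_b)
    prev = [float(j) for j in range(m + 1)]  # row for the empty prefix of a
    for i in range(1, len(tokens_a) + 1):
        curr = [float(i)] + [0.0] * m
        for j in range(1, m + 1):
            curr[j] = min(
                prev[j] + 1.0,      # delete tokens_a[i-1]
                curr[j - 1] + 1.0,  # insert tokens_b[j-1]
                prev[j - 1] + substitution_cost(tokens_a[i - 1], tokens_b[j - 1]),
            )
        prev = curr
    return prev[m]


def pair_features(text_a, text_b):
    """Hypothetical classifier features: raw distance and distance
    normalized by the length of the longer text."""
    a, b = text_a.lower().split(), text_b.lower().split()
    d = semantic_edit_distance(a, b)
    longest = max(len(a), len(b), 1)
    return [d, d / longest]


print(semantic_edit_distance("the car".split(), "the automobile".split()))  # close to 0
print(semantic_edit_distance("the red car".split(), "the car".split()))     # 1.0
```

Swapping a word for a near-synonym costs almost nothing, while an extra word still costs a full insertion or deletion. Feature vectors such as the output of `pair_features` could then be passed to any supervised classifier.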
