# Improving Academic Plagiarism Detection for STEM Documents by Analyzing Mathematical Content and Citations

**Authors:** Norman Meuschke, Vincent Stange, Moritz Schubotz, Michael Kramer, Bela Gipp

arXiv: 1906.11761 · 2019-06-28

## TL;DR

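This paper improves plagiarism detection for STEM documents by combining similarity assessments of mathematical content, academic citations, and text in a two-stage process. The approach targets concealed plagiarism that conventional text-based systems struggle to detect, and it flags potentially suspicious cases in a collection of 102K STEM documents.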

## Contribution

The paper introduces a two-stage detection process that combines similarity measures for mathematical content, academic citations, and text to improve the detection of concealed plagiarism in STEM documents.
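A two-stage process typically separates cheap candidate retrieval over the whole collection from a more detailed comparison of the few retrieved candidates. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation. The `Doc` record, the Jaccard-based stage-1 filter, the "flag if any measure reaches a threshold" rule in stage 2, and all function names are hypothetical choices made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass
class Doc:
    """Hypothetical document record holding the three feature types."""
    doc_id: str
    identifiers: list[str]  # math identifiers, in order of appearance
    citations: list[str]    # in-text citation keys, in order of appearance
    tokens: list[str]       # text tokens


def jaccard(a: list[str], b: list[str]) -> float:
    """Set overlap of two feature lists; 0.0 if both are empty."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def candidate_retrieval(query: Doc, collection: list[Doc], k: int) -> list[Doc]:
    """Stage 1 (assumed): rank by a cheap, order-agnostic score, keep the top k."""
    def cheap_score(d: Doc) -> float:
        # A document becomes a candidate if ANY feature type overlaps strongly.
        return max(
            jaccard(query.identifiers, d.identifiers),
            jaccard(query.citations, d.citations),
            jaccard(query.tokens, d.tokens),
        )

    others = [d for d in collection if d.doc_id != query.doc_id]
    return sorted(others, key=cheap_score, reverse=True)[:k]


def detailed_analysis(
    query: Doc,
    candidates: list[Doc],
    measures: dict[str, Callable[[Doc, Doc], float]],
    threshold: float,
) -> list[tuple[str, dict[str, float]]]:
    """Stage 2 (assumed): score each candidate with every (non-empty) set of
    measures separately and flag pairs where at least one reaches the threshold."""
    flagged = []
    for d in candidates:
        scores = {name: measure(query, d) for name, measure in measures.items()}
        if max(scores.values()) >= threshold:
            flagged.append((d.doc_id, scores))
    return flagged
```

Keeping the per-measure scores separate, rather than collapsing them into one weighted sum, lets a reviewer see whether the math, citation, or text features triggered a flag. The actual retrieval and scoring strategy of the paper is described in the full text.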

## Key findings

- New math-based similarity measures that consider the order of mathematical features outperform the measures from the authors' prior research.
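- On confirmed cases of academic plagiarism, math-based and citation-based analysis complement conventional text-based detection.
- Combining math-based and citation-based features identifies potentially suspicious cases in a collection of 102K STEM documents.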
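To illustrate why the order of mathematical features matters, the sketch below contrasts an order-agnostic score (overlap of identifier frequency histograms) with an order-aware score based on the longest common subsequence (LCS) of identifiers. Using LCS as the order-aware measure is an assumption made for illustration; the exact definitions of the paper's measures are given in the full text.

```python
from collections import Counter


def bag_similarity(a: list[str], b: list[str]) -> float:
    """Order-agnostic: shared identifier occurrences (multiset intersection)
    divided by the length of the longer identifier sequence."""
    if not a or not b:
        return 0.0
    shared = sum((Counter(a) & Counter(b)).values())
    return shared / max(len(a), len(b))


def lcs_length(a: list[str], b: list[str]) -> int:
    """Length of the longest common subsequence, via row-by-row dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, 1):
            # Extend the diagonal on a match, else take the best neighbour.
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def ordered_similarity(a: list[str], b: list[str]) -> float:
    """Order-aware: LCS length normalized by the longer sequence."""
    if not a or not b:
        return 0.0
    return lcs_length(a, b) / max(len(a), len(b))
```

For the identifier sequences `["x", "y", "z"]` and `["z", "y", "x"]`, the bag score is 1.0 because both documents use the same identifiers, while the order-aware score drops to 1/3 because the identifiers appear in reverse order.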

## Abstract

Identifying academic plagiarism is a pressing task for educational and research institutions, publishers, and funding agencies. Current plagiarism detection systems reliably find instances of copied and moderately reworded text. However, reliably detecting concealed plagiarism, such as strong paraphrases, translations, and the reuse of nontextual content and ideas is an open research problem. In this paper, we extend our prior research on analyzing mathematical content and academic citations. Both are promising approaches for improving the detection of concealed academic plagiarism primarily in Science, Technology, Engineering and Mathematics (STEM). We make the following contributions: i) We present a two-stage detection process that combines similarity assessments of mathematical content, academic citations, and text. ii) We introduce new similarity measures that consider the order of mathematical features and outperform the measures in our prior research. iii) We compare the effectiveness of the math-based, citation-based, and text-based detection approaches using confirmed cases of academic plagiarism. iv) We demonstrate that the combined analysis of math-based and citation-based content features allows identifying potentially suspicious cases in a collection of 102K STEM documents. Overall, we show that analyzing the similarity of mathematical content and academic citations is a striking supplement for conventional text-based detection approaches for academic literature in the STEM disciplines.
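Citation-based detection compares which sources two documents cite and in what order. The sketch below shows two simple measures under stated assumptions: bibliographic coupling strength (the number of shared references), a standard citation-analysis measure, and a hypothetical order-aware measure that counts consecutive in-text citation pairs occurring in both documents. Neither is claimed to be the exact measure used in the paper, and the function names are illustrative.

```python
def bibliographic_coupling(cites_a: list[str], cites_b: list[str]) -> int:
    """Order-agnostic: number of distinct references cited by both documents."""
    return len(set(cites_a) & set(cites_b))


def citation_pair_overlap(cites_a: list[str], cites_b: list[str]) -> int:
    """Hypothetical order-aware measure: distinct consecutive citation pairs
    (in order of appearance in the text) that occur in both documents."""
    pairs_a = set(zip(cites_a, cites_a[1:]))
    pairs_b = set(zip(cites_b, cites_b[1:]))
    return len(pairs_a & pairs_b)
```

For the in-text citation sequences `["R1", "R2", "R3", "R4"]` and `["R9", "R2", "R3", "R4", "R1"]`, the two documents share four references, and the pairs `("R2", "R3")` and `("R3", "R4")` appear in both, which hints at a reused citation pattern even if the surrounding text was reworded.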

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1906.11761/full.md

## Figures

2 figures with captions in the complete paper: https://tomesphere.com/paper/1906.11761/full.md

## References

46 references — full list in the complete paper: https://tomesphere.com/paper/1906.11761/full.md

---
Source: https://tomesphere.com/paper/1906.11761