Analyzing Similarity in Mathematical Content To Enhance the Detection of Academic Plagiarism
Maurice-Roman Isele

TL;DR
This paper reviews approaches for analyzing mathematical content to improve academic plagiarism detection, highlighting current methods' strengths and limitations, and proposing future research directions for more comprehensive detection.
Contribution
It provides an overview of mathematical information retrieval techniques and analyzes their potential for detecting various forms of mathematical plagiarism, including disguised cases.
Findings
Syntax-based approaches excel at detecting undisguised plagiarism.
Structure-based and hybrid approaches can identify disguised plagiarism with renamed identifiers.
Current methods are limited to formula-level detection; detecting similarity at the section level and recognizing formulae altered by equivalence transformations require further research.
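The contrast between the first two findings can be illustrated with a minimal Python sketch. It is not the method of the paper or of any surveyed system. It assumes formulae are given as simple LaTeX-like strings and that every single Latin letter is an identifier. The function names (`tokenize`, `canonicalize`, `syntax_similarity`, `same_structure`) are hypothetical and chosen only for this illustration.

```python
import re

# Assumption: a token is a LaTeX command, a single letter, a number,
# or any other non-space character.
TOKEN_RE = re.compile(r"\\[A-Za-z]+|[A-Za-z]|\d+|\S")


def tokenize(formula):
    """Split a LaTeX-like formula string into a list of tokens."""
    return TOKEN_RE.findall(formula)


def canonicalize(tokens):
    """Replace each identifier by a placeholder numbered by first occurrence."""
    mapping = {}
    result = []
    for tok in tokens:
        if len(tok) == 1 and tok.isalpha():
            # The same identifier always maps to the same placeholder.
            mapping.setdefault(tok, f"id{len(mapping)}")
            result.append(mapping[tok])
        else:
            result.append(tok)
    return result


def syntax_similarity(f, g):
    """Jaccard similarity over the raw token sets (a syntax-style measure)."""
    a, b = set(tokenize(f)), set(tokenize(g))
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def same_structure(f, g):
    """True if both formulae match once identifiers are abstracted away."""
    return canonicalize(tokenize(f)) == canonicalize(tokenize(g))


original = "a^2 + b^2 = c^2"
renamed = "x^2 + y^2 = z^2"  # same formula with renamed identifiers

print(syntax_similarity(original, original))
print(syntax_similarity(original, renamed))
print(same_structure(original, renamed))
```

In this toy setting, the raw token overlap drops sharply once the identifiers are renamed, while the canonicalized comparison still reports a match. Neither measure would recognize a formula rewritten through an equivalence transformation, which is consistent with the third finding.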
Abstract
Despite the effort put into the detection of academic plagiarism, it continues to be a ubiquitous problem spanning all disciplines. Various tools have been developed to assist human inspectors by automatically identifying suspicious documents. However, to our knowledge, none of these tools currently uses mathematical content in its analysis. This is problematic because mathematical content potentially represents a significant share of the scientific contribution in academic documents. Hence, ignoring mathematical content limits the detection of plagiarism considerably, especially in disciplines with frequent use of mathematics. This paper aims to help close this gap by providing an overview of existing approaches in mathematical information retrieval and an analysis of their applicability to different possible cases of mathematical plagiarism. I find that whereas syntax-based…
Taxonomy
Topics: Mathematics, Computing, and Information Processing · Handwritten Text Recognition Techniques · Natural Language Processing Techniques
