The similarity index of mathematical and other scientific publications with equations and formulas and the problem of self-plagiarism identification
A.D. Polyanin, I.K. Shingareva

TL;DR
This paper examines the challenges of measuring similarity in scientific texts with equations, critiques existing software tools like iThenticate, and proposes improvements for better detection of self-plagiarism in complex documents.
Contribution
It introduces a new analysis of the limitations of current similarity measures and software in handling texts with equations, and suggests ways to enhance plagiarism detection methods.
Findings
Equations and formulas significantly complicate similarity analysis.
Current software often confuses self-plagiarism with pseudo-self-plagiarism.
Proposed improvements aim to better distinguish genuine self-plagiarism.
Abstract
The problems of estimating the similarity index of inhomogeneous scientific publications containing equations and formulas are discussed for the first time. It is shown that the presence of equations and formulas (as well as figures, drawings, and tables) significantly complicates the analysis of such texts. It is shown that determining the similarity index of publications by counting individual mathematical symbols and parts of equations and formulas is ineffective and can lead to erroneous and even completely absurd conclusions. The capabilities of the most popular software systems, Antiplagiat and iThenticate, which scientific journals currently use to detect plagiarism and self-plagiarism, are investigated. The results of processing by the iThenticate system of specific examples and specific test problems containing…
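The abstract's point that matching individual symbols and equation fragments can inflate a similarity index can be illustrated with a toy sketch. This is not the algorithm used by iThenticate or Antiplagiat, and it is not the authors' method: the token-overlap measure, the `$...$` equation markup, the function names, and the two sample sentences are all assumptions made for illustration.

```python
import re


def tokens(text):
    """Split text into lowercase words and single non-letter symbols."""
    return re.findall(r"[a-z]+|[^\sa-z]", text.lower())


def similarity_index(doc, source):
    """Toy measure: share of tokens in `doc` that also occur in `source`."""
    doc_tokens = tokens(doc)
    if not doc_tokens:
        return 0.0
    source_set = set(tokens(source))
    matched = sum(1 for t in doc_tokens if t in source_set)
    return matched / len(doc_tokens)


def strip_equations(text):
    """Drop everything between $...$ delimiters (assumed equation markup)."""
    return re.sub(r"\$[^$]*\$", " ", text)


# Two hypothetical sentences on different topics that share one standard equation.
paper_a = "We study heat transfer in a rod: $u_t = a u_xx + f(x, t)$"
paper_b = "Diffusion of a pollutant obeys $u_t = a u_xx + f(x, t)$"

with_symbols = similarity_index(paper_a, paper_b)
prose_only = similarity_index(strip_equations(paper_a), strip_equations(paper_b))

print(f"counting equation symbols: {with_symbols:.3f}")
print(f"prose only:                {prose_only:.3f}")
```

In this sketch the shared equation alone pushes the index to 0.72, while the prose of the two sentences overlaps in a single word (0.125). A real system works on longer matching fragments rather than single tokens, so the numbers only indicate the direction of the effect.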
Taxonomy
Topics: Advanced Computational Techniques in Science and Engineering
