The similarity index of scientific publications with equations and formulas, identification of self-plagiarism, and testing of the iThenticate system
Andrei D. Polyanin, Inna K. Shingareva

TL;DR
This paper examines the challenges of calculating similarity indices in scientific texts with equations, evaluates iThenticate's effectiveness in detecting plagiarism involving formulas, and proposes improvements for software systems in this domain.
Contribution
It introduces the first analysis of similarity index estimation in publications with equations and tests iThenticate's capabilities, highlighting its limitations and suggesting enhancements.
Findings
Equations significantly complicate similarity analysis.
iThenticate often confuses self-plagiarism with pseudo-self-plagiarism.
Proposed methods aim to improve software detection accuracy.
Abstract
The problems of estimating the similarity index of mathematical and other scientific publications containing equations and formulas are discussed for the first time. It is shown that equations and formulas (as well as figures, drawings, and tables) significantly complicate the analysis of such texts. It is also shown that determining the similarity index of a publication by matching individual mathematical symbols and fragments of equations and formulas is ineffective and can lead to erroneous and even completely absurd conclusions. The capabilities of iThenticate, the most popular software system currently used by scientific journals, for detecting plagiarism and self-plagiarism are investigated. The results of processing by the iThenticate system of specific examples and special test problems containing equations…
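The claim that symbol-level matching can produce absurd conclusions can be illustrated with a toy comparison. The sketch below does not reproduce iThenticate's algorithm, whose internals are not described here. Instead, it assumes a simple stand-in: a hypothetical tokenizer that splits a LaTeX-like formula into single symbols, plus a Jaccard index over symbol n-grams. The function names and test formulas are illustrative assumptions.

```python
import re


def symbol_tokens(formula: str) -> list[str]:
    # Hypothetical tokenizer (an assumption, not iThenticate's): split a
    # LaTeX-like formula into backslash commands, single letters/digits,
    # and single punctuation/operator characters; whitespace is dropped.
    return re.findall(r"\\[A-Za-z]+|[A-Za-z0-9]|[^\sA-Za-z0-9]", formula)


def ngrams(tokens: list[str], n: int) -> set[tuple[str, ...]]:
    # All contiguous runs of n tokens, collected as a set (counts ignored).
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def jaccard_similarity(a: str, b: str, n: int = 1) -> float:
    # Jaccard index |A & B| / |A | B| over the symbol n-grams of two formulas.
    set_a = ngrams(symbol_tokens(a), n)
    set_b = ngrams(symbol_tokens(b), n)
    if not set_a and not set_b:
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)


heat = "u_t = a u_{xx} + f(u)"      # nonlinear heat (reaction-diffusion) equation
wave = "u_{tt} = a u_{xx} + f(u)"   # nonlinear wave equation: a different equation
renamed = "v_t = b v_{xx} + g(v)"   # the heat equation again, with renamed symbols

print(jaccard_similarity(heat, wave))     # 1.0
print(jaccard_similarity(heat, renamed))  # 0.6
```

Under these assumptions, two different equations (one parabolic, one hyperbolic) get a single-symbol similarity of 1.0. The same equation written with different letters scores only 0.6. This is the kind of misleading result that comes from counting individual symbols rather than comparing the mathematical content.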
Taxonomy
Topics: Advanced Computational Techniques in Science and Engineering
