Hamtajoo: A Persian Plagiarism Checker for Academic Manuscripts
Vahid Zarrabi, Salar Mohtaj, Habibollah Asghari

TL;DR
Hamtajoo is a novel Persian plagiarism detection system designed for academic manuscripts, addressing challenges in semantic text re-use detection and low-resource language NLP applications.
Contribution
The paper introduces Hamtajoo, a Persian plagiarism detection system tailored to low-resource language contexts, describes the algorithms used in each stage, and evaluates the system against PAN standards.
Findings
Effective detection of semantically altered plagiarism patterns.
High performance on PAN plagiarism detection corpus.
Addresses challenges in low-resource language NLP.
Abstract
In recent years, due to the high availability of electronic documents on the Web, plagiarism has become a serious challenge, especially among scholars. Various plagiarism detection systems have been developed to prevent text re-use and to confront plagiarism. Although it is relatively easy to detect duplicate text in academic manuscripts, finding patterns of text re-use that have been semantically changed is of great importance. Another important issue is dealing with less-resourced languages, for which there is a low volume of text for training purposes and NLP tools perform poorly. In this paper, we introduce Hamtajoo, a Persian plagiarism detection system for academic manuscripts. Moreover, we describe the overall structure of the system along with the algorithms used in each stage. In order to evaluate the performance of the proposed system, we used a…
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Academic integrity and plagiarism
