An LSTM-based Plagiarism Detection via Attention Mechanism and a Population-based Approach for Pre-Training Parameters with imbalanced Classes
Seyed Vahid Moravvej, Seyed Jalaleddin Mousavirad, Mahshid Helali Moghadam, Mehrdad Saadatmand

TL;DR
This paper introduces an LSTM-based plagiarism detection model enhanced with an attention mechanism and a population-based approach for pre-training, improving initialization and performance in class-imbalanced scenarios.
Contribution
It proposes a novel combination of an LSTM with an attention mechanism, whose parameters are initialized by the artificial bee colony (ABC) algorithm before gradient-based training, to improve plagiarism detection models.
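To make the LSTM-plus-attention part concrete, here is a minimal forward-pass sketch in plain Python. It assumes a single-layer LSTM over a sequence of token vectors, additive attention that pools the hidden states into one context vector, and a single sigmoid output unit read as the probability of the plagiarism class. The layer sizes, weight layout, and helper names (`lstm_step`, `attention_pool`, `forward`) are hypothetical; this summary does not describe the authors' architecture at that level of detail.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def matvec(W, v):
    return [sum(wij * vj for wij, vj in zip(row, v)) for row in W]

def lstm_step(x, h, c, W, b):
    """One LSTM cell step; W has 4*H rows acting on the concatenation [x; h]."""
    H = len(h)
    z = [zi + bi for zi, bi in zip(matvec(W, x + h), b)]
    i = [sigmoid(v) for v in z[0:H]]            # input gate
    f = [sigmoid(v) for v in z[H:2 * H]]        # forget gate
    g = [math.tanh(v) for v in z[2 * H:3 * H]]  # candidate cell state
    o = [sigmoid(v) for v in z[3 * H:4 * H]]    # output gate
    c_new = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
    h_new = [oj * math.tanh(cj) for oj, cj in zip(o, c_new)]
    return h_new, c_new

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_pool(states, W_att, v_att):
    """Additive attention: e_t = v . tanh(W h_t), alpha = softmax(e), context = sum_t alpha_t h_t."""
    scores = [sum(vk * math.tanh(uk) for vk, uk in zip(v_att, matvec(W_att, h))) for h in states]
    alphas = softmax(scores)
    context = [sum(a * h[d] for a, h in zip(alphas, states)) for d in range(len(states[0]))]
    return context, alphas

def forward(seq, W, b, W_att, v_att, u_out, b_out):
    # Run the LSTM over the sequence, pool with attention, score with one sigmoid unit.
    H = len(v_att)
    h, c, states = [0.0] * H, [0.0] * H, []
    for x in seq:
        h, c = lstm_step(x, h, c, W, b)
        states.append(h)
    context, alphas = attention_pool(states, W_att, v_att)
    logit = sum(uk * ck for uk, ck in zip(u_out, context)) + b_out
    return sigmoid(logit), alphas

# Hypothetical sizes and random weights, only to exercise the forward pass.
rng = random.Random(0)
D, H, T = 4, 3, 5  # embedding size, hidden size, sequence length
W = [[rng.uniform(-0.5, 0.5) for _ in range(D + H)] for _ in range(4 * H)]
b = [0.0] * (4 * H)
W_att = [[rng.uniform(-0.5, 0.5) for _ in range(H)] for _ in range(H)]
v_att = [rng.uniform(-0.5, 0.5) for _ in range(H)]
u_out = [rng.uniform(-0.5, 0.5) for _ in range(H)]
seq = [[rng.uniform(-1.0, 1.0) for _ in range(D)] for _ in range(T)]
prob, alphas = forward(seq, W, b, W_att, v_att, u_out, 0.0)
print(f"P(plagiarism) = {prob:.3f}; attention weights = {[round(a, 3) for a in alphas]}")
```

The attention weights `alphas` sum to 1 over the time steps, so they indicate which positions of the sequence the output unit relied on most. In this sketch every weight matrix here (`W`, `W_att`, `v_att`, `u_out`) is exactly what a parameter-initialization scheme would have to set.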
Findings
The method achieves competitive performance compared to traditional approaches.
Population-based initialization improves convergence and detection accuracy.
The approach effectively handles class imbalance in plagiarism detection tasks.
Abstract
Plagiarism is one of the leading problems in academic and industrial environments; the goal of plagiarism detection is to find similar items in a document or in source code. This paper proposes an architecture based on Long Short-Term Memory (LSTM) and an attention mechanism, called LSTM-AM-ABC, boosted by a population-based approach for parameter initialization. Gradient-based optimization algorithms such as back-propagation (BP) are widely used in the literature to train LSTMs, attention mechanisms, and feed-forward neural networks, but they suffer from problems such as getting stuck in local optima. Population-based metaheuristic (PBMH) algorithms can be used to tackle this problem. To this end, this paper employs a PBMH algorithm, the artificial bee colony (ABC), to mitigate it. Our proposed algorithm can find the initial values for model learning in all LSTM, attention…
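The initialization idea can be sketched as follows: ABC searches a bounded box for a parameter vector with low training loss, and gradient descent then starts from that vector. This is a minimal sketch, not the authors' implementation. A tiny logistic-regression model on hypothetical, class-imbalanced toy data stands in for the full LSTM-AM network, and the bounds `[-1, 1]`, colony size, abandonment `limit`, and iteration counts are assumptions. The three ABC phases (employed bees, onlooker bees with roulette-wheel selection, scout bees) follow the standard Karaboga formulation, with the variant that every source past the limit is abandoned.

```python
import math
import random

# Hypothetical toy data: ((x1, x2), label), with 4 negatives and 2 positives.
DATA = [((0.1, 0.2), 0), ((0.2, 0.1), 0), ((0.3, 0.4), 0), ((0.2, 0.3), 0),
        ((0.9, 0.8), 1), ((0.8, 0.9), 1)]

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def loss(w):
    # Mean binary cross-entropy of p = sigmoid(w0 + w1*x1 + w2*x2).
    total = 0.0
    for (x1, x2), y in DATA:
        p = min(max(sigmoid(w[0] + w[1] * x1 + w[2] * x2), 1e-12), 1 - 1e-12)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(DATA)

def abc_minimize(f, dim, n_sources=10, limit=20, iters=100, lb=-1.0, ub=1.0, seed=0):
    """Artificial bee colony search for a vector in [lb, ub]^dim with small f."""
    rng = random.Random(seed)
    best_x, best_cost = None, math.inf

    def evaluate(x):
        # Score a candidate and remember the best vector seen so far.
        nonlocal best_x, best_cost
        c = f(x)
        if c < best_cost:
            best_x, best_cost = x[:], c
        return c

    def random_source():
        return [rng.uniform(lb, ub) for _ in range(dim)]

    foods = [random_source() for _ in range(n_sources)]
    costs = [evaluate(x) for x in foods]
    trials = [0] * n_sources

    def explore(i):
        # Move one coordinate of source i relative to a random partner source k.
        k = rng.choice([m for m in range(n_sources) if m != i])
        j = rng.randrange(dim)
        cand = foods[i][:]
        cand[j] += rng.uniform(-1.0, 1.0) * (foods[i][j] - foods[k][j])
        cand[j] = min(ub, max(lb, cand[j]))
        c = evaluate(cand)
        if c < costs[i]:  # greedy selection
            foods[i], costs[i], trials[i] = cand, c, 0
        else:
            trials[i] += 1

    def roulette(weights):
        r, acc = rng.uniform(0.0, sum(weights)), 0.0
        for idx, wt in enumerate(weights):
            acc += wt
            if acc >= r:
                return idx
        return len(weights) - 1

    for _ in range(iters):
        for i in range(n_sources):  # employed bees: one trial per source
            explore(i)
        fits = [1.0 / (1.0 + c) for c in costs]  # costs are non-negative here
        for _ in range(n_sources):  # onlooker bees favour fitter sources
            explore(roulette(fits))
        for i in range(n_sources):  # scout bees replace exhausted sources
            if trials[i] > limit:
                foods[i] = random_source()
                costs[i] = evaluate(foods[i])
                trials[i] = 0
    return best_x, best_cost

def grad(w):
    # Gradient of the mean cross-entropy with respect to (w0, w1, w2).
    g = [0.0, 0.0, 0.0]
    for (x1, x2), y in DATA:
        err = sigmoid(w[0] + w[1] * x1 + w[2] * x2) - y
        g[0] += err
        g[1] += err * x1
        g[2] += err * x2
    return [gi / len(DATA) for gi in g]

def fine_tune(w, lr=0.5, steps=200):
    # Plain gradient descent (stand-in for back-propagation) from the ABC start.
    w = w[:]
    for _ in range(steps):
        w = [wi - lr * gi for wi, gi in zip(w, grad(w))]
    return w

w_init, abc_cost = abc_minimize(loss, dim=3, seed=42)
w_final = fine_tune(w_init)
print(f"loss after ABC: {abc_cost:.4f}, after gradient descent: {loss(w_final):.4f}")
```

In a full model, the searched vector would hold all LSTM, attention, and feed-forward weights, so every fitness evaluation would cost a forward pass over the training data.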
Taxonomy
Topics: Academic integrity and plagiarism · Imbalanced Data Classification Techniques · Machine Learning and Data Classification
Methods: Tanh Activation · Sigmoid Activation · Artificial Bee Colony · Long Short-Term Memory
