A Novel Plagiarism Detection Approach Combining BERT-based Word Embedding, Attention-based LSTMs and an Improved Differential Evolution Algorithm
Seyed Vahid Moravvej, Seyed Jalaleddin Mousavirad, Diego Oliva, Fardin Mohammadi

TL;DR
This paper introduces a plagiarism detection method that combines BERT embeddings, attention-based LSTMs, and an improved differential evolution algorithm, addressing class imbalance and training sensitivity issues.
Contribution
The proposed approach integrates BERT, attention LSTMs, and a novel DE algorithm with focal loss to enhance plagiarism detection accuracy and robustness.
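For readers unfamiliar with differential evolution, the following is a minimal sketch of the classic DE/rand/1/bin scheme, not the improved variant the paper proposes. The function name `differential_evolution`, its parameters (`pop_size`, `f`, `cr`, `generations`, `seed`), and the toy `sphere` objective are all illustrative assumptions; in the paper's setting the objective would presumably be a network loss evaluated on candidate initial weights.

```python
import random


def differential_evolution(objective, bounds, pop_size=20, f=0.5, cr=0.9,
                           generations=100, seed=0):
    """Classic DE/rand/1/bin minimizer (illustrative, not the paper's variant)."""
    rng = random.Random(seed)
    dim = len(bounds)
    # Random initial population inside the box constraints.
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    scores = [objective(x) for x in pop]

    for _ in range(generations):
        for i in range(pop_size):
            # Pick three distinct individuals, all different from i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(dim)  # guarantees at least one mutated gene
            trial = []
            for d in range(dim):
                if rng.random() < cr or d == j_rand:
                    # Mutation: base vector plus scaled difference vector.
                    v = pop[a][d] + f * (pop[b][d] - pop[c][d])
                    lo, hi = bounds[d]
                    v = min(max(v, lo), hi)  # clip back into bounds
                else:
                    v = pop[i][d]  # binomial crossover keeps the parent gene
                trial.append(v)
            # Greedy selection; replacement happens in place within a generation.
            trial_score = objective(trial)
            if trial_score <= scores[i]:
                pop[i], scores[i] = trial, trial_score

    best = min(range(pop_size), key=scores.__getitem__)
    return pop[best], scores[best]


def sphere(x):
    """Toy objective standing in for a model loss (assumption)."""
    return sum(v * v for v in x)


if __name__ == "__main__":
    best_x, best_score = differential_evolution(sphere, [(-5.0, 5.0)] * 3)
    print(best_x, best_score)
```

The greedy selection step means a candidate is only ever replaced by a trial vector that scores at least as well, so the best score in the population never gets worse from one generation to the next.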
Findings
Performs well on MSRP, SNLI, and SemEval2014 datasets.
Outperforms traditional and population-based methods.
Effectively handles unbalanced data in plagiarism detection.
Abstract
Detecting plagiarism involves finding similar items in two different sources. In this article, we propose a novel method for detecting plagiarism that is based on attention mechanism-based long short-term memory (LSTM) and bidirectional encoder representations from transformers (BERT) word embedding, enhanced with an optimized differential evolution (DE) method for pre-training and a focal loss function for training. BERT can be included in a downstream task and fine-tuned as a task-specific structure, while the trained BERT model is capable of detecting various linguistic characteristics. Unbalanced classification is one of the primary issues with plagiarism detection. We suggest a focal loss-based training technique that carefully learns minority class instances to solve this. Another issue that we tackle is…
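The abstract does not give the loss formula, so the following is a minimal sketch of the standard binary focal loss, FL(p_t) = -α_t (1 - p_t)^γ log(p_t), as an illustration of how such a loss emphasizes hard examples. The function names and the default values `gamma=2.0` and `alpha=0.25` are assumptions taken from common practice, not from this paper.

```python
import math


def binary_focal_loss(p, y, gamma=2.0, alpha=0.25, eps=1e-12):
    """Focal loss for one example.

    p: predicted probability of the positive class (0..1)
    y: true label, 0 or 1
    gamma, alpha: focusing and class-balance parameters (assumed defaults)
    """
    p_t = p if y == 1 else 1.0 - p          # probability assigned to the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha
    p_t = min(max(p_t, eps), 1.0 - eps)     # avoid log(0)
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


def cross_entropy(p, y, eps=1e-12):
    """Plain binary cross-entropy for one example, for comparison."""
    p_t = p if y == 1 else 1.0 - p
    p_t = min(max(p_t, eps), 1.0 - eps)
    return -math.log(p_t)


if __name__ == "__main__":
    for p in (0.9, 0.5, 0.1):  # easy, ambiguous, and hard positive examples
        print(p, cross_entropy(p, 1), binary_focal_loss(p, 1))
```

With `gamma=0` the modulating factor disappears and the loss reduces to class-weighted cross-entropy. As `gamma` grows, well-classified examples contribute less, so the gradient is dominated by hard examples, which in an unbalanced dataset often come from the minority class.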
Taxonomy
Topics: Academic integrity and plagiarism · Topic Modeling · Hate Speech and Cyberbullying Detection
Methods: Multi-Head Attention · Attention Is All You Need · Softmax · Adam · Layer Normalization · Focal Loss · Linear Layer · Dropout · WordPiece · Weight Decay
