A Novel Plagiarism Detection Approach Combining BERT-based Word Embedding, Attention-based LSTMs and an Improved Differential Evolution Algorithm

Seyed Vahid Moravvej; Seyed Jalaleddin Mousavirad; Diego Oliva; Fardin Mohammadi

arXiv:2305.02374 · cs.CL · May 5, 2023 · 21 cites

TL;DR

This paper introduces a plagiarism detection method that combines BERT embeddings, attention-based LSTMs, and an improved differential evolution (DE) algorithm, addressing class imbalance (through a focal loss) and the sensitivity of training (through DE-based pre-training).

Contribution

The proposed approach integrates BERT word embeddings, attention-based LSTMs, a novel improved DE algorithm for pre-training, and a focal loss for training, with the aim of improving plagiarism detection accuracy and robustness.
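The abstract describes using an optimized DE method for pre-training; we read this as searching for good initial network weights before gradient-based training takes over. The improved DE variant itself is not described in this summary, so the sketch below is the classic DE/rand/1/bin scheme of Storn and Price, not the authors' algorithm. The function name, the bound-clipping repair, and the default `F`, `CR`, and population size are all assumptions chosen for illustration.

```python
import random
from typing import Callable, List, Tuple


def differential_evolution(
    fitness: Callable[[List[float]], float],
    dim: int,
    bounds: Tuple[float, float] = (-1.0, 1.0),
    pop_size: int = 20,
    F: float = 0.5,
    CR: float = 0.9,
    generations: int = 100,
    seed: int = 0,
) -> List[float]:
    """Classic DE/rand/1/bin minimiser (hypothetical helper, not the paper's improved DE).

    Returns the best candidate vector found. In a weight-initialisation setting,
    each vector would be a flattened set of network weights and `fitness` the
    training loss those weights produce.
    """
    if pop_size < 4:
        raise ValueError("DE/rand/1 needs at least 4 individuals")
    rng = random.Random(seed)
    lo, hi = bounds
    # Random initial population inside the bounds
    pop = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(pop_size)]
    scores = [fitness(x) for x in pop]

    for _ in range(generations):
        for i in range(pop_size):
            # Mutation: base vector a plus scaled difference of b and c, all distinct from i
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            mutant = [pop[a][k] + F * (pop[b][k] - pop[c][k]) for k in range(dim)]
            # Binomial crossover; j_rand guarantees at least one gene comes from the mutant
            j_rand = rng.randrange(dim)
            trial = [
                min(max(mutant[k], lo), hi) if (rng.random() < CR or k == j_rand) else pop[i][k]
                for k in range(dim)
            ]
            # Greedy selection: the trial replaces its parent if it is no worse
            s = fitness(trial)
            if s <= scores[i]:
                pop[i], scores[i] = trial, s

    best = min(range(pop_size), key=lambda idx: scores[idx])
    return pop[best]
```

As a usage example, `differential_evolution(lambda w: sum(v * v for v in w), dim=5, generations=200)` minimises a simple sphere function standing in for a real training loss; in the setting the paper describes, the returned vector would then seed back-propagation.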

Findings

01

Performs well on MSRP, SNLI, and SemEval2014 datasets.

02

Outperforms traditional and population-based methods.

03

Effectively handles unbalanced data in plagiarism detection.
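The Contribution line attributes the handling of unbalanced data to a focal loss. The sketch below uses the standard binary focal loss of Lin et al. (2017), FL(p_t) = -α_t (1 - p_t)^γ log(p_t), where p_t is the probability assigned to the true class. The defaults α = 0.25 and γ = 2 come from that original formulation and are assumptions here; this summary does not give the paper's settings, and the helper names are hypothetical.

```python
import math
from typing import Sequence


def binary_focal_loss(p: float, y: int, alpha: float = 0.25, gamma: float = 2.0,
                      eps: float = 1e-12) -> float:
    """Focal loss for a single binary prediction.

    p: predicted probability of the positive class (e.g. "plagiarised").
    y: true label, 1 for positive, 0 for negative.
    """
    # p_t: probability the model gives to the true class
    p_t = p if y == 1 else 1.0 - p
    # alpha_t: class weight, alpha for positives and 1 - alpha for negatives
    alpha_t = alpha if y == 1 else 1.0 - alpha
    # Clamp to avoid log(0)
    p_t = min(max(p_t, eps), 1.0 - eps)
    # (1 - p_t)^gamma shrinks the loss of examples that are already well classified,
    # so the (often minority-class) hard examples dominate the gradient
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


def mean_focal_loss(probs: Sequence[float], labels: Sequence[int],
                    alpha: float = 0.25, gamma: float = 2.0) -> float:
    """Average focal loss over a batch."""
    if not probs:
        raise ValueError("empty batch")
    losses = [binary_focal_loss(p, y, alpha, gamma) for p, y in zip(probs, labels)]
    return sum(losses) / len(losses)
```

With γ = 0 and α = 0.5 this reduces to half the ordinary cross-entropy; with γ = 2, a confidently correct prediction (p_t = 0.9) keeps only 1% of its cross-entropy weight, while a badly wrong one (p_t = 0.1) keeps 81%.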

Abstract

Detecting plagiarism involves finding similar items in two different sources. In this article, we propose a novel plagiarism detection method based on an attention-based long short-term memory (LSTM) network and bidirectional encoder representations from transformers (BERT) word embeddings, enhanced with an optimized differential evolution (DE) method for pre-training and a focal loss function for training. BERT can be included in a downstream task and fine-tuned as a task-specific structure, and the trained BERT model is capable of detecting various linguistic characteristics. Unbalanced classification is one of the primary issues in plagiarism detection. To address it, we propose a focal loss-based training technique that concentrates on learning minority-class instances. Another issue that we tackle is…
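The abstract describes an attention-based LSTM but not its exact attention form. A common choice is to score each LSTM hidden state h_t against a learned context vector u, normalise the scores with a softmax, and pool the states into one weighted sentence vector. The sketch below assumes that form and simplifies it by dropping the usual learned projection (score = u · tanh(h_t)); `attention_pool` and `context` are hypothetical names, and in a trained model `context` would be a learned parameter.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(xs: Vector) -> Vector:
    """Numerically stable softmax."""
    m = max(xs)  # subtract the max so exp() cannot overflow
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden_states: List[Vector], context: Vector) -> Tuple[Vector, Vector]:
    """Pool a sequence of LSTM hidden states into one vector via soft attention.

    Returns (pooled_vector, attention_weights).
    """
    # One relevance score per time step: dot product of the context vector with tanh(h_t)
    scores = [sum(u * math.tanh(h) for u, h in zip(context, h_t)) for h_t in hidden_states]
    weights = softmax(scores)
    dim = len(hidden_states[0])
    # Weighted sum of the hidden states, dimension by dimension
    pooled = [sum(w * h_t[k] for w, h_t in zip(weights, hidden_states)) for k in range(dim)]
    return pooled, weights
```

With an all-zero context vector every step gets the same weight and the result is the plain mean of the hidden states; a trained context vector instead shifts weight toward the time steps most useful for the similarity decision.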

Taxonomy

Topics: Academic integrity and plagiarism · Topic Modeling · Hate Speech and Cyberbullying Detection

Methods: Multi-Head Attention · Attention Is All You Need · Softmax · Adam · Layer Normalization · Focal Loss · Linear Layer · Dropout · WordPiece · Weight Decay