Data Expansion using Back Translation and Paraphrasing for Hate Speech Detection
Djamila Romaissa Beddiar, Md Saroar Jahan, Mourad Oussalah

TL;DR
This paper introduces a data augmentation pipeline combining back translation and paraphrasing to improve hate speech detection models, evaluated on multiple datasets with promising results.
Contribution
It presents a novel deep learning data augmentation method using back translation and paraphrasing to enhance hate speech classification accuracy.
Findings
Improved classification performance on five datasets.
Back translation and paraphrasing increase data diversity.
Comparison shows effectiveness over existing methods.
Abstract
With the proliferation of user-generated content on social media platforms, establishing mechanisms to automatically identify toxic and abusive content has become a prime concern for regulators, researchers, and society. Keeping the balance between freedom of speech and respect for each other's dignity is a major concern of social media platform regulators. Although automatic detection of offensive content using deep learning approaches seems to provide encouraging results, training deep learning-based models requires large amounts of high-quality labeled data, which is often missing. In this regard, we present in this paper a new deep learning-based method that fuses a back translation method and a paraphrasing technique for data augmentation. Our pipeline investigates different word-embedding-based architectures for classification of hate speech. The back translation technique relies on an…
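To make the augmentation idea in the abstract concrete, here is a minimal sketch of label-preserving data expansion by back translation and paraphrasing. It is not the authors' implementation: the abstract is truncated before it names the translation or paraphrasing models, so `toy_to_pivot`, `toy_from_pivot`, and `toy_paraphrase` are hypothetical dictionary-based stand-ins, and `augment` is an illustrative helper. In practice each stand-in would be replaced by a real translation or paraphrase model.

```python
from typing import Callable, List, Tuple

Example = Tuple[str, int]  # (text, label), label 1 = hateful, 0 = not

# Hypothetical toy "translation" tables standing in for real MT models.
EN_TO_FR = {"you": "tu", "are": "es", "awful": "affreux"}
FR_TO_EN = {"tu": "you", "es": "are", "affreux": "terrible"}


def toy_to_pivot(text: str) -> str:
    """Stand-in for English -> pivot-language translation (word lookup)."""
    return " ".join(EN_TO_FR.get(w, w) for w in text.split())


def toy_from_pivot(text: str) -> str:
    """Stand-in for pivot-language -> English translation (word lookup)."""
    return " ".join(FR_TO_EN.get(w, w) for w in text.split())


def toy_paraphrase(text: str) -> List[str]:
    """Stand-in for a paraphrase generator: a single synonym swap."""
    return [text.replace("awful", "horrible")]


def back_translate(text: str,
                   to_pivot: Callable[[str], str],
                   from_pivot: Callable[[str], str]) -> str:
    """Translate to a pivot language and back to obtain a reworded copy."""
    return from_pivot(to_pivot(text))


def augment(dataset: List[Example],
            to_pivot: Callable[[str], str],
            from_pivot: Callable[[str], str],
            paraphrase: Callable[[str], List[str]]) -> List[Example]:
    """Return the originals plus new, non-duplicate variants.

    Each variant inherits the label of the text it was derived from.
    """
    seen = {text for text, _ in dataset}
    expanded = list(dataset)
    for text, label in dataset:
        candidates = [back_translate(text, to_pivot, from_pivot)]
        candidates.extend(paraphrase(text))
        for candidate in candidates:
            # Skip empty outputs and texts already in the expanded set.
            if candidate and candidate not in seen:
                seen.add(candidate)
                expanded.append((candidate, label))
    return expanded


data = [("you are awful", 1), ("have a nice day", 0)]
augmented = augment(data, toy_to_pivot, toy_from_pivot, toy_paraphrase)
for text, label in augmented:
    print(label, text)
```

The deduplication step matters because a round trip through the pivot language often returns the input unchanged, which would only duplicate existing examples rather than add diversity. The sketch also assumes that rewording preserves the label, which may not always hold for real models and should be spot-checked.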
Taxonomy
Methods: Sigmoid Activation · Tanh Activation · Long Short-Term Memory
