Hate Speech and Offensive Content Detection in Indo-Aryan Languages: A Battle of LSTM and Transformers

Nikhil Narayan; Mrutyunjay Biswal; Pramod Goyal; Abhranta Panigrahi

arXiv:2312.05671 · cs.CL · December 12, 2023 · 1 citation

Open Access

TL;DR

This paper compares the effectiveness of LSTM and transformer-based models in detecting hate speech across five Indo-Aryan languages, providing insights into model performance variations and suitability for multilingual hate speech detection.

Contribution

It offers a comprehensive analysis of multiple pre-trained models for hate speech detection in five Indo-Aryan languages, highlighting their relative strengths and weaknesses.

Findings

01. BERT Base Multilingual Cased performs well across languages.

02. XLM-R excels at hate speech detection in Sinhala.

03. A custom LSTM outperforms the other models on Gujarati.

Abstract

Social media platforms serve as accessible outlets for individuals to express their thoughts and experiences, resulting in an influx of user-generated data spanning all age groups. While these platforms enable free expression, they also present significant challenges, including the proliferation of hate speech and offensive content. Such objectionable language disrupts objective discourse and can lead to radicalization of debates, ultimately threatening democratic values. Consequently, organizations have taken steps to monitor and curb abusive behavior, necessitating automated methods for identifying suspicious posts. This paper contributes to the Hate Speech and Offensive Content Identification in English and Indo-Aryan Languages (HASOC) 2023 shared task track. We, team Z-AGI Labs, conduct a comprehensive comparative analysis of hate speech classification across five distinct languages:…

Taxonomy

Topics: Hate Speech and Cyberbullying Detection · Internet Traffic Analysis and Secure E-voting

Methods: Multi-Head Attention · Attention Is All You Need · Fast Attention Via Positive Orthogonal Random Features · Tanh Activation · Sigmoid Activation · Linear Layer · Dense Connections · Long Short-Term Memory · Performer