Harnessing Pre-Trained Sentence Transformers for Offensive Language Detection in Indian Languages
Ananya Joshi, Raviraj Joshi

TL;DR
This paper investigates the use of pre-trained sentence transformers, especially monolingual SBERT models, for detecting offensive language in low-resource Indian languages, demonstrating promising results with room for improvement.
Contribution
It introduces the application of fine-tuned pre-trained sentence transformers for hate speech detection in Bengali, Assamese, and Gujarati, highlighting their effectiveness in low-resource language contexts.
Findings
Monolingual SBERT models outperform multilingual models.
Bengali gave the strongest result, with the approach achieving the highest ranking in that language.
Performance in Assamese and Gujarati indicates potential for further improvement.
Abstract
In our increasingly interconnected digital world, social media platforms have emerged as powerful channels for the dissemination of hate speech and offensive content. This work delves into the domain of hate speech detection, placing specific emphasis on three low-resource Indian languages: Bengali, Assamese, and Gujarati. The challenge is framed as a text classification task, aimed at discerning whether a tweet contains offensive or non-offensive content. Leveraging the HASOC 2023 datasets, we fine-tuned pre-trained BERT and SBERT models to evaluate their effectiveness in identifying hate speech. Our findings underscore the superiority of monolingual sentence-BERT models, particularly in the Bengali language, where we achieved the highest ranking. However, the performance in Assamese and Gujarati languages signifies ongoing opportunities for enhancement. Our goal is to foster inclusive…
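The abstract frames detection as binary text classification over tweets but does not say how the classifier head is built. The sketch below illustrates one common arrangement for sentence-transformer models, under stated assumptions: token vectors are mean-pooled into a single sentence vector, and a linear layer followed by softmax picks a label. The function names, label strings, and toy numbers are hypothetical and are not taken from the paper; in practice the token vectors would come from the fine-tuned encoder.

```python
import math
from typing import List, Sequence, Tuple

# Assumed label names for the binary task (hypothetical, not from the paper).
LABELS = ("non-offensive", "offensive")


def mean_pool(token_vectors: Sequence[Sequence[float]],
              attention_mask: Sequence[int]) -> List[float]:
    """Average the token vectors whose mask entry is 1, ignoring padding."""
    dim = len(token_vectors[0])
    total = [0.0] * dim
    count = 0
    for vec, keep in zip(token_vectors, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vec):
                total[i] += value
    if count == 0:
        raise ValueError("attention_mask selects no tokens")
    return [t / count for t in total]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a short list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    norm = sum(exps)
    return [e / norm for e in exps]


def classify(sentence_vec: Sequence[float],
             weights: Sequence[Sequence[float]],
             bias: Sequence[float]) -> Tuple[str, List[float]]:
    """Apply a linear layer (one weight row per label), then softmax."""
    logits = [
        sum(w * x for w, x in zip(row, sentence_vec)) + b
        for row, b in zip(weights, bias)
    ]
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs


# Toy input: three 2-dimensional token vectors, the last one is padding.
tokens = [[1.0, 0.0], [3.0, 2.0], [9.0, 9.0]]
mask = [1, 1, 0]
pooled = mean_pool(tokens, mask)

# Hypothetical head weights; a trained head would be learned during fine-tuning.
head_weights = [[-1.0, 0.0], [1.0, 0.0]]
head_bias = [0.0, 0.0]
label, probs = classify(pooled, head_weights, head_bias)
print(pooled, label, probs)
```

Masked pooling keeps padding tokens from diluting the sentence vector, which matters when short tweets are batched with longer ones.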
Taxonomy
Topics: Hate Speech and Cyberbullying Detection
Methods: Multi-Head Attention · Dropout · WordPiece · Attention Dropout · Dense Connections · Linear Layer · Weight Decay · Attention Is All You Need · Adam
