Boosting Accuracy and Interpretability in Multilingual Hate Speech Detection Through Layer Freezing and Explainable AI
Meysam Shirdel Bilehsavar, Negin Mahmoudi, Mohammad Jalili Torkamani, Kiana Kiashemshaki

TL;DR
This paper evaluates multilingual hate speech detection and sentiment analysis using transformer models with layer freezing, and enhances interpretability through LIME explanations, aiming to improve accuracy and transparency.
Contribution
It introduces the use of layer freezing in transformer models for multilingual tasks and integrates LIME for explainability, advancing both performance and interpretability.
Findings
Layer freezing improves model efficiency across languages.
LIME provides meaningful explanations of model decisions.
Transformer models achieve competitive accuracy in multilingual settings.
Abstract
Sentiment analysis focuses on identifying the emotional polarity expressed in textual data, typically categorized as positive, negative, or neutral. Hate speech detection, on the other hand, aims to recognize content that incites violence, discrimination, or hostility toward individuals or groups based on attributes such as race, gender, sexual orientation, or religion. Both tasks play a critical role in online content moderation by enabling the detection and mitigation of harmful or offensive material, thereby contributing to safer digital environments. In this study, we examine the performance of three transformer-based models: BERT-base-multilingual-cased, RoBERTa-base, and XLM-RoBERTa-base with the first eight layers frozen, for multilingual sentiment analysis and hate speech detection. The evaluation is conducted across five languages: English, Korean, Japanese, Chinese, and…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsHate Speech and Cyberbullying Detection · Sentiment Analysis and Opinion Mining · Spam and Phishing Detection
