Boosting Accuracy and Interpretability in Multilingual Hate Speech Detection Through Layer Freezing and Explainable AI

Meysam Shirdel Bilehsavar; Negin Mahmoudi; Mohammad Jalili Torkamani; Kiana Kiashemshaki

arXiv:2601.02697 · cs.CL · January 7, 2026

TL;DR

This paper evaluates three transformer models (BERT-base-multilingual-cased, RoBERTa-base, and XLM-RoBERTa-base) with the first eight layers frozen on multilingual sentiment analysis and hate speech detection, and uses LIME explanations to make the models' predictions more interpretable.

Contribution

The paper applies layer freezing to transformer models for multilingual tasks and integrates LIME for explainability, targeting both performance and interpretability.

Findings

01 Layer freezing improves model efficiency across languages.

02 LIME provides meaningful explanations of model decisions.

03 Transformer models achieve competitive accuracy in multilingual settings.

Abstract

Sentiment analysis focuses on identifying the emotional polarity expressed in textual data, typically categorized as positive, negative, or neutral. Hate speech detection, on the other hand, aims to recognize content that incites violence, discrimination, or hostility toward individuals or groups based on attributes such as race, gender, sexual orientation, or religion. Both tasks play a critical role in online content moderation by enabling the detection and mitigation of harmful or offensive material, thereby contributing to safer digital environments. In this study, we examine the performance of three transformer-based models: BERT-base-multilingual-cased, RoBERTa-base, and XLM-RoBERTa-base with the first eight layers frozen, for multilingual sentiment analysis and hate speech detection. The evaluation is conducted across five languages: English, Korean, Japanese, Chinese, and…
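
The layer-freezing setup mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' code. It assumes parameter names follow the `encoder.layer.<index>.` pattern used by Hugging Face BERT-style models, and the helper names `should_freeze` and `freeze_first_layers` are hypothetical. Whether the embedding layer is also frozen is not stated in the abstract; this sketch leaves it trainable.

```python
import re

# Matches names such as "roberta.encoder.layer.3.attention.self.query.weight".
# Assumption: the model names its transformer blocks "encoder.layer.<index>.".
_LAYER_RE = re.compile(r"(?:^|\.)encoder\.layer\.(\d+)\.")


def should_freeze(param_name, n_frozen=8):
    """Return True if the parameter sits in one of the first n_frozen encoder layers."""
    match = _LAYER_RE.search(param_name)
    return match is not None and int(match.group(1)) < n_frozen


def freeze_first_layers(model, n_frozen=8):
    """Disable gradients for the first n_frozen encoder layers.

    `model` is any object exposing named_parameters() that yields
    (name, parameter) pairs, as PyTorch modules do. Embeddings and the
    classification head are left trainable (an assumption of this sketch).
    Returns the list of frozen parameter names.
    """
    frozen = []
    for name, param in model.named_parameters():
        if should_freeze(name, n_frozen):
            param.requires_grad = False
            frozen.append(name)
    return frozen
```

With a PyTorch model loaded through the `transformers` library, for example `AutoModelForSequenceClassification.from_pretrained("xlm-roberta-base")`, calling `freeze_first_layers(model)` would leave only the upper encoder layers, the embeddings, and the classifier head receiving gradient updates during fine-tuning.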

Taxonomy

Topics: Hate Speech and Cyberbullying Detection · Sentiment Analysis and Opinion Mining · Spam and Phishing Detection