X-MuTeST: A Multilingual Benchmark for Explainable Hate Speech Detection and A Novel LLM-consulted Explanation Framework
Mohammad Zia Ur Rehman, Sai Kartheek Reddy Kasu, Shashivardhan Reddy Koppula, Sai Rithwik Reddy Chirra, Shwetank Shekhar Singh, Nagendra Kumar

TL;DR
X-MuTeST introduces a multilingual benchmark and a novel explainability-guided training framework for hate speech detection, improving accuracy and interpretability across English, Hindi, and Telugu social media data.
Contribution
It presents a new training framework combining LLM-based semantic reasoning with attention techniques and provides annotated rationales for under-resourced languages.
Findings
Enhanced classification performance with human rationales.
Improved explainability metrics such as Token-F1 and IOU-F1.
Better model attention through combined human and model explanations.
Abstract
Hate speech detection on social media faces challenges in both accuracy and explainability, especially for underexplored Indic languages. We propose a novel explainability-guided training framework, X-MuTeST (eXplainable Multilingual haTe Speech deTection), for hate speech detection that combines high-level semantic reasoning from large language models (LLMs) with traditional attention-enhancing techniques. We extend this research to Hindi and Telugu alongside English by providing benchmark human-annotated rationales for each word to justify the assigned class label. The X-MuTeST explainability method computes the difference between the prediction probabilities of the original text and those of unigrams, bigrams, and trigrams. Final explanations are computed as the union between LLM explanations and X-MuTeST explanations. We show that leveraging human rationales during training enhances…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsHate Speech and Cyberbullying Detection · Explainable Artificial Intelligence (XAI) · Sentiment Analysis and Opinion Mining
