xList-Hate: A Checklist-Based Framework for Interpretable and Generalizable Hate Speech Detection
Adrián Girón, Pablo Miralles, Javier Huertas-Tato, Sergio D'Antonio, David Camacho

TL;DR
xList-Hate introduces a diagnostic, checklist-based framework using large language models and decision trees to improve interpretability and robustness in hate speech detection across diverse datasets.
Contribution
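The paper presents a novel diagnostic framework that decomposes hate speech detection into a checklist of explicit, concept-level questions. Each question is answered independently by an LLM, and a fully interpretable decision tree aggregates the answers. The goal is better interpretability and robustness than direct binary classification.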
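The decomposition step can be sketched in a few lines. This is a minimal Python sketch under stated assumptions. The checklist questions, the prompt template, and the names `CHECKLIST`, `build_prompt`, `make_answerer`, and `diagnostic_vector` are hypothetical illustrations, not the paper's checklist or code. The LLM is abstracted as any function that maps a prompt string to a reply, so no particular client API is assumed. Each question is sent as its own yes/no query, and the answers form a binary vector rather than a final label.

```python
from typing import Callable, List

# Hypothetical checklist: illustrative questions only, not the paper's actual checklist.
CHECKLIST: List[str] = [
    "Does the text target a group defined by a protected characteristic?",
    "Does the text dehumanize or demean that group?",
    "Does the text incite or endorse violence or discrimination against that group?",
    "Is the hateful content quoted, reported, or countered rather than endorsed?",
]

# An answerer maps (text, question) to a yes/no decision.
Answerer = Callable[[str, str], bool]


def build_prompt(text: str, question: str) -> str:
    """Format one self-contained yes/no query (hypothetical prompt template)."""
    return (
        "Answer strictly with 'yes' or 'no'.\n"
        f"Text: {text!r}\n"
        f"Question: {question}"
    )


def make_answerer(complete: Callable[[str], str]) -> Answerer:
    """Wrap any prompt -> reply function (e.g. an LLM client) as a yes/no answerer."""
    def answer(text: str, question: str) -> bool:
        reply = complete(build_prompt(text, question))
        return reply.strip().lower().startswith("yes")
    return answer


def diagnostic_vector(text: str, questions: List[str], answer: Answerer) -> List[int]:
    """Ask every checklist question independently and return the answers as 0/1.

    The vector encodes concept-level features of the text; it is not itself
    a hate / non-hate prediction.
    """
    return [int(answer(text, q)) for q in questions]


if __name__ == "__main__":
    def fake_llm(prompt: str) -> str:
        # Stand-in for a real LLM call: says "yes" to the first two questions only.
        return "Yes" if ("protected" in prompt or "dehumanize" in prompt) else "No"

    print(diagnostic_vector("example post", CHECKLIST, make_answerer(fake_llm)))
    # [1, 1, 0, 0]
```

Keeping each question in a separate call mirrors the idea that every diagnostic signal is produced independently. Swapping `fake_llm` for a real model only changes the `complete` function.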
Findings
Improves cross-dataset robustness compared to supervised models.
Provides transparent, interpretable decision paths.
Reduces sensitivity to annotation noise and ambiguity.
Abstract
Hate speech detection is commonly framed as a direct binary classification problem despite being a composite concept defined through multiple interacting factors that vary across legal frameworks, platform policies, and annotation guidelines. As a result, supervised models often overfit dataset-specific definitions and exhibit limited robustness under domain shift and annotation noise. We introduce xList-Hate, a diagnostic framework that decomposes hate speech detection into a checklist of explicit, concept-level questions grounded in widely shared normative criteria. Each question is independently answered by a large language model (LLM), producing a binary diagnostic representation that captures hateful content features without directly predicting the final label. These diagnostic signals are then aggregated by a lightweight, fully interpretable decision tree, yielding transparent…
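The aggregation step can be illustrated with a small decision tree over binary diagnostic vectors. The sketch below assumes a greedy, Gini-based, CART-style learner with a depth limit. The abstract does not say how the paper grows its tree, and the feature names and toy data are hypothetical. `predict_with_path` returns the predicted label together with the sequence of checklist answers that led to it, which is what makes each decision inspectable.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Node:
    feature: Optional[int] = None    # index of the checklist answer tested at this node
    no: Optional["Node"] = None      # subtree followed when the answer is 0 ("no")
    yes: Optional["Node"] = None     # subtree followed when the answer is 1 ("yes")
    label: Optional[int] = None      # leaves only: 1 = hateful, 0 = not hateful


def gini(labels: List[int]) -> float:
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def majority(labels: List[int]) -> int:
    """Most frequent label; ties resolve to 0 (not hateful)."""
    return int(2 * sum(labels) > len(labels))


def fit(X: List[List[int]], y: List[int], depth: int = 0, max_depth: int = 3) -> Node:
    """Greedily grow a small tree, choosing the split with the lowest weighted Gini."""
    if depth >= max_depth or len(set(y)) <= 1:
        return Node(label=majority(y))
    best: Optional[Tuple[float, int]] = None
    for f in range(len(X[0])):
        yes = [t for x, t in zip(X, y) if x[f] == 1]
        no = [t for x, t in zip(X, y) if x[f] == 0]
        if not yes or not no:
            continue  # this question does not separate the current examples
        score = (len(yes) * gini(yes) + len(no) * gini(no)) / len(y)
        if best is None or score < best[0]:
            best = (score, f)
    if best is None:
        return Node(label=majority(y))
    f = best[1]
    yes_idx = [i for i, x in enumerate(X) if x[f] == 1]
    no_idx = [i for i, x in enumerate(X) if x[f] == 0]
    return Node(
        feature=f,
        no=fit([X[i] for i in no_idx], [y[i] for i in no_idx], depth + 1, max_depth),
        yes=fit([X[i] for i in yes_idx], [y[i] for i in yes_idx], depth + 1, max_depth),
    )


def predict_with_path(node: Node, x: List[int], names: List[str]) -> Tuple[int, List[str]]:
    """Return the predicted label plus the human-readable answers that led to it."""
    path: List[str] = []
    while node.label is None:
        answer = x[node.feature]
        path.append(f"{names[node.feature]} = {'yes' if answer else 'no'}")
        node = node.yes if answer else node.no
    return node.label, path


# Toy diagnostic vectors (hypothetical data): [targets_group, dehumanizing, counter_speech]
NAMES = ["targets_group", "dehumanizing", "counter_speech"]
X = [[1, 1, 0], [1, 1, 1], [1, 0, 0], [0, 1, 0], [0, 0, 0], [1, 1, 0]]
y = [1, 0, 0, 0, 0, 1]

tree = fit(X, y)
label, path = predict_with_path(tree, [1, 1, 0], NAMES)
print(label, path)
# 1 ['targets_group = yes', 'dehumanizing = yes', 'counter_speech = no']
```

A library learner such as scikit-learn's `DecisionTreeClassifier`, with `export_text` to print the learned rules, could play the same role. The hand-written version only keeps the example dependency-free. The depth limit is what keeps the tree small enough to read.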
Taxonomy
Topics: Hate Speech and Cyberbullying Detection · Emotion and Mood Recognition · Bullying, Victimization, and Aggression
