xList-Hate: A Checklist-Based Framework for Interpretable and Generalizable Hate Speech Detection

Adrián Girón; Pablo Miralles; Javier Huertas-Tato; Sergio D'Antonio; David Camacho

arXiv:2602.05874·cs.CL·February 6, 2026

Open Access

TL;DR

xList-Hate introduces a diagnostic, checklist-based framework using large language models and decision trees to improve interpretability and robustness in hate speech detection across diverse datasets.

Contribution

The paper presents a novel diagnostic framework that decomposes hate speech detection into explicit questions answered by LLMs, enhancing interpretability and robustness over traditional classification methods.

Findings

01. Improves cross-dataset robustness compared to supervised models.

02. Provides transparent, interpretable decision paths.

03. Reduces sensitivity to annotation noise and ambiguity.

Abstract

Hate speech detection is commonly framed as a direct binary classification problem despite being a composite concept defined through multiple interacting factors that vary across legal frameworks, platform policies, and annotation guidelines. As a result, supervised models often overfit dataset-specific definitions and exhibit limited robustness under domain shift and annotation noise. We introduce xList-Hate, a diagnostic framework that decomposes hate speech detection into a checklist of explicit, concept-level questions grounded in widely shared normative criteria. Each question is independently answered by a large language model (LLM), producing a binary diagnostic representation that captures hateful content features without directly predicting the final label. These diagnostic signals are then aggregated by a lightweight, fully interpretable decision tree, yielding transparent…
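
The two-stage pipeline described in the abstract (independent checklist answers, then aggregation by a decision tree) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the three checklist questions, the `stub_answer` function standing in for an LLM call, and the hand-written `TOY_TREE` are all hypothetical, since the excerpt lists neither the actual questions nor how the tree is obtained.

```python
from typing import Callable, Dict, List, Tuple, Union

# Hypothetical checklist; the paper's actual questions are not given in this excerpt.
CHECKLIST: List[str] = [
    "targets_protected_group",
    "uses_dehumanizing_language",
    "calls_for_harm_or_exclusion",
]

# An answerer maps (text, question) to a yes/no answer. In the framework this
# role is played by an LLM; here it is an abstract callable.
Answerer = Callable[[str, str], bool]


def diagnose(text: str, answer: Answerer) -> Dict[str, int]:
    """Ask each checklist question independently and collect binary answers."""
    return {question: int(answer(text, question)) for question in CHECKLIST}


# A tree node is either a leaf label or (question, subtree_if_no, subtree_if_yes).
Tree = Union[str, Tuple[str, "Tree", "Tree"]]

# Hand-written toy tree, for illustration only.
TOY_TREE: Tree = (
    "targets_protected_group",
    "not_hate",
    (
        "uses_dehumanizing_language",
        ("calls_for_harm_or_exclusion", "not_hate", "hate"),
        "hate",
    ),
)


def classify(features: Dict[str, int], tree: Tree = TOY_TREE) -> Tuple[str, List[str]]:
    """Walk the tree and return the label together with the decision path."""
    path: List[str] = []
    node = tree
    while not isinstance(node, str):
        question, if_no, if_yes = node
        value = features[question]
        path.append(f"{question}={value}")
        node = if_yes if value else if_no
    return node, path


def stub_answer(text: str, question: str) -> bool:
    """Placeholder for an LLM call: returns fixed answers regardless of the text."""
    canned = {
        "targets_protected_group": True,
        "uses_dehumanizing_language": False,
        "calls_for_harm_or_exclusion": True,
    }
    return canned[question]


features = diagnose("some input text", stub_answer)
label, path = classify(features)
```

The answerer never sees or predicts the final label; only the tree does, and the returned `path` records which checklist answers led to it. That is the sense in which the decision is inspectable in this sketch.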

Taxonomy

Topics: Hate Speech and Cyberbullying Detection · Emotion and Mood Recognition · Bullying, Victimization, and Aggression