HateXplain: A Benchmark Dataset for Explainable Hate Speech Detection
Binny Mathew, Punyajoy Saha, Seid Muhie Yimam, Chris Biemann, Pawan Goyal, and Animesh Mukherjee

TL;DR
HateXplain is a comprehensive benchmark dataset for hate speech detection that includes annotations for classification, target community, and rationales, aiming to improve model interpretability and reduce bias.
Contribution
This paper introduces HateXplain, the first dataset with multi-faceted annotations for hate speech, enabling research on bias and explainability in detection models.
Findings
Even state-of-the-art models that perform very well in classification do not score high on explainability.
Training with human rationales reduces unintended bias towards target communities.
Models trained with rationales perform better in interpretability.
Abstract
Hate speech is a challenging issue plaguing the online social media. While better models for hate speech detection are continuously being developed, there is little research on the bias and interpretability aspects of hate speech. In this paper, we introduce HateXplain, the first benchmark hate speech dataset covering multiple aspects of the issue. Each post in our dataset is annotated from three different perspectives: the basic, commonly used 3-class classification (i.e., hate, offensive or normal), the target community (i.e., the community that has been the victim of hate speech/offensive speech in the post), and the rationales, i.e., the portions of the post on which their labelling decision (as hate, offensive or normal) is based. We utilize existing state-of-the-art models and observe that even models that perform very well in classification do not score high on explainability…
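The three annotation perspectives described in the abstract can be pictured as one record per post. The sketch below is only an illustration: the class name `AnnotatedPost`, its field names, and the binary-mask encoding of rationales are assumptions made for this example, not the dataset's actual schema or file format.

```python
from dataclasses import dataclass, field
from typing import List

# The three classes named in the abstract.
LABELS = ("hate", "offensive", "normal")


@dataclass
class AnnotatedPost:
    """Hypothetical container for one annotated post (not the official schema)."""
    tokens: List[str]                                    # the post, split into tokens
    label: str                                           # one of LABELS
    targets: List[str] = field(default_factory=list)     # targeted communities, if any
    rationale: List[int] = field(default_factory=list)   # assumed encoding: 1 = token supports the label

    def __post_init__(self) -> None:
        if self.label not in LABELS:
            raise ValueError(f"unknown label: {self.label!r}")
        if not self.rationale:
            # No rationale given: treat every token as unmarked.
            self.rationale = [0] * len(self.tokens)
        if len(self.rationale) != len(self.tokens):
            raise ValueError("rationale mask must have one entry per token")

    def rationale_tokens(self) -> List[str]:
        """Return the tokens the annotator marked as the basis for the label."""
        return [tok for tok, flag in zip(self.tokens, self.rationale) if flag]


# Placeholder content only; no real post from the dataset is reproduced here.
post = AnnotatedPost(
    tokens=["some", "post", "text", "here"],
    label="offensive",
    targets=["<community>"],
    rationale=[0, 0, 1, 1],
)
print(post.rationale_tokens())
```

Keeping the rationale as a per-token mask aligned with `tokens` makes it straightforward to compare human-marked spans against whatever tokens a model highlights, which is the kind of comparison an explainability evaluation needs.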
Taxonomy
Methods: Linear Layer · Residual Connection · Weight Decay · Attention Dropout · Linear Warmup With Linear Decay · WordPiece · Adam · Dropout · Softmax · Dense Connections
