Interpretable Multi-Modal Hate Speech Detection
Prashanth Vijayaraghavan, Hugo Larochelle, Deb Roy

TL;DR
This paper introduces a deep neural multi-modal model for hate speech detection that incorporates textual and socio-cultural context, providing both improved accuracy and interpretability to address social and legal concerns.
Contribution
It presents a novel multi-modal approach that captures semantics and socio-cultural context, enhancing interpretability and outperforming existing hate speech detection methods.
Findings
Model outperforms state-of-the-art approaches
Socio-cultural features are crucial for detecting hate clusters
Interpretability aids in understanding model decisions
Abstract
With the growing role of social media in shaping public opinions and beliefs across the world, there has been increased attention to identifying and countering hate speech on social media. Hate speech in online spaces has serious manifestations, including social polarization and hate crimes. While prior works have proposed automated techniques to detect hate speech online, these techniques largely fail to look beyond the textual content. Moreover, few attempts have been made to address the interpretability of such models, given the social and legal implications of incorrect predictions. In this work, we propose a deep neural multi-modal model that can: (a) detect hate speech by effectively capturing the semantics of the text along with the socio-cultural context in which a particular hate expression is made, and (b) provide interpretable insights into decisions of our…
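The abstract does not spell out the architecture. The sketch below is therefore not the authors' model. It is a minimal linear stand-in that shows, under stated assumptions, how two modalities can be combined and how a decision can be attributed back to each one. It assumes that a text embedding and a socio-cultural feature vector have already been computed upstream. Each modality gets its own weight vector, and its additive share of the logit serves as a simple per-modality explanation. The names `LateFusionScorer`, `explain`, and `predict_proba`, along with the toy weights, are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Dict, List


def dot(a: List[float], b: List[float]) -> float:
    """Inner product; refuses mismatched lengths instead of silently truncating."""
    if len(a) != len(b):
        raise ValueError(f"length mismatch: {len(a)} vs {len(b)}")
    return sum(x * y for x, y in zip(a, b))


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


@dataclass
class LateFusionScorer:
    """Hypothetical late-fusion scorer: one linear branch per modality.

    The paper describes a deep neural model; this linear version only
    illustrates fusing text and socio-cultural signals additively.
    """
    text_weights: List[float]
    context_weights: List[float]
    bias: float = 0.0

    def explain(self, text_vec: List[float], context_vec: List[float]) -> Dict[str, float]:
        # Each entry is that modality's additive contribution to the logit,
        # so the entries sum exactly to the logit used for prediction.
        return {
            "text": dot(self.text_weights, text_vec),
            "socio_cultural": dot(self.context_weights, context_vec),
            "bias": self.bias,
        }

    def predict_proba(self, text_vec: List[float], context_vec: List[float]) -> float:
        # Probability that the input is hateful under this toy model.
        return sigmoid(sum(self.explain(text_vec, context_vec).values()))


if __name__ == "__main__":
    scorer = LateFusionScorer(text_weights=[0.5, -0.25],
                              context_weights=[1.0, 0.0, 2.0],
                              bias=-0.5)
    text_vec, context_vec = [1.0, 2.0], [0.5, 1.0, 0.25]
    print(scorer.explain(text_vec, context_vec))
    print(round(scorer.predict_proba(text_vec, context_vec), 4))
```

In this toy input the text branch contributes nothing to the logit (0.0), while the socio-cultural branch contributes 1.0. The explanation therefore attributes the positive score entirely to context. This is the kind of signal the paper's finding about socio-cultural features points to. A deep model would need a dedicated attribution method (for example attention weights or gradient-based saliency) to obtain comparable per-modality insight.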
Taxonomy
Topics: Hate Speech and Cyberbullying Detection · Adversarial Robustness in Machine Learning · Internet Traffic Analysis and Secure E-voting
