Improving Implicit Hate Speech Detection via a Community-Driven Multi-Agent Framework
Ewelina Gajewska, Katarzyna Budzynska, Jarosław A Chudziak

TL;DR
This paper introduces a community-driven multi-agent framework for detecting implicit hate speech, leveraging socio-cultural context to improve accuracy and fairness over existing prompting methods on a challenging dataset.
Contribution
It presents a novel multi-agent system that incorporates demographic-specific community agents and socio-cultural knowledge for improved hate speech detection.
Findings
Outperforms state-of-the-art prompting methods in accuracy.
Enhances fairness across demographic groups.
Achieves higher balanced accuracy with community context.
Abstract
This work proposes a contextualised detection framework for implicitly hateful speech, implemented as a multi-agent system comprising a central Moderator Agent and dynamically constructed Community Agents representing specific demographic groups. Our approach explicitly integrates socio-cultural context from publicly available knowledge sources, enabling identity-aware moderation that surpasses state-of-the-art prompting methods (zero-shot prompting, few-shot prompting, chain-of-thought prompting) and alternative approaches on the challenging ToxiGen dataset. We enhance the technical rigour of performance evaluation by incorporating balanced accuracy as a central metric of classification fairness that accounts for the trade-off between true positive and true negative rates. We demonstrate that our community-driven consultative framework significantly improves both classification accuracy…
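The abstract describes balanced accuracy as a metric that accounts for the trade-off between true positive and true negative rates. A minimal sketch of the standard definition, the mean of the two rates for binary labels, is given below. The function name and the toy labels are illustrative assumptions and are not taken from the paper or its evaluation code.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of true positive rate and true negative rate for binary labels.

    Labels are assumed to be 1 (hateful) and 0 (not hateful); this encoding
    is an assumption made for illustration only.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    # Guard against a class being absent from y_true.
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    tnr = tn / (tn + fp) if (tn + fp) else 0.0
    return (tpr + tnr) / 2


# Toy, made-up labels: 2 hateful and 8 non-hateful examples.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
# A degenerate classifier that never flags anything.
y_pred = [0] * 10

plain_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(plain_accuracy)                     # 0.8
print(balanced_accuracy(y_true, y_pred))  # 0.5
```

On this toy imbalanced set, a classifier that never predicts the positive class still reaches 0.8 plain accuracy but only 0.5 balanced accuracy, which illustrates why a metric that weighs both rates equally can be more informative than accuracy alone.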
Taxonomy
Topics: Hate Speech and Cyberbullying Detection · Emotion and Mood Recognition · Adversarial Robustness in Machine Learning
