Explain the Flag: Contextualizing Hate Speech Beyond Censorship
Jason Liartis, Eirini Kaldeli, Lambrini Gyftokosta, Eleftherios Chelioudakis, Orfeas Menis Mastromichalakis

TL;DR
This paper introduces a hybrid system that combines Large Language Models (LLMs) with curated vocabularies to detect and explain hate speech in English, French, and Greek, with the aim of making detection more transparent and accountable.
Contribution
The paper presents a novel hybrid approach that integrates LLMs with three newly created, curated vocabularies for multilingual hate speech detection and explanation, so that flagged content comes with a reason rather than only being censored or removed.
Findings
High accuracy in hate speech detection across languages
Generated explanations effectively clarify why content is flagged
Outperforms LLM-only baseline systems in human evaluations
Abstract
Hate, derogatory, and offensive speech remains a persistent challenge in online platforms and public discourse. While automated detection systems are widely used, most focus on censorship or removal, raising concerns for transparency and freedom of expression, and limiting opportunities to explain why content is harmful. To address these issues, explanatory approaches have emerged as a promising solution, aiming to make hate speech detection more transparent, accountable, and informative. In this paper, we present a hybrid approach that combines Large Language Models (LLMs) with three newly created and curated vocabularies to detect and explain hate speech in English, French, and Greek. Our system captures both inherently derogatory expressions tied to identity characteristics and direct group-targeted content through two complementary pipelines: one that detects and disambiguates…
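The vocabulary-driven side of such a hybrid system can be pictured with a minimal sketch. This is not the authors' implementation: the vocabulary entries, the `Flag` record, and the `is_derogatory_in_context` callback (a stand-in for an LLM-based disambiguation step) are all hypothetical names introduced here for illustration. The sketch assumes a vocabulary that maps lowercase terms to a short explanation, finds candidate terms in a text, and keeps only those the callback judges derogatory in context.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Flag:
    """A flagged term plus the explanation attached to it (hypothetical record)."""
    term: str
    start: int
    end: int
    explanation: str


# Hypothetical vocabulary with placeholder entries; keys must be lowercase.
VOCABULARY: Dict[str, str] = {
    "slur_a": "Placeholder explanation of why slur_a is derogatory.",
    "slur_b": "Placeholder explanation of why slur_b is derogatory.",
}


def flag_text(
    text: str,
    vocabulary: Dict[str, str],
    is_derogatory_in_context: Callable[[str, str], bool],
) -> List[Flag]:
    """Find vocabulary terms in text and keep those judged derogatory in context."""
    if not vocabulary:
        return []
    # Longest terms first so multi-word entries win over their substrings.
    terms = sorted(vocabulary, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in terms) + r")\b",
        re.IGNORECASE,
    )
    flags = []
    for match in pattern.finditer(text):
        term = match.group(1).lower()
        # Assumption: in a real system this callback would query an LLM.
        if is_derogatory_in_context(text, term):
            flags.append(Flag(term, match.start(), match.end(), vocabulary[term]))
    return flags


def keep_all(text: str, term: str) -> bool:
    """Trivial stand-in for disambiguation: treats every match as derogatory."""
    return True


if __name__ == "__main__":
    for flag in flag_text("He called them SLUR_A again.", VOCABULARY, keep_all):
        print(flag.term, flag.start, flag.end, flag.explanation)
```

Passing the disambiguation step as a callable keeps the lookup deterministic and testable, while leaving the context-dependent judgment to whatever model is plugged in.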
