Distilling Knowledge from Large Language Models: A Concept Bottleneck Model for Hate and Counter Speech Recognition
Roberto Labadie-Tamayo, Djordje Slijepčević, Xihui Chen, Adrian Jaques Böck, Andreas Babic, Liz Freimann, Christiane Atzmüller, Matthias Zeppelzauer

TL;DR
This paper introduces a transparent, adjective-based concept bottleneck model (SCBM) leveraging large language models for hate and counter speech recognition, achieving high accuracy and interpretability across multiple datasets and languages.
Contribution
The paper presents a novel adjective-based concept bottleneck approach that uses large language models to improve both interpretability and accuracy in hate and counter speech recognition.
Findings
SCBM achieves an average macro-F1 score of 0.69 across five datasets.
SCBM outperforms recent methods on four out of five datasets.
Fusing adjective concepts with transformer embeddings improves performance by 1.8%.
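The fusion finding can be illustrated with a simple early-fusion sketch. This summary does not describe the authors' exact fusion mechanism, so concatenating the adjective scores with a sentence embedding is an assumption. `toy_embedding`, `fuse`, and `split_logit` are hypothetical helpers. `toy_embedding` is a hash-based placeholder for a real transformer encoder output, so its values carry no meaning. The sketch scores the fused vector with one linear layer and reports how much of the logit comes from the interpretable concept part versus the embedding part.

```python
import hashlib
from typing import List, Tuple


def toy_embedding(text: str, dim: int = 8) -> List[float]:
    """Hypothetical placeholder for a transformer sentence embedding.

    Deterministic but semantically meaningless: SHA-256 bytes scaled to [-1, 1].
    """
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    if dim > len(digest):
        raise ValueError("toy_embedding supports at most 32 dimensions")
    return [digest[i] / 127.5 - 1.0 for i in range(dim)]


def fuse(concepts: List[float], embedding: List[float]) -> List[float]:
    """Early fusion by concatenation (assumed): concept scores first, embedding after."""
    return list(concepts) + list(embedding)


def split_logit(
    weights: List[float], bias: float, concepts: List[float], embedding: List[float]
) -> Tuple[float, float, float]:
    """Score the fused vector with one linear layer and separate the contributions.

    Returns (concept part, embedding part, total logit). Only the concept part maps
    to named adjectives; the embedding part is not directly interpretable.
    """
    fused = fuse(concepts, embedding)
    if len(weights) != len(fused):
        raise ValueError("weights must match the fused dimensionality")
    k = len(concepts)
    concept_part = sum(w * x for w, x in zip(weights[:k], fused[:k]))
    embedding_part = sum(w * x for w, x in zip(weights[k:], fused[k:]))
    return concept_part, embedding_part, concept_part + embedding_part + bias


# Hypothetical scores for four adjectives and an 8-dimensional toy embedding.
concepts = [0.9, 0.7, 0.1, 0.0]
embedding = toy_embedding("example post")
weights = [1.5, 1.0, -1.2, -1.4] + [0.1] * len(embedding)
c_part, e_part, logit = split_logit(weights, -0.5, concepts, embedding)
print(len(fuse(concepts, embedding)), round(c_part, 2), round(e_part, 2))
```

Keeping the concept scores at fixed leading indices means their weights stay attributable to named adjectives after fusion. The embedding weights add signal at the cost of that transparency.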
Abstract
The rapid increase in hate speech on social media has had an unprecedented impact on society, making automated methods for detecting such content important. Unlike prior black-box models, we propose a novel transparent method for automated hate and counter speech recognition, the "Speech Concept Bottleneck Model" (SCBM), which uses adjectives as human-interpretable bottleneck concepts. SCBM leverages large language models (LLMs) to map input texts to an abstract adjective-based representation, which is then passed to a lightweight classifier for downstream tasks. Across five benchmark datasets spanning multiple languages and platforms (e.g., Twitter, Reddit, YouTube), SCBM achieves an average macro-F1 score of 0.69, outperforming the most recently reported results in the literature on four of the five datasets. Aside from high recognition accuracy, SCBM provides a high level of…
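The pipeline described in the abstract (text, then adjective scores, then a lightweight classifier) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The four adjectives, the keyword-based `concept_vector` stand-in for the LLM scoring step, the toy training texts, and the plain-SGD logistic regression are all hypothetical choices that let the example run offline.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical adjective concepts chosen for illustration only;
# they are not the adjective set used in the paper.
ADJECTIVES: List[str] = ["hateful", "insulting", "respectful", "supportive"]

# Toy stand-in for the LLM step (assumption): in SCBM an LLM maps a text to
# adjective-based scores; here a keyword lookup plays that role.
TOY_CUES: Dict[str, Sequence[str]] = {
    "hateful": ("hate", "vermin"),
    "insulting": ("idiot", "stupid"),
    "respectful": ("please", "respect"),
    "supportive": ("support", "stand with"),
}


def concept_vector(text: str) -> List[float]:
    """Bottleneck: map a text to one score in [0, 1] per adjective."""
    lowered = text.lower()
    return [
        1.0 if any(cue in lowered for cue in TOY_CUES[adj]) else 0.0
        for adj in ADJECTIVES
    ]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_logreg(
    X: List[List[float]], y: List[int], lr: float = 0.5, epochs: int = 500
) -> Tuple[List[float], float]:
    """Lightweight classifier: logistic regression fitted with plain SGD."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            err = p - yi  # gradient of the log-loss with respect to the logit
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b


def predict_proba(w: List[float], b: float, text: str) -> float:
    """Probability of the positive (hate) label, computed from concepts only."""
    x = concept_vector(text)
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Toy labelled data (1 = hate speech, 0 = not hate speech), invented for the demo.
train_texts = [
    "I hate them, they are vermin",
    "What a stupid idiot",
    "Please show some respect",
    "We stand with you and support you",
]
train_labels = [1, 1, 0, 0]

weights, bias = train_logreg([concept_vector(t) for t in train_texts], train_labels)

# Interpretability: every learned weight is attached to a named adjective.
for adj, wj in zip(ADJECTIVES, weights):
    print(f"{adj:>10}: {wj:+.2f}")
```

Because the classifier sees only adjective scores, each weight can be read directly: a positive weight on `hateful` pushes a text toward the hate label. Replacing `concept_vector` with real LLM-derived scores would leave the classifier code unchanged.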
Taxonomy
Topics: Hate Speech and Cyberbullying Detection · Sentiment Analysis and Opinion Mining · Spam and Phishing Detection
