Distilling Knowledge from Large Language Models: A Concept Bottleneck Model for Hate and Counter Speech Recognition

Roberto Labadie-Tamayo; Djordje Slijepčević; Xihui Chen; Adrian Jaques Böck; Andreas Babic; Liz Freimann; Christiane Atzmüller; Matthias Zeppelzauer

arXiv:2508.08274·cs.CL·August 13, 2025

Open Access

TL;DR

This paper introduces the Speech Concept Bottleneck Model (SCBM), a transparent method for hate and counter speech recognition that uses large language models to map text onto adjective-based concepts. It reaches an average macro-F1 score of 0.69 across five datasets spanning multiple languages and platforms while keeping its decisions interpretable.

Contribution

The paper presents a novel adjective-based concept bottleneck approach that improves interpretability and accuracy in hate speech detection using large language models.

Findings

01

SCBM achieves an average macro-F1 score of 0.69 across five datasets.

02

SCBM outperforms recent methods on four out of five datasets.

03

Fusing adjective concepts with transformer embeddings improves performance by 1.8%.
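The findings above are reported as macro-F1, which averages the per-class F1 scores so that rare classes count as much as frequent ones. A minimal sketch of that computation follows; the function name `macro_f1` and the toy labels are illustrative assumptions and are not taken from the paper or its datasets.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    f1_scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never occurs.
        denom = 2 * tp + fp + fn
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


# Toy labels (hypothetical, for illustration only).
y_true = ["hate", "hate", "counter", "neutral"]
y_pred = ["hate", "counter", "counter", "neutral"]
print(round(macro_f1(y_true, y_pred), 4))
```

Because every class contributes equally to the mean, a model that ignores a small class is penalized more under macro-F1 than under plain accuracy.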

Abstract

The rapid increase in hate speech on social media has had an unprecedented impact on society, making automated methods for detecting such content important. Unlike prior black-box models, we propose a novel transparent method for automated hate and counter speech recognition, the "Speech Concept Bottleneck Model" (SCBM), which uses adjectives as human-interpretable bottleneck concepts. SCBM leverages large language models (LLMs) to map input texts to an abstract adjective-based representation, which is then passed to a lightweight classifier for downstream tasks. Across five benchmark datasets spanning multiple languages and platforms (e.g., Twitter, Reddit, YouTube), SCBM achieves an average macro-F1 score of 0.69, which outperforms the most recently reported results from the literature on four out of five datasets. Aside from high recognition accuracy, SCBM provides a high level of…
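The two-stage design described in the abstract (text to adjective scores, then adjective scores to a label) can be sketched roughly as follows. This is a toy illustration under stated assumptions, not the authors' implementation: the adjective list, the keyword table `CUES`, and the function `adjective_scores` are hypothetical stand-ins for the LLM step, and logistic regression is only one possible choice of lightweight classifier.

```python
import math

# Hypothetical concept set; the paper's actual adjectives are not listed here.
ADJECTIVES = ["hostile", "insulting", "supportive", "respectful"]

# Hypothetical keyword cues standing in for the LLM that scores each adjective.
CUES = {
    "hostile": ["hate", "get out"],
    "insulting": ["idiot", "stupid"],
    "supportive": ["stand with", "welcome"],
    "respectful": ["please", "thank"],
}


def adjective_scores(text):
    """Bottleneck layer: one score per adjective (an LLM would produce these)."""
    lowered = text.lower()
    return [1.0 if any(cue in lowered for cue in CUES[adj]) else 0.0
            for adj in ADJECTIVES]


def predict_proba(weights, bias, features):
    """Probability of the positive class under a logistic model."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(samples, labels, epochs=500, lr=0.5):
    """Lightweight classifier: logistic regression trained with plain SGD."""
    weights, bias = [0.0] * len(samples[0]), 0.0
    for _ in range(epochs):
        for features, label in zip(samples, labels):
            error = predict_proba(weights, bias, features) - label
            weights = [w - lr * error * x for w, x in zip(weights, features)]
            bias -= lr * error
    return weights, bias


# Toy training data (invented for illustration): 1 = hate, 0 = not hate.
TEXTS = ["I hate you, you idiot", "Get out of here, stupid",
         "We stand with you, welcome", "Thank you, please stay"]
LABELS = [1, 1, 0, 0]

WEIGHTS, BIAS = train_logistic([adjective_scores(t) for t in TEXTS], LABELS)

# Each weight is attached to a named adjective, so the model can be inspected.
for adjective, weight in zip(ADJECTIVES, WEIGHTS):
    print(f"{adjective:>10}: {weight:+.2f}")
```

Because the classifier only ever sees the adjective scores, each learned weight can be read directly as how strongly a named concept pushes a prediction toward or away from the hate class, which is the kind of transparency a concept bottleneck is meant to give.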

Taxonomy

Topics: Hate Speech and Cyberbullying Detection · Sentiment Analysis and Opinion Mining · Spam and Phishing Detection