Towards Efficient and Explainable Hate Speech Detection via Model Distillation

Paloma Piot; Javier Parapar

arXiv:2412.13698·cs.CL·May 6, 2025

TL;DR

This paper introduces a method to distill large language models into smaller, efficient models that can accurately classify and explain hate speech, making detection more accessible and interpretable.

Contribution

The paper presents a novel distillation approach using Chain-of-Thought to produce smaller models that maintain explanation quality and improve classification performance.

Findings

1. Distilled models match large models in explanation quality.
2. Distilled models outperform large models in classification accuracy.
3. Smaller models are more suitable for operational deployment.

Abstract

Automatic detection of hate and abusive language is essential to combat its online spread. Moreover, recognising and explaining hate speech serves to educate people about its negative effects. However, most current detection models operate as black boxes, lacking interpretability and explainability. In this context, Large Language Models (LLMs) have proven effective for hate speech detection and for promoting interpretability. Nevertheless, they are computationally costly to run. In this work, we propose distilling large language models by using Chain-of-Thought to extract explanations that support the hate speech classification task. Having small language models for these tasks will contribute to their use in operational settings. In this paper, we demonstrate that distilled models deliver explanations of the same quality as larger models while surpassing them in classification performance.…
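The distillation idea in the abstract (a large model produces a Chain-of-Thought explanation plus a label, and a smaller model is trained to reproduce both) can be illustrated with a minimal data-preparation sketch. Everything below is an assumption made for illustration: the `TeacherOutput` fields, the prompt wording, the `Explanation:`/`Answer:` layout, and the label strings are hypothetical and are not taken from the paper or its repository.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class TeacherOutput:
    """One teacher (large model) annotation. Field names are hypothetical."""
    text: str        # the message to classify
    rationale: str   # Chain-of-Thought explanation produced by the teacher
    label: str       # assumed label strings: "hate" or "not hate"


def build_student_example(sample: TeacherOutput) -> Dict[str, str]:
    """Turn a teacher annotation into a prompt/target pair for fine-tuning a student.

    The target contains the explanation first and the label last, so the
    student learns to explain and then classify (assumed format).
    """
    prompt = (
        "Is the following message hate speech? "
        "Explain step by step, then answer.\n"
        f"Message: {sample.text}\n"
    )
    target = f"Explanation: {sample.rationale}\nAnswer: {sample.label}"
    return {"prompt": prompt, "target": target}


def parse_student_output(generated: str) -> Tuple[str, str]:
    """Split a generated string into (explanation, label).

    Returns the label "unknown" when the assumed "Answer:" marker is missing.
    """
    explanation, sep, answer = generated.rpartition("Answer:")
    if not sep:
        return generated.strip(), "unknown"
    explanation = explanation.replace("Explanation:", "", 1).strip()
    return explanation, answer.strip().lower()
```

Pairs built this way could be passed to any standard supervised fine-tuning loop for the student; the parser recovers the label for classification metrics and the explanation for quality evaluation.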

Code & Models

Repositories

palomapiot/distil-metahate (Official)

Taxonomy

Topics: Hate Speech and Cyberbullying Detection