# DDML: Multi-Student Knowledge Distillation for Hate Speech

**Authors:** Ze Liu, Zerui Shao, Haizhou Wang, Beibei Li

PMC · DOI: 10.3390/e27040417 · Entropy · 2025-04-11

## TL;DR

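This paper introduces DDML, a knowledge distillation method for hate speech detection. A large teacher network trains two or more smaller student networks, and the students also learn from each other. Across ten languages and nine datasets, this improves F1 over the baseline.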

## Contribution

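The novel contribution is DDML (Deep Distill–Mutual Learning), a multi-student knowledge distillation approach. One teacher network guides two or more smaller student networks, and the students also engage in mutual learning with each other to improve hate speech detection.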

## Key findings

- DDML improves hate speech detection performance with an average F1 score increase of 4.87% over the baseline.
- The method was tested across ten languages and nine datasets, showing consistent improvements.
- Student networks benefit from both teacher knowledge and peer-based mutual learning.
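One plausible way to write the combined signal a student receives is shown below. This is a hedged reconstruction based on standard knowledge distillation and mutual learning, not the paper's stated equation. For student $k$ of $K \ge 2$ students, $p_k$, $p_j$, and $p_T$ are the softmax outputs of student $k$, peer $j$, and the teacher. The superscript $\tau$ marks a temperature-softened softmax, $p^{\tau} = \mathrm{softmax}(z/\tau)$ for logits $z$. The weights $\alpha$ and $\beta$ and the temperature $\tau$ are assumed hyperparameters.

$$
\mathcal{L}_k = \mathrm{CE}\big(y,\, p_k\big) + \alpha\,\tau^{2}\,\mathrm{KL}\big(p_T^{\tau} \,\|\, p_k^{\tau}\big) + \frac{\beta}{K-1} \sum_{j \neq k} \mathrm{KL}\big(p_j \,\|\, p_k\big)
$$

The first term is ordinary supervised learning on the label $y$. The second pulls the student toward the teacher's softened predictions. The third pulls it toward the average prediction of its peers.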

## Abstract

Recent studies have shown that hate speech on social media negatively impacts users’ mental health and is a contributing factor to suicide attempts. On a broader scale, online hate speech can undermine social stability. With the continuous growth of the internet, the prevalence of online hate speech is rising, making its detection an urgent issue. Recent advances in natural language processing, particularly with transformer-based models, have shown significant promise in hate speech detection. However, these models come with a large number of parameters, leading to high computational requirements and making them difficult to deploy on personal computers. To address these challenges, knowledge distillation offers a solution by training smaller student networks using larger teacher networks. Recognizing that learning also occurs through peer interactions, we propose a knowledge distillation method called Deep Distill–Mutual Learning (DDML). DDML employs one teacher network and two or more student networks. While the student networks benefit from the teacher’s knowledge, they also engage in mutual learning with each other. We trained numerous deep neural networks for hate speech detection based on DDML and demonstrated that these networks perform well across various datasets. We tested our method across ten languages and nine datasets. The results demonstrate that DDML enhances the performance of deep neural networks, achieving an average F1 score increase of 4.87% over the baseline.
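The abstract describes the training setup only at a high level: one teacher and two or more students, with each student learning from both the teacher and its peers. The pure-Python sketch below shows one common way to combine those signals into a per-student loss. It assumes Hinton-style temperature distillation plus a mutual-learning KL term between peers. The function `student_loss`, the weights `alpha` and `beta`, the temperature `tau`, and the class labels in the usage lines are all hypothetical. The paper's exact loss, weighting, and training schedule are given in the full text.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax; subtracting the max keeps exp() numerically stable."""
    scaled = [z / temperature for z in logits]
    top = max(scaled)
    exps = [math.exp(s - top) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_div(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """KL(p || q): penalty for q diverging from the target distribution p."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


def cross_entropy(label: int, probs: Sequence[float], eps: float = 1e-12) -> float:
    """Negative log-likelihood of the gold label (the ordinary supervised loss)."""
    return -math.log(probs[label] + eps)


def student_loss(
    k: int,
    label: int,
    teacher_logits: Sequence[float],
    student_logits: Sequence[Sequence[float]],
    alpha: float = 0.5,  # hypothetical weight of the teacher (distillation) term
    beta: float = 0.5,   # hypothetical weight of the peer (mutual-learning) term
    tau: float = 2.0,    # hypothetical distillation temperature
) -> float:
    """Loss for student k on one example, in a generic distillation + mutual-learning form."""
    if len(student_logits) < 2:
        raise ValueError("mutual learning needs at least two students")
    p_k = softmax(student_logits[k])
    # 1) Supervised term against the hard label.
    hard = cross_entropy(label, p_k)
    # 2) Distillation term: match the teacher's softened distribution, scaled by tau^2.
    distill = tau ** 2 * kl_div(softmax(teacher_logits, tau), softmax(student_logits[k], tau))
    # 3) Mutual-learning term: average KL from each peer's prediction to student k's.
    peers = [j for j in range(len(student_logits)) if j != k]
    mutual = sum(kl_div(softmax(student_logits[j]), p_k) for j in peers) / len(peers)
    return hard + alpha * distill + beta * mutual


# Illustrative usage: class 0 = "not hate", class 1 = "hate" (hypothetical labels).
teacher = [2.0, -1.0]
students = [[1.5, -0.5], [0.5, 0.2]]
losses = [student_loss(k, 0, teacher, students) for k in range(len(students))]
```

This version only computes loss values. In an actual training loop written in an autodiff framework, the peers' predictions would typically be treated as fixed targets when updating student `k`, and the teacher would be frozen.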


## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12025758/full.md

## Figures

7 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12025758/full.md

## References

46 references — full list in the complete paper: https://tomesphere.com/paper/PMC12025758/full.md

---
Source: https://tomesphere.com/paper/PMC12025758