# Specializing General-purpose LLM Embeddings for Implicit Hate Speech Detection across Datasets

**Authors:** Vassiliy Cheremetiev, Quang Long Ho Ngo, Chau Ying Kot, Alina Elena Baia, Andrea Cavallaro

arXiv: 2508.20750 · 2025-08-29

## TL;DR

This paper shows that fine-tuning general-purpose LLM-based embedding models alone, without integrating external knowledge, achieves state-of-the-art implicit hate speech detection in both in-dataset and cross-dataset evaluation.

## Contribution

The study shows that simply fine-tuning general-purpose LLM-based embedding models (such as Stella, Jasper, NV-Embed and E5) outperforms existing methods for implicit hate speech detection on multiple datasets.

## Key findings

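- Up to 1.10 percentage points improvement in F1-macro score for in-dataset evaluation.
- Up to 20.35 percentage points improvement in F1-macro score for cross-dataset evaluation.
- State-of-the-art performance achieved solely by fine-tuning LLM-based embedding models, with no external knowledge or auxiliary signals.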

## Abstract

Implicit hate speech (IHS) is indirect language that conveys prejudice or hatred through subtle cues, sarcasm or coded terminology. IHS is challenging to detect as it does not include explicit derogatory or inflammatory words. To address this challenge, task-specific pipelines can be complemented with external knowledge or additional information such as context, emotions and sentiment data. In this paper, we show that, by solely fine-tuning recent general-purpose embedding models based on large language models (LLMs), such as Stella, Jasper, NV-Embed and E5, we achieve state-of-the-art performance. Experiments on multiple IHS datasets show up to 1.10 percentage points improvements for in-dataset, and up to 20.35 percentage points improvements in cross-dataset evaluation, in terms of F1-macro score.
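The reported gains are differences in F1-macro score expressed in percentage points, not relative percentages. The sketch below illustrates how such a figure could be computed; it is a minimal illustration, not the authors' evaluation code. The helper names `f1_macro` and `gain_in_points` and the toy labels are assumptions made for this example.

```python
def f1_macro(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (F1-macro)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # A class with no true or predicted samples contributes 0.0.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def gain_in_points(new_score, baseline_score):
    """Absolute difference between two scores in [0, 1], in percentage points."""
    return 100.0 * (new_score - baseline_score)


# Toy labels (hypothetical): 1 = implicit hate speech, 0 = not.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
baseline_pred = [0, 0, 1, 0, 0, 0, 0, 0]
finetuned_pred = [1, 1, 0, 0, 0, 0, 0, 1]

baseline_f1 = f1_macro(y_true, baseline_pred)
finetuned_f1 = f1_macro(y_true, finetuned_pred)
print(f"gain: {gain_in_points(finetuned_f1, baseline_f1):.2f} percentage points")
```

Because F1-macro averages per-class scores without weighting by class frequency, the minority class counts as much as the majority class, which matters when hateful examples are rare.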

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/2508.20750/full.md

## Figures

114 figures with captions in the complete paper: https://tomesphere.com/paper/2508.20750/full.md

## References

68 references — full list in the complete paper: https://tomesphere.com/paper/2508.20750/full.md

---
Source: https://tomesphere.com/paper/2508.20750