Toward Generalized Cross-Lingual Hateful Language Detection with Web-Scale Data and Ensemble LLM Annotations
Dang H. Dang, Jelena Mitrović, Michael Granitzer

TL;DR
This paper explores how web-scale unlabelled data and ensemble LLM annotations can enhance multilingual hate speech detection, especially benefiting smaller models and low-resource languages.
Contribution
It demonstrates that combining web data with LLM-generated synthetic labels improves hate speech detection, particularly for small models and low-resource languages.
Findings
Continued pre-training on web data improves macro-F1 by ~3%.
Ensemble LLM annotations improve small-model performance by 11% F1.
LightGBM ensemble outperforms other synthetic annotation strategies.
Abstract
We study whether large-scale unlabelled web data and LLM-based synthetic annotations can improve multilingual hate speech detection. Starting from texts crawled via OpenWebSearch.eu (OWS) in four languages (English, German, Spanish, Vietnamese), we pursue two complementary strategies. First, we apply continued pre-training to BERT models by continuing masked language modelling on unlabelled OWS texts before supervised fine-tuning, and show that this yields an average macro-F1 gain of approximately 3% over standard baselines across sixteen benchmarks, with stronger gains in low-resource settings. Second, we use four open-source LLMs (Mistral-7B, Llama3.1-8B, Gemma2-9B, Qwen2.5-14B) to produce synthetic annotations through three ensemble strategies: mean averaging, majority voting, and a LightGBM meta-learner. The LightGBM ensemble consistently outperforms the other strategies.…
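To make the difference between the two simpler ensemble strategies concrete, here is a minimal sketch, not the authors' implementation. It assumes each LLM returns a probability that a text is hateful and that a 0.5 threshold is applied; the scores, the threshold, the tie-breaking rule, and the function names are all hypothetical. The LightGBM meta-learner is not sketched here: it would instead be trained on the per-model scores as input features, which requires labelled data.

```python
from statistics import mean

# Model names come from the abstract; the scores are invented for illustration.
scores = {
    "Mistral-7B": 0.55,
    "Llama3.1-8B": 0.60,
    "Gemma2-9B": 0.55,
    "Qwen2.5-14B": 0.05,
}


def mean_average_label(probs, threshold=0.5):
    """Average the per-model probabilities, then threshold the mean."""
    return int(mean(probs) >= threshold)


def majority_vote_label(probs, threshold=0.5):
    """Threshold each model's probability, then take the majority label.

    A tie (e.g. 2 vs 2) falls back to label 0 (not hateful); this
    tie-breaking rule is an assumption made for this sketch only.
    """
    votes = [int(p >= threshold) for p in probs]
    return int(sum(votes) * 2 > len(votes))


probs = list(scores.values())
mean_label = mean_average_label(probs)
vote_label = majority_vote_label(probs)
print("mean averaging:", mean_label)
print("majority voting:", vote_label)
```

With these invented scores the two strategies disagree: three models lean weakly towards "hateful", so the vote says 1, while one confident dissenting model pulls the mean below the threshold. A learned meta-learner can, in principle, weigh such cases using labelled examples rather than a fixed rule.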
