Hatevolution: What Static Benchmarks Don't Tell Us
Chiara Di Bonaventura, Barbara McGillivray, Yulan He, Albert Meroño-Peñuela

TL;DR
This paper argues for time-sensitive benchmarks when evaluating hate speech detection models, showing that static benchmarks may not accurately reflect a model's robustness as language evolves.
Contribution
It empirically evaluates 20 language models on evolving hate speech data, demonstrating the limitations of static benchmarks and advocating for temporal evaluation methods.
Findings
Static benchmarks do not capture language evolution effects.
Models show decreased robustness over time in hate speech detection.
The findings call for time-sensitive benchmarks to evaluate models correctly and reliably.
Abstract
Language changes over time, including in the hate speech domain, which evolves quickly following social dynamics and cultural shifts. While NLP research has investigated the impact of language evolution on model training and has proposed several solutions for it, its impact on model benchmarking remains under-explored. Yet, hate speech benchmarks play a crucial role in ensuring model safety. In this paper, we empirically evaluate the robustness of 20 language models across two evolving hate speech experiments, and we show the temporal misalignment between static and time-sensitive evaluations. Our findings call for time-sensitive linguistic benchmarks in order to correctly and reliably evaluate language models in the hate speech domain.
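The gap between static and time-sensitive evaluation described in the abstract can be illustrated with a minimal sketch. It is not the authors' protocol: the `(period, gold, predicted)` record layout, the choice of macro-F1, the helper names, and the toy data are all assumptions made for illustration. The idea is only that one pooled score can hide a drop that appears once the same predictions are scored per time period.

```python
from collections import defaultdict


def macro_f1(y_true, y_pred, labels=(0, 1)):
    """Unweighted mean of per-class F1 over the given labels."""
    scores = []
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def static_vs_temporal(records):
    """Score pooled data once (static) and once per period (time-sensitive).

    records: iterable of (period, gold_label, predicted_label) tuples.
    This layout is a hypothetical choice for the sketch.
    """
    records = list(records)
    static = macro_f1([g for _, g, _ in records], [p for _, _, p in records])

    # Group gold labels and predictions by period.
    by_period = defaultdict(lambda: ([], []))
    for period, gold, pred in records:
        by_period[period][0].append(gold)
        by_period[period][1].append(pred)

    temporal = {
        period: macro_f1(gold, pred)
        for period, (gold, pred) in sorted(by_period.items())
    }
    return static, temporal


# Toy data (invented): 1 = hateful, 0 = not hateful.
records = [
    (2019, 1, 1), (2019, 0, 0), (2019, 1, 1), (2019, 0, 0),
    (2021, 1, 0), (2021, 0, 0), (2021, 1, 1), (2021, 0, 1),
]
static, temporal = static_vs_temporal(records)
print("static macro-F1:", static)
print("per-period macro-F1:", temporal)
```

On this toy data the pooled score sits between the two per-period scores, so a reader looking only at the static number would not see that performance on the later period is lower.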
Taxonomy
Topics: American Constitutional Law and Politics · Academic Freedom and Politics
