ToxSyn: Reducing Bias in Hate Speech Detection via Synthetic Minority Data in Brazilian Portuguese
Iago Alves Brito, Julia Soares Dollis, Fernanda Bufon Färber, Diogo Fernandes Costa Silva, Arlindo Rodrigues Galvão Filho

TL;DR
ToxSyn is a large-scale Portuguese corpus for multi-label hate speech detection, including minority-specific and non-toxic examples, aiming to improve model robustness and understanding of hate speech nuances.
Contribution
Introduces ToxSyn, a novel synthetic dataset with discourse annotations and minority-specific examples, addressing data scarcity and bias in hate speech detection for Portuguese.
Findings
Models trained on social media data struggle to generalize to ToxSyn.
Including non-toxic counterexamples improves detection accuracy.
Macro F1 scores can be misleading in evaluating model performance.
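One common way a macro F1 score can mislead is sketched below. It is an illustration only and not necessarily the argument the paper makes. The class names, the 90/10 split, and the always-non-toxic classifier are all hypothetical and do not come from ToxSyn. Macro F1 averages per-class F1 scores with equal weight, so the single number hides which class is failing.

```python
# Hypothetical illustration: macro F1 on an imbalanced toy set.
# None of these labels or counts come from the ToxSyn paper.

def f1_for_class(y_true, y_pred, cls):
    """F1 for one class, treating `cls` as the positive label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == cls and p == cls)
    fp = sum(1 for t, p in pairs if t != cls and p == cls)
    fn = sum(1 for t, p in pairs if t == cls and p != cls)
    denom = 2 * tp + fp + fn
    # Convention used here: F1 is 0.0 when the class is never
    # predicted and never present.
    return 2 * tp / denom if denom else 0.0

def macro_f1(y_true, y_pred, classes):
    """Unweighted mean of the per-class F1 scores."""
    return sum(f1_for_class(y_true, y_pred, c) for c in classes) / len(classes)

CLASSES = ["non_toxic", "toxic"]

# Assumed toy data: 90 benign texts, 10 toxic ones.
y_true = ["non_toxic"] * 90 + ["toxic"] * 10
# A degenerate classifier that never flags anything as toxic.
y_pred = ["non_toxic"] * 100

per_class = {c: f1_for_class(y_true, y_pred, c) for c in CLASSES}
score = macro_f1(y_true, y_pred, CLASSES)

print(per_class)        # non_toxic is about 0.947, toxic is 0.0
print(round(score, 3))  # 0.474
```

The aggregate of roughly 0.47 gives no hint that the classifier detects no toxic text at all, which is why the per-class scores are worth reporting alongside it.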
Abstract
The development of robust hate speech detection systems remains limited by the lack of large-scale, fine-grained training data, especially for languages beyond English. Existing corpora typically rely on coarse toxic/non-toxic labels, and the few that capture hate directed at specific minority groups critically lack the non-toxic counterexamples (i.e., benign text about minorities) required to distinguish genuine hate from mere discussion. We introduce ToxSyn, the first Portuguese large-scale corpus explicitly designed for multi-label hate speech detection across nine protected minority groups. Generated via a controllable four-stage pipeline, ToxSyn includes discourse-type annotations to capture rhetorical strategies of toxic language, such as sarcasm or dehumanization. Crucially, it systematically includes the non-toxic counterexamples absent in all other public datasets. Our…
Taxonomy
Topics: Hate Speech and Cyberbullying Detection · Sentiment Analysis and Opinion Mining · Spam and Phishing Detection
