ToxiLab: How Well Do Open-Source LLMs Generate Synthetic Toxicity Data?
Zheng Hui, Zhaoxiao Guo, Hang Zhao, Juanyong Duan, Lin Ai, Yinheng Li, Julia Hirschberg, Congrui Huang

TL;DR
This paper evaluates the effectiveness of open-source large language models in generating synthetic toxic data for hate speech detection, highlighting the benefits of supervised fine-tuning for improved data quality and diversity.
Contribution
It systematically assesses multiple open-source LLMs for toxic data synthesis, demonstrating that fine-tuning enhances data reliability and diversity over prompt-based methods.
Findings
Mistral outperforms other open models in toxicity data generation
Supervised fine-tuning improves data quality and diversity
Fine-tuned models offer scalable solutions for content moderation
Abstract
Effective toxic content detection relies heavily on high-quality and diverse data, which serve as the foundation for robust content moderation models. Synthetic data has become a common approach for training models across various NLP tasks. However, its effectiveness remains uncertain for highly subjective tasks like hate speech detection, with previous research yielding mixed results. This study explores the potential of open-source LLMs for harmful data synthesis, utilizing controlled prompting and supervised fine-tuning techniques to enhance data quality and diversity. We systematically evaluated six open-source LLMs on five datasets, assessing their ability to generate diverse, high-quality harmful data while minimizing hallucination and duplication. Our results show that Mistral consistently outperforms other open models, and supervised fine-tuning significantly enhances data…
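The abstract mentions assessing diversity and duplication in generated data but, as excerpted here, does not say which metrics the authors used. The sketch below is therefore only an illustration of two common proxies, not the paper's method: an exact-duplicate rate after whitespace and case normalization, and a distinct-n ratio (unique n-grams over total n-grams). The function names `duplication_rate` and `distinct_n` and the placeholder samples are assumptions introduced for this example.

```python
from collections import Counter


def duplication_rate(samples):
    """Fraction of samples that repeat another sample after normalization.

    Normalization here (lowercasing, collapsing whitespace) is an assumption;
    a real pipeline might use fuzzy or embedding-based matching instead.
    """
    normalized = [" ".join(s.lower().split()) for s in samples]
    if not normalized:
        return 0.0
    counts = Counter(normalized)
    # Every occurrence beyond the first counts as a duplicate.
    duplicates = sum(c - 1 for c in counts.values())
    return duplicates / len(normalized)


def distinct_n(samples, n=2):
    """Ratio of unique n-grams to total n-grams across all samples."""
    ngrams = []
    for s in samples:
        tokens = s.lower().split()
        ngrams.extend(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    if not ngrams:
        return 0.0
    return len(set(ngrams)) / len(ngrams)


if __name__ == "__main__":
    # Neutral placeholder strings standing in for generated samples.
    generated = ["sample text one", "Sample  text one", "another generated sample"]
    print("duplication rate:", duplication_rate(generated))
    print("distinct-2:", distinct_n(generated, n=2))
```

A lower duplication rate and a higher distinct-n both suggest more varied output, though neither says anything about label correctness or hallucination, which would need separate checks.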
Taxonomy
Topics: Semantic Web and Ontologies · Web Data Mining and Analysis
Methods: Attention Is All You Need · Dropout · Discriminative Fine-Tuning · Linear Layer · Cosine Annealing · Attention Dropout · Layer Normalization · Byte Pair Encoding · Adam
