ToxiLab: How Well Do Open-Source LLMs Generate Synthetic Toxicity Data?

Zheng Hui; Zhaoxiao Guo; Hang Zhao; Juanyong Duan; Lin Ai; Yinheng Li; Julia Hirschberg; Congrui Huang

arXiv:2411.15175 · cs.CL · February 25, 2025

TL;DR

This paper evaluates the effectiveness of open-source large language models in generating synthetic toxic data for hate speech detection, highlighting the benefits of supervised fine-tuning for improved data quality and diversity.

Contribution

It systematically assesses multiple open-source LLMs for toxic data synthesis, demonstrating that fine-tuning enhances data reliability and diversity over prompt-based methods.

Findings

01 Mistral outperforms other open models in toxicity data generation

02 Supervised fine-tuning improves data quality and diversity

03 Fine-tuned models offer scalable solutions for content moderation

Abstract

Effective toxic content detection relies heavily on high-quality and diverse data, which serve as the foundation for robust content moderation models. Synthetic data has become a common approach for training models across various NLP tasks. However, its effectiveness remains uncertain for highly subjective tasks like hate speech detection, with previous research yielding mixed results. This study explores the potential of open-source LLMs for harmful data synthesis, utilizing controlled prompting and supervised fine-tuning techniques to enhance data quality and diversity. We systematically evaluated 6 open source LLMs on 5 datasets, assessing their ability to generate diverse, high-quality harmful data while minimizing hallucination and duplication. Our results show that Mistral consistently outperforms other open models, and supervised fine-tuning significantly enhances data…
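The abstract names duplication and diversity as evaluation criteria but does not say how they are measured. As a rough illustration only, the sketch below computes two common proxies over a batch of generated samples: an exact-duplicate rate after whitespace and case normalization, and distinct-n (unique n-grams divided by total n-grams). The function names, the normalization, and the choice of metrics are assumptions for illustration and are not taken from the paper.

```python
def duplication_rate(samples):
    """Fraction of samples that repeat an earlier sample.

    Assumption: two samples count as duplicates when they match after
    lowercasing and collapsing whitespace. The paper may use another rule.
    """
    normalized = [" ".join(s.lower().split()) for s in samples]
    if not normalized:
        return 0.0
    return 1.0 - len(set(normalized)) / len(normalized)


def distinct_n(samples, n=2):
    """Distinct-n: unique n-grams divided by total n-grams over all samples.

    Tokens are whitespace-separated and lowercased (an assumption).
    Higher values suggest more lexical diversity.
    """
    ngrams = []
    for s in samples:
        tokens = s.lower().split()
        ngrams.extend(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    if not ngrams:
        return 0.0
    return len(set(ngrams)) / len(ngrams)


# Placeholder strings standing in for generated samples.
batch = ["a b c", "A  b c", "d e f"]
print(duplication_rate(batch))  # one of three samples repeats an earlier one
print(distinct_n(batch, n=2))   # 4 unique bigrams out of 6
```

Exact-match deduplication misses paraphrases, so a real pipeline would likely pair it with a fuzzy or embedding-based similarity check.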

Taxonomy

Topics: Semantic Web and Ontologies · Web Data Mining and Analysis

Methods: Attention Is All You Need · Dropout · Discriminative Fine-Tuning · Linear Layer · Cosine Annealing · Attention Dropout · Layer Normalization · Byte Pair Encoding · Adam