ToxiGen: A Large-Scale Machine-Generated Dataset for Adversarial and Implicit Hate Speech Detection

Thomas Hartvigsen; Saadia Gabriel; Hamid Palangi; Maarten Sap; Dipankar Ray; Ece Kamar

arXiv:2203.09509 · cs.CL · July 15, 2022

TL;DR

ToxiGen is a large, machine-generated dataset designed to improve hate speech detection, especially of implicit toxicity and toxicity targeting minority groups, by providing diverse, subtly toxic examples that make classifiers more robust.

Contribution

The paper introduces ToxiGen, a new large-scale dataset created with a demonstration-based prompting framework and adversarial decoding. It covers implicit toxicity about 13 minority groups and exceeds previous human-written resources in both scale and the number of demographic groups covered.
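The demonstration-based prompting framework can be illustrated with a minimal sketch. It assumes, as a simplification, that a prompt is a short bulleted list of example statements (all toxic or all benign, about a single group) ending in an open bullet that the language model completes. The function `build_demonstration_prompt`, its parameters, and the placeholder statements are hypothetical and do not reproduce the released prompts in microsoft/toxigen.

```python
import random
from typing import List, Optional


# Hypothetical sketch of demonstration-based prompting; not the ToxiGen code.
def build_demonstration_prompt(
    demonstrations: List[str], k: int = 3, seed: Optional[int] = None
) -> str:
    """Format k randomly sampled demonstrations as a bulleted list whose last
    line is an open bullet, so a language model continues the list with a new
    statement in the same style."""
    if not 0 < k <= len(demonstrations):
        raise ValueError("k must be between 1 and the number of demonstrations")
    rng = random.Random(seed)  # seeded so the same prompt can be rebuilt
    chosen = rng.sample(demonstrations, k)  # k distinct demonstrations
    lines = [f"- {text}" for text in chosen]
    lines.append("-")  # open slot for the model's continuation
    return "\n".join(lines)


if __name__ == "__main__":
    # Placeholder benign demonstrations (assumed); a real pool would be
    # collected separately for each group and each label.
    pool = [
        "many people in this community volunteer at local food banks",
        "their holidays are often celebrated with large family gatherings",
        "members of this group work in every kind of profession",
        "the community has a long history in the region",
    ]
    print(build_demonstration_prompt(pool, k=3, seed=0))
```

Sampling a fresh subset of demonstrations for each prompt is one simple way to vary the generations; the number of demonstrations per prompt here is an arbitrary choice.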

Findings

01

Finetuning classifiers on ToxiGen improves detection of human-written hate speech.

02

Humans struggle to distinguish machine-generated from human-written toxic text.

03

Classifiers finetuned on ToxiGen perform better on both human-written and machine-generated toxic text.

Abstract

Toxic language detection systems often falsely flag text that contains minority group mentions as toxic, as those groups are often the targets of online hate. Such over-reliance on spurious correlations also causes systems to struggle with detecting implicitly toxic language. To help mitigate these issues, we create ToxiGen, a new large-scale and machine-generated dataset of 274k toxic and benign statements about 13 minority groups. We develop a demonstration-based prompting framework and an adversarial classifier-in-the-loop decoding method to generate subtly toxic and benign text with a massive pretrained language model. Controlling machine generation in this way allows ToxiGen to cover implicitly toxic text at a larger scale, and about more demographic groups, than previous resources of human-written text. We conduct a human evaluation on a challenging subset of ToxiGen and find that…
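The classifier-in-the-loop decoding idea can be sketched as beam search in which each candidate is ranked by its language-model log-probability plus a weighted log-probability that a classifier assigns to a chosen target label. This is a simplified illustration under stated assumptions, not the paper's exact method. The toy next-token table, the keyword "classifier", `classifier_guided_beam_search`, and its `weight` and `beam_width` parameters are all hypothetical stand-ins for a large pretrained LM and a trained toxicity classifier.

```python
import math
from typing import Callable, Dict, List, Tuple

# Hypothetical toy LM: maps the previous token to a next-token distribution.
# A real system would query a large pretrained LM for these probabilities.
ToyLM = Dict[str, Dict[str, float]]


def classifier_guided_beam_search(
    lm: ToyLM,
    target_prob: Callable[[List[str]], float],
    start: str,
    steps: int,
    beam_width: int = 2,
    weight: float = 1.0,
) -> List[str]:
    """Beam search that reranks candidates by LM log-probability plus
    `weight` times the classifier's log-probability of a target label.
    Hypothetical sketch; not the ToxiGen implementation."""

    def score(entry: Tuple[float, List[str]]) -> float:
        lm_logp, tokens = entry
        # Clamp to avoid log(0) if the classifier returns exactly 0.
        return lm_logp + weight * math.log(max(target_prob(tokens), 1e-9))

    beams: List[Tuple[float, List[str]]] = [(0.0, [start])]
    for _ in range(steps):
        candidates: List[Tuple[float, List[str]]] = []
        for lm_logp, tokens in beams:
            next_dist = lm.get(tokens[-1], {})
            if not next_dist:
                candidates.append((lm_logp, tokens))  # no continuation: keep as-is
                continue
            for tok, p in next_dist.items():
                candidates.append((lm_logp + math.log(p), tokens + [tok]))
        # Keep the top `beam_width` candidates under the combined score.
        candidates.sort(key=score, reverse=True)
        beams = candidates[:beam_width]
    return max(beams, key=score)[1]


# Toy usage: steer generation toward text a keyword "classifier" labels benign.
TOY_LM: ToyLM = {
    "<s>": {"the": 1.0},
    "the": {"food": 0.5, "service": 0.5},
    "food": {"was": 1.0},
    "service": {"was": 1.0},
    "was": {"awful": 0.7, "great": 0.3},
}


def toy_benign_prob(tokens: List[str]) -> float:
    """Hypothetical classifier: probability that the text is benign."""
    return 0.1 if "awful" in tokens else 0.9


if __name__ == "__main__":
    print(classifier_guided_beam_search(TOY_LM, toy_benign_prob, "<s>", steps=4))
    # → ['<s>', 'the', 'food', 'was', 'great']
    print(classifier_guided_beam_search(TOY_LM, toy_benign_prob, "<s>", steps=4, weight=0.0))
    # → ['<s>', 'the', 'food', 'was', 'awful']
```

With `weight=0.0` the search reduces to ordinary beam search and returns the more probable "awful" continuation; with a positive weight the classifier term pulls the output toward text the classifier labels benign. One plausible adversarial use would pair toxic demonstrations with a benign target label, so that the generated statements are harder for that classifier to flag.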

Code & Models

Repositories

microsoft/toxigen (PyTorch, official)