PolygloToxicityPrompts: Multilingual Evaluation of Neural Toxic Degeneration in Large Language Models

Devansh Jain; Priyanshu Kumar; Samuel Gehman; Xuhui Zhou; Thomas Hartvigsen; Maarten Sap

arXiv:2405.09373·cs.CL·August 13, 2024·1 cites

TL;DR

This paper introduces PolygloToxicityPrompts, a large-scale multilingual benchmark with 425,000 prompts across 17 languages, to evaluate toxicity in large language models and analyze factors influencing toxicity levels.

Contribution

It presents the first large-scale multilingual toxicity benchmark for LLMs, with prompts in 17 languages drawn from over 100 million web-text documents, and investigates how model size, prompt language, and instruction- and preference-tuning methods affect toxicity.

Findings

01

Toxicity increases as language resources decrease and as model size increases.

02

Instruction- and preference-tuning reduce toxicity, but preference-tuning method choice has minimal impact.

03

Multilingual evaluation reveals critical shortcomings in current LLM safety measures.

Abstract

Recent advances in large language models (LLMs) have led to their extensive global deployment, and ensuring their safety calls for comprehensive and multilingual toxicity evaluations. However, existing toxicity benchmarks are overwhelmingly focused on English, posing serious risks to deploying LLMs in other languages. We address this by introducing PolygloToxicityPrompts (PTP), the first large-scale multilingual toxicity evaluation benchmark of 425K naturally occurring prompts spanning 17 languages. We overcome the scarcity of naturally occurring toxicity in web-text and ensure coverage across languages with varying resources by automatically scraping over 100M web-text documents. Using PTP, we investigate research questions to study the impact of model size, prompt language, and instruction and preference-tuning methods on toxicity by benchmarking over 60 LLMs. Notably, we find that…
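The abstract describes comparing toxicity across prompt languages but, as excerpted here, does not spell out the metric. As a minimal sketch only, assuming each model continuation has already been assigned a toxicity score in [0, 1] by some classifier, per-language averages could be aggregated as below. The function name, the language codes, and the scores are hypothetical and are not taken from the paper.

```python
from collections import defaultdict
from statistics import mean


def average_toxicity_by_language(records):
    """Average toxicity scores per language.

    `records` is an iterable of (language, score) pairs, where each score
    is assumed to be a toxicity probability in [0, 1].
    """
    scores = defaultdict(list)
    for language, score in records:
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"score out of range: {score}")
        scores[language].append(score)
    return {language: mean(values) for language, values in scores.items()}


# Hypothetical scores, invented for illustration (not results from the paper).
example_records = [
    ("en", 0.10), ("en", 0.30),
    ("hi", 0.40), ("hi", 0.60),
]
per_language = average_toxicity_by_language(example_records)
```

Grouping before averaging keeps languages with different prompt counts comparable; a real evaluation would also need to fix the scorer and the number of sampled continuations per prompt.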

Code & Models

Datasets

ToxicityPrompts/PolygloToxicityPrompts
dataset · 252 dl

Taxonomy

Topics: Natural Language Processing Techniques · Biomedical Text Mining and Ontologies