FrenchToxicityPrompts: a Large Benchmark for Evaluating and Mitigating Toxicity in French Texts
Caroline Brun, Vassilina Nikoulina

TL;DR
This paper introduces FrenchToxicityPrompts, a large French dataset with toxicity annotations, to evaluate and improve toxicity mitigation in French language models, addressing a gap in multilingual toxicity research.
Contribution
The creation and release of a 50K French prompt dataset with toxicity annotations, enabling evaluation of 14 models for toxicity across French texts.
Findings
14 models show varying toxicity levels on French prompts
The dataset reveals challenges in toxicity mitigation for French language models
FrenchToxicityPrompts facilitates future research in multilingual toxicity detection
Abstract
Large language models (LLMs) are increasingly popular but are also prone to generating biased, toxic, or harmful language, which can have detrimental effects on individuals and communities. Although considerable effort is devoted to assessing and mitigating toxicity in generated content, it is primarily concentrated on English, while it is essential to consider other languages as well. To address this issue, we create and release FrenchToxicityPrompts, a dataset of 50K naturally occurring French prompts and their continuations, annotated with toxicity scores from a widely used toxicity classifier. We evaluate 14 different models from four prevalent open-source families of LLMs against our dataset to assess their potential toxicity across various dimensions. We hope that our contribution will foster future research on toxicity detection and mitigation beyond English.
