FrenchToxicityPrompts: a Large Benchmark for Evaluating and Mitigating Toxicity in French Texts

Caroline Brun; Vassilina Nikoulina

arXiv:2406.17566·cs.CL·June 26, 2024

TL;DR

This paper introduces FrenchToxicityPrompts, a large French dataset with toxicity annotations, to evaluate and improve toxicity mitigation in French language models, addressing a gap in multilingual toxicity research.

Contribution

The creation and release of a 50K French prompt dataset with toxicity annotations, enabling evaluation of 14 models for toxicity across French texts.

Findings

01: 14 models show varying toxicity levels on French prompts

02: The dataset reveals challenges in toxicity mitigation for French language models

03: FrenchToxicityPrompts facilitates future research in multilingual toxicity detection

Abstract

Large language models (LLMs) are increasingly popular but are also prone to generating biased, toxic, or harmful language, which can have detrimental effects on individuals and communities. Although considerable effort has gone into assessing and mitigating toxicity in generated content, that work has concentrated primarily on English, even though it is essential to consider other languages as well. To address this issue, we create and release FrenchToxicityPrompts, a dataset of 50K naturally occurring French prompts and their continuations, annotated with toxicity scores from a widely used toxicity classifier. We evaluate 14 different models from four prevalent open-source families of LLMs against our dataset to assess their potential toxicity across various dimensions. We hope that our contribution will foster future research on toxicity detection and mitigation beyond English.
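
To make the evaluation setup described in the abstract more concrete, here is a minimal sketch of how classifier toxicity scores for model continuations could be aggregated per prompt. The record layout, the field names (`prompt_toxicity`, `continuation_toxicity`), the 0.5 threshold, and the two summary statistics are all assumptions made for illustration; they are not taken from the paper, and the scores are made-up values.

```python
from statistics import mean

# Hypothetical records: one toxicity score for the prompt and one score per
# sampled model continuation, all in [0, 1]. Values are illustrative only.
records = [
    {"prompt_toxicity": 0.10, "continuation_toxicity": [0.05, 0.20, 0.60]},
    {"prompt_toxicity": 0.80, "continuation_toxicity": [0.70, 0.90, 0.40]},
    {"prompt_toxicity": 0.30, "continuation_toxicity": [0.10, 0.15, 0.05]},
]

TOXIC_THRESHOLD = 0.5  # assumed cutoff, not specified in the source


def summarize(records, threshold=TOXIC_THRESHOLD):
    """Aggregate continuation scores across prompts.

    Returns the mean of the per-prompt maximum score and the share of
    prompts with at least one continuation at or above the threshold.
    """
    max_scores = [max(r["continuation_toxicity"]) for r in records]
    toxic_flags = [1.0 if s >= threshold else 0.0 for s in max_scores]
    return {
        "mean_max_toxicity": mean(max_scores),
        "toxic_prompt_rate": mean(toxic_flags),
    }


summary = summarize(records)
print(summary)
```

Taking the per-prompt maximum is one possible design choice that emphasizes worst-case behavior; averaging over continuations instead would describe typical behavior.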
