ToxiFrench: Benchmarking and Enhancing Language Models via CoT Fine-Tuning for French Toxicity Detection

Axel Delaval; Shujian Yang; Haicheng Wang; Han Qiu; Jialiang Lu

arXiv:2508.11281 · cs.CL · April 21, 2026

TL;DR

This paper introduces ToxiFrench, a French toxicity dataset. It shows that small language models can outperform larger ones on this task and proposes a novel chain-of-thought (CoT) fine-tuning method that improves model faithfulness and accuracy.

Contribution

The work provides a large French toxicity dataset, shows that small models can be more robust than larger ones on this task, and proposes a CoT fine-tuning strategy with DWL that improves model performance.
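This page does not explain how DWL works. As a rough illustration only, here is a minimal sketch that assumes DWL is a token-level loss. Under that assumption, tokens belonging to the model's final toxicity decision receive a weight that grows during training, while chain-of-thought reasoning tokens keep weight 1. The linear schedule and the names `decision_weight`, `dynamic_weighted_loss`, `w_start` and `w_end` are hypothetical and are not taken from the paper. Per-token negative log-likelihoods are assumed to be computed elsewhere.

```python
from typing import Sequence


def decision_weight(step: int, total_steps: int,
                    w_start: float = 1.0, w_end: float = 5.0) -> float:
    """Hypothetical linear schedule for the weight on final-decision tokens.

    The weight rises from w_start at step 0 to w_end at total_steps and is
    clamped outside that range.
    """
    if total_steps <= 0:
        return w_end
    progress = min(max(step / total_steps, 0.0), 1.0)
    return w_start + (w_end - w_start) * progress


def dynamic_weighted_loss(token_nll: Sequence[float],
                          is_decision: Sequence[bool],
                          step: int, total_steps: int) -> float:
    """Weighted mean of per-token negative log-likelihoods (assumed form).

    Reasoning (chain-of-thought) tokens keep weight 1.0. Tokens flagged as
    part of the final decision get the scheduled weight, so the decision
    contributes more to the loss as training progresses.
    """
    if len(token_nll) != len(is_decision):
        raise ValueError("token_nll and is_decision must have the same length")
    w_dec = decision_weight(step, total_steps)
    weights = [w_dec if flag else 1.0 for flag in is_decision]
    total_weight = sum(weights)
    if total_weight == 0:
        return 0.0  # empty sequence, or every weight is zero
    return sum(w * nll for w, nll in zip(weights, token_nll)) / total_weight


# Usage: two reasoning tokens followed by one decision token.
nll = [1.0, 1.0, 3.0]
mask = [False, False, True]
print(dynamic_weighted_loss(nll, mask, step=0, total_steps=100))    # equal weights
print(dynamic_weighted_loss(nll, mask, step=100, total_steps=100))  # decision upweighted
```

In this sketch, reasoning tokens stay at weight 1, so the chain of thought is still learned. The rising decision weight shifts emphasis toward the final label later in training. This page does not say whether the paper uses this exact form.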

Findings

01

Small Language Models (SLMs) often outperform larger models in robustness and generalization on this task.

02

The proposed CoT fine-tuning with DWL significantly improves model faithfulness.

03

Qwen3-4B (4B parameters) achieves state-of-the-art results on the ToxiFrench benchmark.

Abstract

Detecting toxic content using language models is crucial yet challenging. While substantial progress has been made in English, toxicity detection in French remains underdeveloped, primarily due to the lack of culturally relevant, human-annotated, large-scale datasets. In this work, we release ToxiFrench, a dataset of 53,622 French online comments together with a balanced benchmark split for systematic evaluation. The dataset is constructed via a semi-automated annotation pipeline that reduces manual labeling to only 10% through high-confidence LLM-based pre-annotation and human verification, while ensuring statistical alignment with human-only annotation. We then benchmark a broad range of models and uncover a counterintuitive finding: Small Language Models (SLMs) often surpass larger models in robustness and generalization on this task. Motivated by this finding, we propose a novel…
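The abstract describes the semi-automated annotation pipeline only at a high level. Below is a minimal sketch of one plausible reading. An LLM assigns each comment a label and a confidence score. High-confidence items are accepted automatically, and the rest go to human annotators. Agreement with human labels is then measured on a verified sample. The `PreAnnotation` record, the functions `route`, `review_share` and `agreement_rate`, and the 0.9 threshold are all hypothetical. The paper's actual confidence measure and statistical alignment test are not given here, and the threshold below is unrelated to the reported 10% figure.

```python
from dataclasses import dataclass
from typing import Iterable, List, Sequence, Tuple


@dataclass
class PreAnnotation:
    """Hypothetical record produced by the LLM pre-annotation step."""
    comment_id: str
    llm_label: int      # 1 = toxic, 0 = non-toxic (assumed binary labels)
    confidence: float   # assumed LLM confidence score in [0, 1]


def route(items: Iterable[PreAnnotation], threshold: float = 0.9
          ) -> Tuple[List[PreAnnotation], List[PreAnnotation]]:
    """Split items into an auto-accepted queue and a human-review queue."""
    auto: List[PreAnnotation] = []
    review: List[PreAnnotation] = []
    for item in items:
        (auto if item.confidence >= threshold else review).append(item)
    return auto, review


def review_share(auto: Sequence[PreAnnotation],
                 review: Sequence[PreAnnotation]) -> float:
    """Fraction of all comments that still need manual labeling."""
    total = len(auto) + len(review)
    return len(review) / total if total else 0.0


def agreement_rate(llm_labels: Sequence[int], human_labels: Sequence[int]) -> float:
    """Share of a human-verified sample where the LLM label matches the human one."""
    if len(llm_labels) != len(human_labels):
        raise ValueError("label sequences must have the same length")
    if not llm_labels:
        return 0.0
    matches = sum(a == b for a, b in zip(llm_labels, human_labels))
    return matches / len(llm_labels)


# Usage with made-up comments and scores.
items = [
    PreAnnotation("c1", 1, 0.97),
    PreAnnotation("c2", 0, 0.99),
    PreAnnotation("c3", 1, 0.55),
    PreAnnotation("c4", 0, 0.93),
]
auto, review = route(items, threshold=0.9)
print([i.comment_id for i in review], review_share(auto, review))
```

In a setup like this, the threshold would be tuned on real data so that only a small share of comments reaches human review. The agreement rate on a verified sample gives a first check that the automatic labels track human judgments.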

Code & Models

Models

AxelDlv00/ToxiFrench (model, Hugging Face)

Datasets

AxelDlv00/ToxiFrench (dataset, 256 downloads)
