ToxiFrench: Benchmarking and Enhancing Language Models via CoT Fine-Tuning for French Toxicity Detection
Axel Delaval, Shujian Yang, Haicheng Wang, Han Qiu, Jialiang Lu

TL;DR
This paper introduces ToxiFrench, a dataset of 53,622 French online comments for toxicity detection, shows that small language models often outperform larger ones on this task, and proposes a Chain-of-Thought (CoT) fine-tuning method that improves model faithfulness and accuracy.
Contribution
The work provides a large French toxicity dataset with a balanced benchmark split, shows that small models are often more robust than larger ones on this task, and proposes a CoT fine-tuning strategy with a Dynamic Weighted Loss (DWL) that improves model performance.
Findings
Small Language Models (SLMs) often outperform larger models in robustness and generalization on French toxicity detection.
The proposed CoT fine-tuning with DWL significantly improves model faithfulness.
The fine-tuned Qwen3-4B model (4B parameters) achieves state-of-the-art results on the ToxiFrench benchmark.
Abstract
Detecting toxic content using language models is crucial yet challenging. While substantial progress has been made in English, toxicity detection in French remains underdeveloped, primarily due to the lack of culturally relevant, human-annotated, large-scale datasets. In this work, we release ToxiFrench, a dataset of 53,622 French online comments together with a balanced benchmark split for systematic evaluation. The dataset is constructed via a semi-automated annotation pipeline that reduces manual labeling to only 10% through high-confidence LLM-based pre-annotation and human verification, while ensuring statistical alignment with human-only annotation. We then benchmark a broad range of models and uncover a counterintuitive finding: Small Language Models (SLMs) often surpass larger models in robustness and generalization on this task. Motivated by this finding, we propose a novel…
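The semi-automated annotation step described in the abstract can be pictured as a confidence-based routing rule: pre-annotations the LLM is confident about are accepted, and the rest go to human annotators. The following is a minimal sketch under stated assumptions, not the authors' pipeline. The `PreAnnotation` type, the `route` and `manual_fraction` helpers, the confidence score in [0, 1], and the 0.9 threshold are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class PreAnnotation:
    """One LLM-proposed label (hypothetical structure, not from the paper)."""
    comment: str
    label: str         # e.g. "toxic" or "non-toxic", as proposed by the LLM
    confidence: float  # assumed score in [0, 1]


def route(items, threshold=0.9):
    """Split pre-annotations into an auto-accepted list and a human-review list.

    Items at or above the (assumed) threshold are accepted as-is;
    everything else is queued for manual labeling.
    """
    auto, manual = [], []
    for item in items:
        (auto if item.confidence >= threshold else manual).append(item)
    return auto, manual


def manual_fraction(items, threshold=0.9):
    """Share of items that would still need manual labeling."""
    if not items:
        return 0.0
    _, manual = route(items, threshold)
    return len(manual) / len(items)


if __name__ == "__main__":
    sample = [
        PreAnnotation("commentaire 1", "non-toxic", 0.97),
        PreAnnotation("commentaire 2", "toxic", 0.55),
        PreAnnotation("commentaire 3", "non-toxic", 0.93),
        PreAnnotation("commentaire 4", "toxic", 0.99),
    ]
    auto, manual = route(sample)
    print(len(auto), "auto-accepted;", len(manual), "sent to human review")
    print("manual fraction:", manual_fraction(sample))
```

In a setup like this, the threshold is the knob that trades annotation cost against label quality: raising it sends more comments to human review, lowering it accepts more LLM labels unchecked.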
