ylmmcl at Multilingual Text Detoxification 2025: Lexicon-Guided Detoxification and Classifier-Gated Rewriting
Nicole Lai-Lopez, Lusha Wang, Su Yuan, Liza Zhang

TL;DR
This paper presents a multilingual text detoxification pipeline that combines lexicon-guided tagging, a fine-tuned sequence-to-sequence model, and iterative classifier-based gating, and that outperforms baseline and backtranslation methods in the PAN-2025 detoxification task.
Contribution
It introduces a multilingual detoxification method that uses explicit toxic-word annotations and classifier gating, departing from prior unsupervised or monolingual pipelines.
Findings
Achieved an STA score of 0.922, the highest among the team's own attempts.
Attained an average official J score of 0.612 on toxic inputs in both the development and test sets.
Outperformed baseline and backtranslation methods across multiple languages.
Abstract
In this work, we introduce the ylmmcl team's solution to the Multilingual Text Detoxification Task in the PAN-2025 competition: a robust multilingual text detoxification pipeline that integrates lexicon-guided tagging, a fine-tuned sequence-to-sequence model (s-nlp/mt0-xl-detox-orpo), and an iterative classifier-based gatekeeping mechanism. Our approach departs from prior unsupervised or monolingual pipelines by leveraging explicit toxic word annotation via the multilingual_toxic_lexicon to guide detoxification with greater precision and cross-lingual generalization. Our final model achieves the highest STA (0.922) of all our attempts, and an average official J score of 0.612 for toxic inputs in both the development and test sets. It also achieves xCOMET scores of 0.793 (dev) and 0.787 (test). These results outperform baseline and backtranslation methods across multiple…
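The three stages named in the abstract (lexicon tagging, sequence-to-sequence rewriting, classifier gating) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `<toxic>` marker format, the threshold, the round limit, the toy lexicon, and the two stub functions are all assumptions made for illustration. In a real setup, `rewrite` would call a model such as s-nlp/mt0-xl-detox-orpo and `toxicity` would call a trained toxicity classifier.

```python
import re
from typing import Callable, Iterable

# Hypothetical toy lexicon standing in for the multilingual_toxic_lexicon.
TOY_LEXICON = {"stupid", "idiot"}


def tag_toxic(text: str, lexicon: Iterable[str]) -> str:
    """Wrap lexicon matches in <toxic>...</toxic> (marker format is an assumption).

    Uses word boundaries, so it only suits languages that separate words
    with spaces or punctuation.
    """
    words = sorted(set(lexicon), key=len, reverse=True)
    if not words:
        return text
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(w) for w in words) + r")\b", re.IGNORECASE
    )
    return pattern.sub(lambda m: f"<toxic>{m.group(0)}</toxic>", text)


def gated_detox(
    text: str,
    lexicon: Iterable[str],
    rewrite: Callable[[str], str],
    toxicity: Callable[[str], float],
    threshold: float = 0.5,  # assumed cutoff, not taken from the paper
    max_rounds: int = 3,     # assumed iteration limit, not taken from the paper
) -> str:
    """Rewrite only inputs the classifier flags, retrying until they pass the gate."""
    lexicon = list(lexicon)
    if toxicity(text) < threshold:
        return text  # already judged non-toxic: leave untouched
    candidate = text
    for _ in range(max_rounds):
        candidate = rewrite(tag_toxic(candidate, lexicon))
        if toxicity(candidate) < threshold:
            break
    return candidate


def stub_rewrite(tagged: str) -> str:
    """Stand-in for the seq2seq model: deletes tagged spans, tidies whitespace."""
    without_spans = re.sub(r"<toxic>.*?</toxic>", "", tagged)
    return re.sub(r"\s+", " ", without_spans).strip()


def stub_toxicity(text: str) -> float:
    """Stand-in for the classifier: 1.0 if any toy-lexicon word is present."""
    tokens = re.findall(r"\w+", text.lower())
    return 1.0 if any(t in TOY_LEXICON for t in tokens) else 0.0


if __name__ == "__main__":
    print(tag_toxic("That was a stupid idea", TOY_LEXICON))
    print(gated_detox("That was a stupid idea", TOY_LEXICON, stub_rewrite, stub_toxicity))
```

The gate runs before any rewriting so that non-toxic inputs pass through unchanged, and the loop stops as soon as a candidate falls below the threshold. The stub rewriter simply deletes tagged words, which is why its output can be ungrammatical; a trained model would be expected to paraphrase instead.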
Taxonomy
Topics: Hate Speech and Cyberbullying Detection · Topic Modeling · Text Readability and Simplification
