Chinese Toxic Language Mitigation via Sentiment Polarity Consistent Rewrites

Xintong Wang; Yixiao Liu; Jingheng Pan; Liang Ding; Longyue Wang; Chris Biemann

arXiv:2505.15297 · cs.CL · May 22, 2025

TL;DR

This paper introduces ToxiRewriteCN, a Chinese detoxification dataset, and evaluates 17 language models on their ability to rewrite toxic content while preserving sentiment and intent, highlighting challenges in subtle toxicity cases.

Contribution

The paper presents the first Chinese detoxification dataset explicitly designed to preserve sentiment polarity, and evaluates 17 LLMs on the detoxification task.

Findings

01 Commercial and MoE models perform best overall.

02 All models struggle with emoji, homophone, and dialogue-based toxicity.

03 Balancing safety and emotional fidelity remains challenging.

Abstract

Detoxifying offensive language while preserving the speaker's original intent is a challenging yet critical goal for improving the quality of online interactions. Although large language models (LLMs) show promise in rewriting toxic content, they often default to overly polite rewrites, distorting the emotional tone and communicative intent. This problem is especially acute in Chinese, where toxicity often arises implicitly through emojis, homophones, or discourse context. We present ToxiRewriteCN, the first Chinese detoxification dataset explicitly designed to preserve sentiment polarity. The dataset comprises 1,556 carefully annotated triplets, each containing a toxic sentence, a sentiment-aligned non-toxic rewrite, and labeled toxic spans. It covers five real-world scenarios: standard expressions, emoji-induced and homophonic toxicity, as well as single-turn and multi-turn dialogues.…
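
The triplet structure described in the abstract can be sketched as a small record type. This is only an illustrative sketch: the class name, field names, scenario labels, and the substring check are assumptions made here, not the dataset's actual schema or the authors' evaluation method, and the sample record is invented rather than taken from ToxiRewriteCN.

```python
from dataclasses import dataclass
from enum import Enum


class Scenario(Enum):
    """The five scenarios named in the abstract (label strings are hypothetical)."""
    STANDARD = "standard"
    EMOJI = "emoji"
    HOMOPHONE = "homophone"
    SINGLE_TURN = "single_turn_dialogue"
    MULTI_TURN = "multi_turn_dialogue"


@dataclass(frozen=True)
class RewriteTriplet:
    """One annotated example: toxic input, non-toxic rewrite, labeled toxic spans."""
    toxic: str                    # original toxic sentence
    rewrite: str                  # sentiment-aligned, non-toxic rewrite
    toxic_spans: tuple[str, ...]  # spans of `toxic` annotated as toxic
    scenario: Scenario

    def spans_removed(self) -> bool:
        """True if every labeled span occurs in the toxic text and none survives in the rewrite."""
        in_source = all(span in self.toxic for span in self.toxic_spans)
        in_rewrite = any(span in self.rewrite for span in self.toxic_spans)
        return in_source and not in_rewrite


# Invented example for illustration only; not a record from the dataset.
example = RewriteTriplet(
    toxic="This plan is idiotic garbage.",
    rewrite="This plan is really badly thought out.",
    toxic_spans=("idiotic garbage",),
    scenario=Scenario.STANDARD,
)
```

Note that the rewrite in the sample stays negative in tone while dropping the labeled span, which is the kind of sentiment-consistent rewrite the abstract describes. A literal substring check like `spans_removed` would not capture implicit toxicity carried by emojis, homophones, or dialogue context.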

Taxonomy

Topics: Natural Language Processing Techniques · Hate Speech and Cyberbullying Detection

Methods: Mixture of Experts