Breaking the Cloak! Unveiling Chinese Cloaked Toxicity with Homophone Graph and Toxic Lexicon

Xuchen Ma; Jianxiang Yu; Wenming Shao; Bo Pang; Xiang Li

arXiv:2505.22184 · cs.CL · June 6, 2025

TL;DR

This paper introduces C$^2$TU, a method for unveiling cloaked toxic language in Chinese social media. It finds candidate toxic words by substring matching, then filters them with BERT or LLMs, outperforming existing methods by up to 71% in F1 score and 35% in accuracy.

Contribution

The paper presents the first Chinese-specific cloaked toxicity unveiling method that is training-free and prompt-free, utilizing substring matching and model-based filtering.

Findings

01. Outperforms existing methods by up to 71% in F1 score

02. Achieves 35% higher accuracy on Chinese toxic datasets

03. Demonstrates the effectiveness of both BERT-based and LLM-based filtering

Abstract

Social media platforms have experienced a significant rise in toxic content, including abusive language and discriminatory remarks, presenting growing challenges for content moderation. Some users evade censorship by deliberately disguising toxic words behind homophonic cloaks, which motivates the task of unveiling cloaked toxicity. Existing methods are mostly designed for English text, while unveiling cloaked toxicity in Chinese remains unsolved. To tackle this issue, we propose C$^2$TU, a novel training-free and prompt-free method for unveiling Chinese cloaked toxic content. It first employs substring matching to identify candidate toxic words based on a Chinese homophone graph and a toxic lexicon. It then filters out candidates that are non-toxic and corrects the remaining cloaks to their corresponding toxic words. Specifically, we develop two model variants for filtering, which are based on…
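The candidate-matching step described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the pinyin table, the one-word lexicon, and the names `TOY_PINYIN`, `TOY_LEXICON`, `to_sounds`, and `find_candidates` are all hypothetical, and treating two characters as homophones when their toneless pinyin matches is an assumption made for illustration. The filtering step is left out because the abstract is truncated before it is fully described.

```python
from typing import Dict, List, Tuple

# Hypothetical toy pronunciation table (toneless pinyin). A real system
# would need a full pronunciation dictionary; this covers only the demo.
TOY_PINYIN: Dict[str, str] = {
    "笨": "ben", "本": "ben",
    "蛋": "dan", "旦": "dan", "但": "dan",
}

# Hypothetical one-entry lexicon, using a mild insult ("idiot") as the example.
TOY_LEXICON: List[str] = ["笨蛋"]


def to_sounds(text: str) -> Tuple[str, ...]:
    """Map each character to its pinyin; unknown characters map to themselves."""
    return tuple(TOY_PINYIN.get(ch, ch) for ch in text)


def find_candidates(
    text: str, lexicon: List[str] = TOY_LEXICON
) -> List[Tuple[int, str, str]]:
    """Return (start, substring, lexicon_word) for every substring of `text`
    that sounds the same as a lexicon word. These are only candidates:
    some will be innocent and must be filtered by a later step."""
    sounds = to_sounds(text)
    hits = []
    for word in lexicon:
        target = to_sounds(word)
        n = len(target)
        for start in range(len(sounds) - n + 1):
            if sounds[start:start + n] == target:
                hits.append((start, text[start:start + n], word))
    return hits


# A cloaked insult: "本旦" sounds like "笨蛋".
print(find_candidates("你是本旦"))          # [(2, '本旦', '笨蛋')]
# An innocent sentence ("基本" + "但是") that still matches by sound.
print(find_candidates("这很基本但是有用"))  # [(3, '本但', '笨蛋')]
```

The second call shows why matching alone is not enough: an ordinary phrase produces the same sound sequence as a lexicon word, so a filtering stage has to reject such candidates before any correction is applied.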

Taxonomy

Topics: Hate Speech and Cyberbullying Detection · Topic Modeling · Authorship Attribution and Profiling

Methods: Linear Layer · Attention Dropout · Softmax · WordPiece · Weight Decay · Multi-Head Attention · Attention Is All You Need · Linear Warmup With Linear Decay · Dropout