CSSBench: Evaluating the Safety of Lightweight LLMs against Chinese-Specific Adversarial Patterns
Zhenhong Zhou, Shilinlu Yan, Chuanpu Liu, Qiankun Li, Kun Wang, Zhigang Zeng

TL;DR
This paper introduces CSSBench, a benchmark that evaluates the safety of lightweight LLMs in Chinese against Chinese-specific adversarial patterns, addressing a gap left by existing English-focused safety assessments.
Contribution
The paper presents CSSBench, a new Chinese-specific safety benchmark that evaluates lightweight LLMs across six real-world domains and various adversarial patterns.
Findings
Chinese-specific adversarial patterns significantly challenge lightweight LLM safety.
Lightweight models show increased over-refusal rates under Chinese adversarial queries.
CSSBench provides a comprehensive safety evaluation framework for Chinese LLM deployment.
Abstract
Large language models (LLMs) are increasingly deployed in cost-sensitive and on-device scenarios, and safety guardrails have advanced mainly in English. However, real-world Chinese malicious queries typically conceal intent via homophones, pinyin, symbol-based splitting, and other Chinese-specific patterns. These Chinese-specific adversarial patterns create a safety evaluation gap that is not well captured by existing benchmarks focused on English. This gap is particularly concerning for lightweight models, which may be more vulnerable to such specific adversarial perturbations. To bridge this gap, we introduce the Chinese-Specific Safety Benchmark (CSSBench) that emphasizes these adversarial patterns and evaluates the safety of lightweight LLMs in Chinese. Our benchmark covers six domains that are common in real Chinese scenarios, including illegal activities and compliance, privacy…
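As a rough illustration of one pattern named in the abstract, symbol-based splitting, the sketch below shows why a naive substring filter can miss a keyword once separators are inserted between its characters, and how stripping separators recovers it. This is not the benchmark's method: the function names, the separator set, and the benign placeholder keyword are all assumptions made for the example.

```python
import re

# Hypothetical separator set (whitespace plus common ASCII and fullwidth
# punctuation). This list is an assumption, not taken from the paper.
_SEPARATORS = re.compile(r"[\s\.\-_*|/\\,，。·•~～!！@#]+")


def split_with_symbol(text: str, symbol: str = "*") -> str:
    """Insert a symbol between every character (symbol-based splitting)."""
    return symbol.join(text)


def strip_separators(text: str) -> str:
    """Remove separator symbols so a split keyword becomes contiguous again."""
    return _SEPARATORS.sub("", text)


def contains_keyword(text: str, keyword: str) -> bool:
    """Substring check performed after separator stripping."""
    return keyword in strip_separators(text)


keyword = "密码"  # "password", used here only as a benign placeholder
perturbed = split_with_symbol(keyword)

print(keyword in perturbed)                  # naive check on the split text
print(contains_keyword(perturbed, keyword))  # check after normalization
```

Homophone and pinyin substitutions would need a pronunciation lexicon rather than simple stripping, so they are left out of this sketch.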
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Topic Modeling · Big Data and Digital Economy
