Obscure but Effective: Classical Chinese Jailbreak Prompt Optimization via Bio-Inspired Search

Xun Huang; Simeng Qin; Xiaoshuang Jia; Ranjie Duan; Huanqian Yan; Zhitao Zeng; Fei Yang; Yang Liu; Xiaojun Jia

arXiv:2602.22983 · cs.AI · March 25, 2026

Open Access · 3 Reviews

TL;DR

This paper introduces CC-BOS, a bio-inspired search framework that automatically generates classical Chinese prompts to effectively bypass LLM safety constraints, revealing vulnerabilities and improving attack success rates.

Contribution

It presents a novel multi-dimensional fruit fly optimization approach for generating classical Chinese jailbreak prompts, enhancing black-box attack efficiency and effectiveness.

Findings

01 CC-BOS outperforms existing jailbreak methods in success rate

02 The framework effectively explores the prompt search space

03 Classical Chinese prompts exploit LLM vulnerabilities

Abstract

As Large Language Models (LLMs) see wider use, their security risks have drawn increasing attention. Existing research reveals that LLMs are highly susceptible to jailbreak attacks, with effectiveness varying across language contexts. This paper investigates the role of classical Chinese in jailbreak attacks. Owing to its conciseness and obscurity, classical Chinese can partially bypass existing safety constraints, exposing notable vulnerabilities in LLMs. Based on this observation, this paper proposes CC-BOS, a framework for the automatic generation of classical Chinese adversarial prompts based on multi-dimensional fruit fly optimization, facilitating efficient and automated jailbreak attacks in black-box settings. Prompts are encoded into eight policy dimensions (role, behavior, mechanism, metaphor, expression, knowledge, trigger pattern, and context) and iteratively…
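The abstract is cut off before it describes the update rules, so the following is only a minimal sketch of what a fruit-fly-style search over an eight-dimensional categorical space can look like in general. It is not the authors' CC-BOS implementation. The integer option codes, `OPTIONS_PER_DIM`, `toy_fitness`, `perturb`, and `fruit_fly_search` are all hypothetical names introduced here, and the fitness is a harmless stand-in (agreement with a hidden target vector), not a score from any model.

```python
import random

# Dimension names come from the abstract; everything else is an assumption.
DIMENSIONS = ["role", "behavior", "mechanism", "metaphor",
              "expression", "knowledge", "trigger_pattern", "context"]
OPTIONS_PER_DIM = 5  # assumed number of opaque option codes per dimension


def toy_fitness(candidate, target):
    """Stand-in score: how many dimensions agree with a hidden target vector."""
    return sum(c == t for c, t in zip(candidate, target))


def perturb(center, rng, mutation_rate):
    """Smell-style step: re-sample each dimension with a small probability."""
    return tuple(
        rng.randrange(OPTIONS_PER_DIM) if rng.random() < mutation_rate else value
        for value in center
    )


def fruit_fly_search(target, swarm_size=20, iterations=50,
                     mutation_rate=0.25, seed=0):
    """Generic fruit-fly-style search over a categorical vector."""
    rng = random.Random(seed)
    best = tuple(rng.randrange(OPTIONS_PER_DIM) for _ in DIMENSIONS)
    best_score = toy_fitness(best, target)
    history = [best_score]
    for _ in range(iterations):
        # Each fly explores randomly around the current swarm location.
        swarm = [perturb(best, rng, mutation_rate) for _ in range(swarm_size)]
        leader = max(swarm, key=lambda fly: toy_fitness(fly, target))
        leader_score = toy_fitness(leader, target)
        # Vision-style step: the swarm relocates only if the leader improves.
        if leader_score > best_score:
            best, best_score = leader, leader_score
        history.append(best_score)
        if best_score == len(DIMENSIONS):
            break
    return best, best_score, history


if __name__ == "__main__":
    hidden_target = (1, 2, 3, 4, 0, 1, 2, 3)
    best, score, history = fruit_fly_search(hidden_target)
    print(dict(zip(DIMENSIONS, best)), score, len(history) - 1)
```

Because the swarm only moves when the best fly improves, the recorded best score never decreases. In a black-box setting each fitness call would correspond to one query, so `swarm_size * iterations` bounds the query budget of this sketch.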

Peer Reviews

Decision · ICLR 2026 Poster

Reviewer 01 · Rating 6 · Confidence 3

Strengths

1. The paper contributes a structured framework (CC-BOS) by formalizing the attack into a well-defined 8-dimensional strategy space.
2. The empirical evaluation is extensive, with near-perfect success across six SOTA models and three distinct benchmarks. The high query efficiency and robustness against Llama-Guard-3 underscore the potential of the attack.

Weaknesses

1. The success of the proposed method is critically dependent on an “attack LLM” (Deepseek-Chat) to generate the final prompts. It’s unclear whether the success comes from the 8D space or simply from this model’s specific generative ability.
2. The paper highlights Fruit Fly Optimization but fails to justify its use over other standard black-box optimizers (e.g., genetic algorithms, random search), leaving the optimizer’s specific contribution unclear.
3. The defense experiment (Table 4) is limited

Reviewer 02 · Rating 6 · Confidence 4

Strengths

1. The paper identifies a previously underexplored vulnerability of LLMs to jailbreaks that arises from classical Chinese, and proposes an automated method to generate jailbreak prompts.
2. The attack is evaluated against six baselines; experimental results indicate the proposed method is effective.

Weaknesses

The experimental evaluation of the proposed attack against defenses is insufficient: only a single defense method is assessed. The paper should evaluate additional defense methods (e.g., composite and dynamic defenses) to support its claims.

Reviewer 03 · Rating 2 · Confidence 5

Strengths

They introduce classical Chinese into the study of adversarial prompt generation and jailbreaks for the first time, thereby establishing a new perspective and extending the scope of LLM security. They propose a black-box jailbreak framework that formalizes prompt generation within an eight-dimensional strategy space and leverages a bio-inspired optimization algorithm to achieve systematic and automated jailbreak prompt generation. They construct a two-stage translation module to progressively m

Weaknesses

1. The paper investigates the role of classical Chinese in jailbreak attacks. Can the technique be generalized to other languages?
2. After reading the paper, I cannot understand how the jailbreak attacks are modeled using the features of classical Chinese.
3. It is unclear to me why the bio-inspired optimization algorithm is effective for classical Chinese jailbreak attacks.

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Advanced Malware Detection Techniques · Explainable Artificial Intelligence (XAI)