TEMPLATEFUZZ: Fine-Grained Chat Template Fuzzing for Jailbreaking and Red Teaming LLMs
Qingchao Shen, Zibo Xiao, Lili Huang, Enwei Hu, Yongqiang Tian, Junjie Chen

TL;DR
TEMPLATEFUZZ is a systematic fuzzing framework that identifies vulnerabilities in chat templates of LLMs, significantly improving jailbreak success rates while maintaining model accuracy.
Contribution
TEMPLATEFUZZ introduces element-level mutation rules, heuristic search, and active learning strategies to expose chat template vulnerabilities in LLMs.
Findings
Achieves an average attack success rate of 98.2% on open-source LLMs.
Outperforms state-of-the-art methods by up to 47.9% in attack success rate.
Attains a 90% attack success rate on commercial LLMs without specified chat templates.
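The percentages above are attack success rates (ASR). As a minimal sketch of the arithmetic, assuming the usual definition (successful attempts divided by total attempts), the snippet below computes an ASR and the two common ways a gap such as "up to 47.9%" can be read. The function names are hypothetical, the numbers in the usage lines are made up for illustration, and this summary states neither how the paper judges an attempt successful nor whether its reported gap is absolute or relative.

```python
from typing import Sequence


def attack_success_rate(outcomes: Sequence[bool]) -> float:
    """Return the fraction of attempts marked successful.

    Assumption: outcomes[i] is True when attempt i was judged a success.
    The judging procedure itself is outside the scope of this sketch.
    """
    if not outcomes:
        raise ValueError("need at least one attempt")
    return sum(outcomes) / len(outcomes)


def absolute_gain_points(asr_new: float, asr_baseline: float) -> float:
    """Difference between two ASRs, in percentage points."""
    return (asr_new - asr_baseline) * 100


def relative_gain_percent(asr_new: float, asr_baseline: float) -> float:
    """Improvement relative to the baseline ASR, in percent."""
    if asr_baseline == 0:
        raise ValueError("baseline ASR must be non-zero")
    return (asr_new - asr_baseline) / asr_baseline * 100


# Illustrative values only; these are not results from the paper.
asr = attack_success_rate([True, True, False, True])
points = absolute_gain_points(0.75, 0.5)
percent = relative_gain_percent(0.75, 0.5)
```

The two gain functions give different numbers for the same pair of rates, which is why a headline improvement figure should be read alongside its definition in the full paper.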
Abstract
Large Language Models (LLMs) are increasingly deployed across diverse domains, yet their vulnerability to jailbreak attacks, where adversarial inputs bypass safety mechanisms to elicit harmful outputs, poses significant security risks. While prior work has primarily focused on prompt injection attacks, these approaches often require resource-intensive prompt engineering and overlook other critical components, such as chat templates. This paper introduces TEMPLATEFUZZ, a fine-grained fuzzing framework that systematically exposes vulnerabilities in chat templates, a critical yet underexplored attack surface in LLMs. Specifically, TEMPLATEFUZZ (1) designs a series of element-level mutation rules to generate diverse chat template variants, (2) proposes a heuristic search strategy that steers chat template generation toward a higher attack success rate (ASR) while…
