ChatBug: A Common Vulnerability of Aligned LLMs Induced by Chat Templates
Fengqing Jiang, Zhangchen Xu, Luyao Niu, Bill Yuchen Lin, Radha, Poovendran

TL;DR
This paper uncovers a vulnerability called ChatBug in aligned LLMs caused by chat templates, showing how malicious prompts can bypass safety measures, and discusses mitigation strategies with trade-offs.
Contribution
The paper identifies ChatBug, a novel vulnerability in LLMs induced by chat templates, and develops attacks demonstrating its exploitation, highlighting challenges in balancing safety and performance.
Findings
ChatBug can be exploited to bypass safety in 8 SOTA LLMs
Existing jailbreaks are more effective when combined with ChatBug
Adversarial training mitigates ChatBug but reduces model performance
Abstract
Large language models (LLMs) are expected to follow instructions from users and engage in conversations. Techniques to enhance LLMs' instruction-following capabilities typically fine-tune them using data structured according to a predefined chat template. Although chat templates are shown to be effective in optimizing LLM performance, their impact on safety alignment of LLMs has been less understood, which is crucial for deploying LLMs safely at scale. In this paper, we investigate how chat templates affect safety alignment of LLMs. We identify a common vulnerability, named ChatBug, that is introduced by chat templates. Our key insight to identify ChatBug is that the chat templates provide a rigid format that need to be followed by LLMs, but not by users. Hence, a malicious user may not necessarily follow the chat template when prompting LLMs. Instead, malicious users could leverage…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsWeb Application Security Vulnerabilities · Advanced Malware Detection Techniques · Spam and Phishing Detection
