Virtual Context: Enhancing Jailbreak Attacks with Special Token Injection
Yuqi Zhou, Lin Lu, Hanchi Sun, Pan Zhou, Lichao Sun

TL;DR
This paper introduces Virtual Context, a method that uses special tokens to significantly improve the success rates of jailbreak attacks on large language models. It identifies special token injection as a new security threat and recommends including it in LLM security testing.
Contribution
The paper presents Virtual Context, a novel approach that leverages special tokens to enhance existing jailbreak attacks. It requires minimal background knowledge of the target model and increases success rates across multiple models.
Findings
Virtual Context improves the success rates of existing jailbreak methods by approximately 40%.
It enhances the effectiveness of four widely used jailbreak methods.
Applied directly to malicious behaviors, Virtual Context still achieves a notable jailbreak effect.
Abstract
Jailbreak attacks on large language models (LLMs) involve inducing these models to generate harmful content that violates ethics or laws, posing a significant threat to LLM security. Current jailbreak attacks face two main challenges: low success rates due to defensive measures and high resource requirements for crafting specific prompts. This paper introduces Virtual Context, which leverages special tokens, previously overlooked in LLM security, to improve jailbreak attacks. Virtual Context addresses these challenges by significantly increasing the success rates of existing jailbreak methods and requiring minimal background knowledge about the target model, thus enhancing effectiveness in black-box settings without additional overhead. Comprehensive evaluations show that Virtual Context-assisted jailbreak attacks can improve the success rates of four widely used jailbreak methods by…
Taxonomy
Topics: Advanced Malware Detection Techniques · Digital and Cyber Forensics · Adversarial Robustness in Machine Learning
