Adjacent Words, Divergent Intents: Jailbreaking Large Language Models via Task Concurrency
Yukun Jiang, Mingjie Li, Michael Backes, Yang Zhang

TL;DR
This paper introduces a novel concurrency-based jailbreak method for large language models, revealing that concurrent task execution can bypass guardrails and pose new security risks, with experiments showing its effectiveness and stealthiness.
Contribution
The paper proposes a word-level method to enable task concurrency in LLMs and introduces JAIL-CON, an iterative framework that exploits concurrency to improve jailbreak success and stealth.
Findings
Combining a harmful task with a benign one in a concurrent prompt significantly reduces the probability that the guardrail filters the harmful task.
JAIL-CON outperforms existing jailbreak methods in success rate.
Concurrent answers are less detectable by guardrails.
Abstract
Despite their superior performance on a wide range of domains, large language models (LLMs) remain vulnerable to misuse for generating harmful content, a risk that has been further amplified by various jailbreak attacks. Existing jailbreak attacks mainly follow sequential logic, where LLMs understand and answer each given task one by one. However, concurrency, a natural extension of the sequential scenario, has been largely overlooked. In this work, we first propose a word-level method to enable task concurrency in LLMs, where adjacent words encode divergent intents. Although LLMs maintain strong utility in answering concurrent tasks, which is demonstrated by our evaluations on mathematical and general question-answering benchmarks, we notably observe that combining a harmful task with a benign one significantly reduces the probability of it being filtered by the guardrail, showing the…
Taxonomy
Topics: Topic Modeling · Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI)
