LASH: Adaptive Semantic Hybridization for Black-Box Jailbreaking of Large Language Models
Abdullah Al Nomaan Nafi, Fnu Suya, Swarup Bhunia, Prabuddha Chakraborty

TL;DR
LASH is an adaptive framework that combines multiple attack strategies to improve black-box jailbreaking of large language models, achieving higher success rates with fewer queries.
Contribution
LASH introduces an adaptive composition method that treats the outputs of multiple base attacks as seed prompts and uses a genetic optimizer to improve jailbreak effectiveness.
Findings
LASH achieves an average attack success rate of 84.5% on JailbreakBench.
LASH outperforms five state-of-the-art baselines with only 30 target queries.
LASH remains effective under various defense mechanisms.
Abstract
Jailbreak attacks expose a persistent gap between the intended safety behavior of aligned large language models and their behavior under adversarial prompting. Existing automated methods are increasingly effective, but each commits to a single attack family (e.g., one refinement loop, one tree search, one mutation space, or one strategy library), and no single family dominates: the best-performing method shifts across target models and harm categories, suggesting complementary strengths that per-prompt composition could exploit. We introduce LASH (LLM Adaptive Semantic Hybridization), a black-box framework that treats outputs from multiple base attacks as reusable seed prompts and adaptively composes them for each target request. Given a seed pool, LASH searches over seed subsets and softmax-normalized mixture weights; a composition module synthesizes a single candidate prompt, and a…
