MIST: Jailbreaking Black-box Large Language Models via Iterative Semantic Tuning

Muyang Zheng; Yuanzhi Yao; Changting Lin; Caihong Kai; Yanxiang Chen; Zhiquan Liu

arXiv:2506.16792 · cs.CL · September 23, 2025

TL;DR

This paper introduces MIST, an iterative semantic tuning method that effectively jailbreaks black-box large language models by refining prompts to induce harmful responses with minimal queries.

Contribution

MIST combines sequential synonym search with order-determining optimization to bypass model alignment and safety measures efficiently.

Findings

01 MIST achieves high attack success rates on multiple models.

02 It requires fewer queries than existing methods.

03 MIST demonstrates good transferability and computational efficiency.

Abstract

Despite efforts to align large language models (LLMs) with societal and moral values, these models remain susceptible to jailbreak attacks -- methods designed to elicit harmful responses. Jailbreaking black-box LLMs is considered challenging due to the discrete nature of token inputs, restricted access to the target LLM, and limited query budget. To address the issues above, we propose an effective method for jailbreaking black-box large language Models via Iterative Semantic Tuning, named MIST. MIST enables attackers to iteratively refine prompts that preserve the original semantic intent while inducing harmful content. Specifically, to balance semantic similarity with computational efficiency, MIST incorporates two key strategies: sequential synonym search, and its advanced version -- order-determining optimization. We conduct extensive experiments on two datasets using two…

Taxonomy

Topics: Digital and Cyber Forensics