FC-Attack: Jailbreaking Multimodal Large Language Models via Auto-Generated Flowcharts

Ziyi Zhang; Zhen Sun; Zongmin Zhang; Jihui Guo; Xinlei He

arXiv:2502.21059 · cs.CV · September 23, 2025

TL;DR

This paper introduces FC-Attack, a novel method that uses auto-generated flowcharts to jailbreak multimodal large language models, revealing vulnerabilities and proposing defenses to improve safety.

Contribution

We propose FC-Attack, a new flowchart-based attack method that effectively induces harmful outputs in multimodal LLMs, highlighting safety risks and potential mitigation strategies.

Findings

01. FC-Attack achieves an attack success rate of up to 96% via images.

02. Flowchart shape and font style significantly affect attack success.

03. The AdaShield defense reduces jailbreak effectiveness, but at a cost to model utility.

Abstract

Multimodal Large Language Models (MLLMs) have become powerful and are widely adopted in practical applications. However, recent research has revealed their vulnerability to multimodal jailbreak attacks, in which the model is induced to generate harmful content, creating safety risks. Although most MLLMs have undergone safety alignment, the visual modality remains vulnerable to such attacks. In our work, we find that flowcharts containing partially harmful information can induce MLLMs to provide additional harmful details. Based on this, we propose FC-Attack, a jailbreak attack method based on auto-generated flowcharts. Specifically, FC-Attack first fine-tunes a pre-trained LLM on benign datasets to create a step-description generator. The generator is then used to produce step descriptions corresponding to a harmful query, which…

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Hate Speech and Cyberbullying Detection · Topic Modeling