FENCE: A Financial and Multimodal Jailbreak Detection Dataset
Mirae Kim, Seonghun Jeong, Youngjun Kwak

TL;DR
FENCE introduces a bilingual multimodal dataset tailored for detecting jailbreak attacks on VLMs in financial contexts, enabling the development of more robust and domain-specific detection methods.
Contribution
The paper presents FENCE, a novel bilingual multimodal dataset focused on financial jailbreak detection, filling a critical resource gap for training and evaluating models in this domain.
Findings
Commercial and open-source VLMs are both vulnerable to jailbreak attacks: GPT-4o shows measurable attack success rates, and open-source models are more exposed.
A baseline detector trained on FENCE achieves 99% in-distribution accuracy.
The baseline detector also maintains strong performance on external benchmarks.
Abstract
Jailbreaking poses a significant risk to the deployment of Large Language Models (LLMs) and Vision Language Models (VLMs). VLMs are particularly vulnerable because they process both text and images, creating broader attack surfaces. However, available resources for jailbreak detection are scarce, particularly in finance. To address this gap, we present FENCE, a bilingual (Korean-English) multimodal dataset for training and evaluating jailbreak detectors in financial applications. FENCE emphasizes domain realism through finance-relevant queries paired with image-grounded threats. Experiments with commercial and open-source VLMs reveal consistent vulnerabilities, with GPT-4o showing measurable attack success rates and open-source models displaying greater exposure. A baseline detector trained on FENCE achieves 99 percent in-distribution accuracy and maintains strong performance on…
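The two metrics named in the abstract, attack success rate and detection accuracy, can be illustrated with a minimal sketch. This is not the authors' evaluation code: the `Sample` record, its field names, and the toy data are hypothetical, and it assumes attack success rate is the fraction of jailbreak attempts the target model complied with, and accuracy is the fraction of samples the detector labeled correctly.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Sample:
    """Hypothetical evaluation record; field names are assumptions, not FENCE's schema."""
    is_jailbreak: bool      # ground-truth label for the query/image pair
    flagged: bool           # detector's prediction (True = jailbreak)
    attack_succeeded: bool  # whether the target VLM complied with the attack


def attack_success_rate(samples: List[Sample]) -> float:
    """Fraction of jailbreak attempts that the target model complied with."""
    attacks = [s for s in samples if s.is_jailbreak]
    if not attacks:
        return 0.0
    return sum(s.attack_succeeded for s in attacks) / len(attacks)


def detector_accuracy(samples: List[Sample]) -> float:
    """Fraction of samples where the detector's flag matches the ground truth."""
    if not samples:
        return 0.0
    return sum(s.flagged == s.is_jailbreak for s in samples) / len(samples)


# Toy data, made up purely for illustration.
toy_samples = [
    Sample(is_jailbreak=True, flagged=True, attack_succeeded=True),
    Sample(is_jailbreak=True, flagged=False, attack_succeeded=False),
    Sample(is_jailbreak=False, flagged=False, attack_succeeded=False),
    Sample(is_jailbreak=True, flagged=True, attack_succeeded=False),
]

asr = attack_success_rate(toy_samples)
acc = detector_accuracy(toy_samples)
print(f"attack success rate: {asr:.3f}, detector accuracy: {acc:.3f}")
```

Attack success rate is computed over jailbreak samples only, while accuracy is computed over all samples, so the two numbers answer different questions: how exposed the target model is, and how reliable the detector is.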
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI) · Topic Modeling
