TL;DR
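AMIA is a lightweight, inference-only defense for Large Vision-Language Models (LVLMs). It automatically masks a small set of text-irrelevant image patches and analyzes the intent behind the input before generating a response, improving robustness against jailbreak attacks with minimal utility loss.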
Contribution
AMIA combines automatic patch masking with joint intention analysis to defend LVLMs at inference time, without any retraining.
Findings
Average defense success rate rose from 52.4% to 81.7% across diverse LVLMs and jailbreak benchmarks.
General utility dropped by only 2% in average accuracy.
Inference overhead was modest.
Abstract
We introduce AMIA, a lightweight, inference-only defense for Large Vision-Language Models (LVLMs) that (1) Automatically Masks a small set of text-irrelevant image patches to disrupt adversarial perturbations, and (2) conducts joint Intention Analysis to uncover and mitigate hidden harmful intents before response generation. Without any retraining, AMIA improves defense success rates across diverse LVLMs and jailbreak benchmarks from an average of 52.4% to 81.7%, preserves general utility with only a 2% average accuracy drop, and incurs only modest inference overhead. Ablation confirms both masking and intention analysis are essential for a robust safety-utility trade-off.
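The two steps named in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the authors' implementation: the abstract does not say how text relevance is scored, so the sketch uses cosine similarity between hypothetical precomputed patch and text embeddings, and the function names, the zero fill value, and the wording of the intention prompt are all invented here.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def select_patches_to_mask(patch_embeddings, text_embedding, num_masked):
    """Return the indices of the num_masked patches least similar to the text.

    ASSUMPTION: relevance = cosine similarity of precomputed embeddings.
    The abstract only says the masked patches are "text-irrelevant".
    """
    scores = [cosine(p, text_embedding) for p in patch_embeddings]
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    return sorted(order[:num_masked])


def mask_patches(image, patch_size, patch_indices, fill=0):
    """Return a copy of a 2D image with the given patches overwritten by fill.

    Patches are square, numbered row-major from the top-left corner.
    """
    height, width = len(image), len(image[0])
    patches_per_row = width // patch_size
    masked = [row[:] for row in image]  # leave the caller's image untouched
    for idx in patch_indices:
        top = (idx // patches_per_row) * patch_size
        left = (idx % patches_per_row) * patch_size
        for y in range(top, min(top + patch_size, height)):
            for x in range(left, min(left + patch_size, width)):
                masked[y][x] = fill
    return masked


def build_intention_prompt(user_query):
    """Wrap the query in an intention-analysis instruction.

    ASSUMPTION: hypothetical wording; the paper's actual prompt is not
    given in this summary.
    """
    return (
        "First describe the intention behind the image and the text together. "
        "If the intention is harmful, refuse; otherwise answer the request.\n"
        f"Request: {user_query}"
    )


# Toy example: a 4x4 image split into four 2x2 patches.
image = [[1] * 4 for _ in range(4)]
patch_embeddings = [[1, 0], [0, 1], [1, 1], [-1, 0]]
text_embedding = [1, 0]

to_mask = select_patches_to_mask(patch_embeddings, text_embedding, num_masked=1)
masked_image = mask_patches(image, patch_size=2, patch_indices=to_mask)
prompt = build_intention_prompt("What is shown in this picture?")
```

In this toy input the fourth patch points away from the text embedding, so it is the one selected and blanked; the masked image and the wrapped prompt would then be passed to the LVLM in place of the raw inputs.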
Taxonomy
Methods: Sparse Evolutionary Training
