VALD: Multi-Stage Vision Attack Detection for Efficient LVLM Defense
Nadav Kadvil, Malak Fares, Ayellet Tal

TL;DR
This paper presents VALD, a multi-stage, training-free defense mechanism for LVLMs that efficiently detects and mitigates adversarial images by combining quick filtering, embedding analysis, and response consolidation.
Contribution
VALD introduces a training-free, multi-stage detection framework that identifies adversarial inputs and improves LVLM robustness with minimal computational overhead.
Findings
Achieves state-of-the-art accuracy in adversarial image detection.
Most clean inputs bypass costly processing stages.
Maintains minimal overhead even with many adversarial examples.
Abstract
Large Vision-Language Models (LVLMs) can be vulnerable to adversarial images that subtly bias their outputs toward plausible yet incorrect responses. We introduce a general, efficient, and training-free defense that combines image transformations with agentic data consolidation to recover correct model behavior. A key component of our approach is a two-stage detection mechanism that quickly filters out the majority of clean inputs. We first assess image consistency under content-preserving transformations at negligible computational cost. For more challenging cases, we examine discrepancies in a text-embedding space. Only when necessary do we invoke a powerful LLM to resolve attack-induced divergences. A key idea is to consolidate multiple responses, leveraging both their similarities and their differences. We show that our method achieves state-of-the-art accuracy while maintaining…
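The staged control flow described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: every name here (`cascade`, `image_consistency`, `answer`, `embed_text`, `consolidate`) and both thresholds are hypothetical placeholders. The sketch also assumes each stage reduces to a similarity score compared against a threshold, which the abstract does not specify. It only shows how cheap checks can return early so that the expensive consolidation step runs on the few inputs that fail them.

```python
from dataclasses import dataclass
from math import sqrt
from typing import Any, Callable, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = sqrt(sum(a * a for a in u))
    nv = sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)


@dataclass
class Verdict:
    stage: int   # stage at which the input was resolved (1, 2, or 3)
    answer: str  # final response returned to the user


def cascade(
    image: Any,
    transforms: List[Callable[[Any], Any]],           # content-preserving transformations
    image_consistency: Callable[[Any, Any], float],   # hypothetical cheap score in [0, 1]
    answer: Callable[[Any], str],                     # hypothetical LVLM call
    embed_text: Callable[[str], Sequence[float]],     # hypothetical text embedder
    consolidate: Callable[[List[str]], str],          # hypothetical LLM consolidation step
    tau_image: float = 0.9,                           # assumed threshold, not from the paper
    tau_text: float = 0.8,                            # assumed threshold, not from the paper
) -> Verdict:
    variants = [t(image) for t in transforms]

    # Stage 1: cheap consistency check between the image and its transformed variants.
    if all(image_consistency(image, v) >= tau_image for v in variants):
        return Verdict(1, answer(image))

    # Stage 2: compare the model's responses in a text-embedding space.
    responses = [answer(image)] + [answer(v) for v in variants]
    embeddings = [embed_text(r) for r in responses]
    if all(cosine(embeddings[0], e) >= tau_text for e in embeddings[1:]):
        return Verdict(2, responses[0])

    # Stage 3: responses diverge, so hand all of them to the consolidation step.
    return Verdict(3, consolidate(responses))
```

In this sketch an input that passes Stage 1 costs a single model call, which mirrors the claim that most clean inputs bypass the costly stages. The consolidation callable receives every response, so it can use both their agreements and their differences.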
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Generative Adversarial Networks and Image Synthesis · Hate Speech and Cyberbullying Detection
