VALD: Multi-Stage Vision Attack Detection for Efficient LVLM Defense

Nadav Kadvil; Malak Fares; Ayellet Tal

arXiv:2602.19570 · cs.CV · March 18, 2026

Open Access

TL;DR

This paper presents VALD, a multi-stage, training-free defense mechanism for LVLMs that efficiently detects and mitigates adversarial images by combining quick filtering, embedding analysis, and response consolidation.

Contribution

The paper introduces a multi-stage, training-free detection framework that identifies adversarial inputs efficiently, improving LVLM robustness with minimal computational overhead.

Findings

01. Achieves state-of-the-art accuracy in adversarial image detection.

02. Most clean inputs bypass costly processing stages.

03. Maintains minimal overhead even with many adversarial examples.

Abstract

Large Vision-Language Models (LVLMs) can be vulnerable to adversarial images that subtly bias their outputs toward plausible yet incorrect responses. We introduce a general, efficient, and training-free defense that combines image transformations with agentic data consolidation to recover correct model behavior. A key component of our approach is a two-stage detection mechanism that quickly filters out the majority of clean inputs. We first assess image consistency under content-preserving transformations at negligible computational cost. For more challenging cases, we examine discrepancies in a text-embedding space. Only when necessary do we invoke a powerful LLM to resolve attack-induced divergences. A key idea is to consolidate multiple responses, leveraging both their similarities and their differences. We show that our method achieves state-of-the-art accuracy while maintaining…
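
The staged flow described in the abstract can be pictured as an early-exit cascade. The following is only a minimal sketch under stated assumptions, not the authors' implementation: the function names (`cascade_defense`, `image_score`, `respond`, `embed_text`, `consolidate`), the use of cosine similarity, and both thresholds are hypothetical placeholders chosen for illustration. The abstract does not specify how consistency is measured or how responses are consolidated.

```python
from dataclasses import dataclass
from math import sqrt
from typing import Any, Callable, List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


@dataclass
class CascadeResult:
    answer: str
    stage: int  # stage at which the cascade stopped: 1, 2, or 3


def cascade_defense(
    image: Any,
    transforms: List[Callable[[Any], Any]],      # content-preserving transformations
    image_score: Callable[[Any, Any], float],    # hypothetical cheap consistency score
    respond: Callable[[Any], str],               # stand-in for an LVLM query
    embed_text: Callable[[str], Vector],         # stand-in for a text embedder
    consolidate: Callable[[List[str]], str],     # stand-in for an LLM consolidator
    image_threshold: float = 0.9,                # assumed value, not from the paper
    text_threshold: float = 0.8,                 # assumed value, not from the paper
) -> CascadeResult:
    views = [t(image) for t in transforms]

    # Stage 1: cheap consistency check between the image and its transformed views.
    if all(image_score(image, v) >= image_threshold for v in views):
        return CascadeResult(respond(image), stage=1)

    # Stage 2: compare the responses for all views in a text-embedding space.
    responses = [respond(image)] + [respond(v) for v in views]
    embeddings = [embed_text(r) for r in responses]
    if all(cosine(embeddings[0], e) >= text_threshold for e in embeddings[1:]):
        return CascadeResult(responses[0], stage=2)

    # Stage 3: responses diverge, so hand all of them to the expensive consolidator.
    return CascadeResult(consolidate(responses), stage=3)
```

Because each stage returns as soon as its check passes, an input that looks consistent never reaches the later, more expensive calls, which is the efficiency argument the abstract makes for clean images.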

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Generative Adversarial Networks and Image Synthesis · Hate Speech and Cyberbullying Detection