AdvDreamer Unveils: Are Vision-Language Models Truly Ready for Real-World 3D Variations?
Shouwei Ruan, Hanqing Liu, Yao Huang, Xiaoqi Wang, Caixin Kang, Hang Su, Yinpeng Dong, Xingxing Wei

TL;DR
This paper introduces AdvDreamer, a framework for generating adversarial 3D transformations to evaluate the robustness of vision-language models in real-world scenarios, revealing significant vulnerabilities.
Contribution
AdvDreamer is the first system to produce physically reproducible adversarial 3D samples from single views, enabling systematic robustness evaluation of VLMs against real-world 3D variations.
Findings
VLMs are vulnerable to real-world 3D variations.
AdvDreamer generates high-quality adversarial samples.
Robustness gaps are identified across different VLM architectures.
Abstract
Vision Language Models (VLMs) have exhibited remarkable generalization capabilities, yet their robustness in dynamic real-world scenarios remains largely unexplored. To systematically evaluate VLMs' robustness to real-world 3D variations, we propose AdvDreamer, the first framework capable of generating physically reproducible Adversarial 3D Transformation (Adv-3DT) samples from single-view observations. AdvDreamer integrates three key innovations. First, to precisely characterize real-world 3D variations with limited prior knowledge, we design a zero-shot Monocular Pose Manipulation pipeline built upon generative 3D priors. Second, to ensure the visual quality of worst-case Adv-3DT samples, we propose a Naturalness Reward Model that provides continuous naturalness regularization during adversarial optimization, effectively preventing convergence to hallucinated or unnatural…
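The idea of adversarial optimization with a naturalness regularizer can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not state the optimizer, the pose parameterization, or the form of the objective. The sketch assumes a pose made of three rotation angles, a simple random hill-climbing search, and an objective of the form loss + lam * naturalness. The functions `victim_loss` and `naturalness` are hypothetical stand-ins for a VLM's loss and a learned reward model.

```python
import random


def victim_loss(pose):
    # Hypothetical stand-in for a VLM's loss on the image rendered at `pose`.
    # In this toy, the loss simply grows with the rotation magnitude.
    return sum(abs(a) for a in pose)


def naturalness(pose):
    # Hypothetical stand-in for a naturalness reward in (0, 1].
    # Extreme poses receive a lower score.
    return 1.0 / (1.0 + sum(a * a for a in pose) / 3.0)


def objective(pose, lam=2.0):
    # Assumed form: reward poses that hurt the model while staying natural.
    return victim_loss(pose) + lam * naturalness(pose)


def random_search(steps=200, sigma=0.1, bound=1.0, lam=2.0, seed=0):
    # Random hill climbing over three rotation angles, clipped to [-bound, bound].
    rng = random.Random(seed)
    best = [0.0, 0.0, 0.0]
    best_score = objective(best, lam)
    for _ in range(steps):
        # Perturb the current best pose and clip each angle to the allowed range.
        cand = [max(-bound, min(bound, a + rng.gauss(0.0, sigma))) for a in best]
        score = objective(cand, lam)
        if score > best_score:
            best, best_score = cand, score
    return best, best_score


if __name__ == "__main__":
    pose, score = random_search()
    print("pose:", pose)
    print("objective:", score)
```

The weight `lam` trades off attack strength against naturalness: a larger value keeps the search closer to poses the stand-in reward considers plausible.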
Taxonomy
Topics: Multimodal Machine Learning Applications · Semantic Web and Ontologies
