TL;DR
This paper identifies trust boundary confusion in vision-language agents, where misleading visual injections are mistaken for legitimate environmental cues, and proposes a defense framework to improve robustness against such injections.
Contribution
It introduces a dual-intent dataset and evaluation framework, and proposes a multi-agent defense system to mitigate visual injection vulnerabilities in embodied vision-language agents.
Findings
Current LVLM agents often ignore useful signals or follow harmful ones.
The proposed defense significantly reduces the cases in which agents are misled by visual injections.
The evaluation framework and code are publicly available.
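As a rough illustration of the two failure modes named in the findings, the sketch below scores a batch of episodes on both sides of the trade-off. It is not the paper's evaluation framework: the `Episode` record, its field names, and the two rate names are hypothetical, chosen only to show how "ignoring useful signals" and "following harmful ones" could be measured separately.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Episode:
    # Hypothetical record: one agent run against one visual signal.
    signal_is_legitimate: bool   # True for a cue the agent should heed (e.g. a traffic light)
    agent_followed_signal: bool  # True if the agent's behavior complied with the signal


def dual_intent_rates(episodes: List[Episode]) -> Dict[str, float]:
    """Return the rate of each failure mode (assumed metric names)."""
    benign = [e for e in episodes if e.signal_is_legitimate]
    harmful = [e for e in episodes if not e.signal_is_legitimate]

    # Failure mode 1: a legitimate cue was present but the agent ignored it.
    ignored_useful = sum(not e.agent_followed_signal for e in benign)
    # Failure mode 2: a misleading injection was present and the agent obeyed it.
    followed_harmful = sum(e.agent_followed_signal for e in harmful)

    return {
        "ignored_useful_rate": ignored_useful / len(benign) if benign else 0.0,
        "followed_harmful_rate": followed_harmful / len(harmful) if harmful else 0.0,
    }
```

Reporting the two rates separately, rather than a single accuracy figure, keeps an agent that blindly follows every signal from looking as good as one that blindly ignores every signal.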
Abstract
Recent advances in embodied Vision-Language Agentic Systems (VLAS), powered by large vision-language models (LVLMs), enable AI systems to perceive and reason over real-world scenes. Within this context, environmental signals such as traffic lights are essential in-band signals that can and should influence agent behavior. However, similar signals could also be crafted to operate as misleading visual injections, overriding user intent and posing security risks. This duality creates a fundamental challenge: agents must respond to legitimate environmental cues while remaining robust to misleading ones. We refer to this tension as trust boundary confusion. To study this behavior, we design a dual-intent dataset and evaluation framework, through which we show that current LVLM-based agents fail to reliably balance this trade-off, either ignoring useful signals or following harmful ones. We…
