TRAP: Targeted Redirecting of Agentic Preferences

Hangoo Kang; Jehyeok Yeon; Gagandeep Singh

arXiv:2505.23518 · cs.AI · November 25, 2025

TL;DR

TRAP is a diffusion-based semantic injection framework that subtly manipulates the decisions of agents built on vision-language models, exposing vulnerabilities in autonomous agent decision-making without requiring access to model internals.

Contribution

The paper presents a novel, model-agnostic adversarial method that exploits semantic vulnerabilities in cross-modal AI systems using naturalistic images.

Findings

1. TRAP effectively redirects agent preferences across multiple models.

2. It outperforms existing adversarial baselines in inducing decision biases.

3. The method reveals a significant vulnerability in current vision-language AI systems.

Abstract

Autonomous agentic AI systems powered by vision-language models (VLMs) are rapidly advancing toward real-world deployment, yet their cross-modal reasoning capabilities introduce new attack surfaces for adversarial manipulation that exploit semantic reasoning across modalities. Existing adversarial attacks typically rely on visible pixel perturbations or require privileged model or environment access, making them impractical for stealthy, real-world exploitation. We introduce TRAP, a novel generative adversarial framework that manipulates the agent's decision-making using diffusion-based semantic injections into the vision-language embedding space. Our method combines negative prompt-based degradation with positive semantic optimization, guided by a Siamese semantic network and layout-aware spatial masking. Without requiring access to model internals, TRAP produces visually natural…
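As a rough illustration of the "positive semantic optimization" and "layout-aware spatial masking" ideas named in the abstract, the toy sketch below nudges an embedding toward a target (positive-prompt) embedding while a binary mask restricts which coordinates may change. This is not the authors' implementation: there is no diffusion model, no Siamese network, and no negative-prompt term here, and the names `cosine`, `masked_semantic_step`, and `optimize`, the random vectors, and the step size are all assumptions made for illustration.

```python
import numpy as np


def cosine(a, b):
    """Cosine similarity between two 1-D vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))


def masked_semantic_step(z, target, mask, lr=0.5):
    """One gradient-ascent step on cosine(z, target), applied only where mask == 1.

    z      : current embedding (hypothetical stand-in for an image latent)
    target : embedding of the positive prompt (hypothetical)
    mask   : 0/1 vector standing in for a spatial mask over latent positions
    """
    zn = np.linalg.norm(z) + 1e-12
    tn = np.linalg.norm(target) + 1e-12
    # Analytic gradient of cosine similarity with respect to z.
    grad = target / (zn * tn) - (z @ target) * z / (zn**3 * tn)
    # Masked update: coordinates with mask == 0 are left untouched.
    return z + lr * mask * grad


def optimize(z0, target, mask, steps=200, lr=0.5):
    """Repeat the masked step a fixed number of times."""
    z = z0.copy()
    for _ in range(steps):
        z = masked_semantic_step(z, target, mask, lr)
    return z


# Toy data: random vectors replace real image and text embeddings.
rng = np.random.default_rng(0)
z0 = rng.normal(size=64)
target = rng.normal(size=64)
mask = np.zeros(64)
mask[:32] = 1.0  # only the first half of the coordinates is editable

z = optimize(z0, target, mask)
print("similarity before:", cosine(z0, target))
print("similarity after: ", cosine(z, target))
```

In this sketch the mask plays the role of limiting edits to a chosen region, so the unmasked half of the vector is identical before and after optimization while the overall similarity to the target rises.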

Videos

TRAP: Targeted Redirecting of Agentic Preferences · SlidesLive

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Multimodal Machine Learning Applications · Ethics and Social Impacts of AI

Methods: Diffusion