Adversarial Intent is a Latent Variable: Stateful Trust Inference for Securing Multimodal Agentic RAG

Inderjeet Singh; Vikas Pahuja; Aishvariya Priya Rathina Sabapathy; Chiara Picardi; Amit Giloni; Roman Vainshtein; Andrés Murillo; Hisashi Kojima; Motoyoshi Sekiya; Yuki Unno; Junichi Suga

arXiv:2602.21447 · cs.CR · February 26, 2026

Open Access

TL;DR

This paper introduces a stateful trust inference framework for multimodal agentic RAG systems. It models adversarial intent as a latent variable in a POMDP and uses a Modular Trust Agent to infer it across pipeline stages, reducing Attack Success Rate by an average factor of 6.50x relative to undefended baselines at negligible utility cost.

Contribution

It formulates adversarial intent detection as a POMDP and proposes MMA-RAG^T, a model-agnostic, stateful control framework with structured LLM reasoning for enhanced security.

Findings

01

6.50x average reduction in Attack Success Rate relative to undefended baselines

02

Statefulness and spatial coverage are necessary for effectiveness

03

Stateless filtering offers negligible benefits when detections are correlated

Abstract

Current stateless defences for multimodal agentic RAG fail to detect adversarial strategies that distribute malicious semantics across retrieval, planning, and generation components. We formulate this security challenge as a Partially Observable Markov Decision Process (POMDP), where adversarial intent is a latent variable inferred from noisy multi-stage observations. We introduce MMA-RAG^T, an inference-time control framework governed by a Modular Trust Agent (MTA) that maintains an approximate belief state via structured LLM reasoning. Operating as a model-agnostic overlay, MMA-RAG^T mediates a configurable set of internal checkpoints to enforce stateful defence-in-depth. Extensive evaluation on 43,774 instances demonstrates a 6.50x average reduction factor in Attack Success Rate relative to undefended baselines, with negligible utility cost. Crucially, a factorial ablation validates…
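The idea of treating adversarial intent as a latent variable inferred from noisy multi-stage observations can be illustrated with a toy belief update. The sketch below is not the paper's method: the abstract says the MTA maintains an approximate belief via structured LLM reasoning, whereas this example substitutes an exact Bayes filter over a binary latent state. The `Checkpoint` class, the function names, the likelihood values, the prior, and the blocking threshold are all hypothetical, and the update assumes checkpoint signals are conditionally independent given intent, which is a simplification.

```python
from dataclasses import dataclass


@dataclass
class Checkpoint:
    """A pipeline stage that emits a noisy binary 'suspicious' flag (hypothetical)."""
    name: str
    p_flag_given_adversarial: float  # assumed true-positive rate of the stage detector
    p_flag_given_benign: float       # assumed false-positive rate of the stage detector


def update_belief(belief: float, flagged: bool, cp: Checkpoint) -> float:
    """One Bayes-filter step: posterior P(adversarial | observation)."""
    like_adv = cp.p_flag_given_adversarial if flagged else 1.0 - cp.p_flag_given_adversarial
    like_ben = cp.p_flag_given_benign if flagged else 1.0 - cp.p_flag_given_benign
    numerator = like_adv * belief
    return numerator / (numerator + like_ben * (1.0 - belief))


def run_episode(prior, checkpoints, observations, block_threshold=0.5):
    """Carry the belief across stages; block as soon as it crosses the threshold."""
    belief = prior
    trace = []
    for cp, flagged in zip(checkpoints, observations):
        belief = update_belief(belief, flagged, cp)
        trace.append((cp.name, belief))
        if belief >= block_threshold:
            return "block", trace
    return "allow", trace


# Illustrative numbers only: three weak detectors, each flag is 3x more
# likely under adversarial intent than under benign intent.
stages = [Checkpoint(n, 0.6, 0.2) for n in ("retrieval", "planning", "generation")]
action, trace = run_episode(prior=0.05, checkpoints=stages, observations=[True, True, True])

# A stateless filter sees each flag in isolation, always starting from the prior.
stateless_beliefs = [update_belief(0.05, True, cp) for cp in stages]
```

With these made-up numbers, no single flag is enough to cross the threshold when judged in isolation, but the belief carried across all three stages does cross it. That is the intuition behind the stateful framing; the figures say nothing about the paper's actual detectors or results.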

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI) · Advanced Graph Neural Networks