Structural Representations for Cross-Attack Generalization in AI Agent Threat Detection
Vignesh Iyer

TL;DR
This paper introduces structural tokenization, which encodes execution-flow patterns rather than conversational content, to improve cross-attack generalization in AI agent threat detection. On unseen structural attacks it raises AUC by 46 points for tool hijacking and 39 points for data exfiltration.
Contribution
The paper proposes a structural tokenization method that captures execution-flow patterns (tool calls, arguments, observations) and generalizes to unseen attack types better than standard conversational tokenization.
Findings
Structural tokenization improves AUC by up to 71 points on unknown attacks.
It improves detection of structural attacks such as tool hijacking (+46 AUC points) and data exfiltration (+39 AUC points).
Combining structural and linguistic features achieves high accuracy on social engineering attacks.
Abstract
Autonomous AI agents executing multi-step tool sequences face semantic attacks that manifest in behavioral traces rather than isolated prompts. A critical challenge is cross-attack generalization: can detectors trained on known attack families recognize novel, unseen attack types? We discover that standard conversational tokenization, which captures linguistic patterns from agent interactions, fails catastrophically on structural attacks like tool hijacking (AUC 0.39) and data exfiltration (AUC 0.46), while succeeding on linguistic attacks like social engineering (AUC 0.78). We introduce structural tokenization, encoding execution-flow patterns (tool calls, arguments, observations) rather than conversational content. This simple representational change dramatically improves cross-attack generalization: +46 AUC points on tool hijacking, +39 points on data exfiltration, and +71 points on…
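To make the idea of encoding execution flow concrete, here is a minimal sketch in Python. The abstract does not specify the paper's actual token scheme, so everything here is an assumption for illustration: the trace schema (`tool`, `args`, `status`), the function name `structural_tokens`, and the `CALL:`/`ARG:`/`OBS:` token prefixes are all hypothetical. The sketch only shows how a trace might be reduced to tool calls, argument shapes, and coarse observation labels while discarding the conversational text.

```python
from typing import Any


def structural_tokens(trace: list[dict[str, Any]]) -> list[str]:
    """Map an agent trace to execution-flow tokens, ignoring message text.

    Hypothetical scheme, not the paper's: each step yields one CALL token,
    one ARG token per argument, and one OBS token for the outcome.
    """
    tokens: list[str] = []
    for step in trace:
        tokens.append(f"CALL:{step['tool']}")
        # Keep argument names and value types, not the values themselves.
        for name, value in sorted(step.get("args", {}).items()):
            tokens.append(f"ARG:{name}:{type(value).__name__}")
        # Reduce the observation to a coarse status label.
        tokens.append(f"OBS:{step.get('status', 'unknown')}")
    return tokens


# Illustrative trace: a file read followed by an outbound HTTP request.
trace = [
    {"tool": "read_file", "args": {"path": "notes.txt"}, "status": "ok"},
    {"tool": "http_post", "args": {"url": "http://example.com", "body": "..."}, "status": "ok"},
]
print(structural_tokens(trace))
```

Under these assumptions, two traces with very different wording but the same read-then-post sequence produce the same token sequence, which is the kind of content-independent signal the abstract attributes to structural tokenization.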
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Topic Modeling · Explainable Artificial Intelligence (XAI)
