Benchmarking Autonomous Agents against Temporal, Spatial, and Semantic Evasions

Jianan Ma; Xiaohu Du; Ruixiao Lin; Yaoxiang Bian; Jialuo Chen; Jingyi Wang; Xiaofang Yang; Shiwen Cui; Changhua Meng; Xinhao Deng; Zhen Wang

arXiv:2605.22321 · cs.CR · May 22, 2026

TL;DR

This paper introduces a multi-dimensional evasion framework and benchmark to evaluate vulnerabilities in LLM-based autonomous agents, revealing significant security risks overlooked by current defenses.

Contribution

It presents a novel multi-dimensional evasion framework and a comprehensive benchmark, exposing systemic vulnerabilities in autonomous agents against sophisticated attack vectors.

Findings

01. The evasion framework raises the risk trigger rate from 28.3% to 52.6%.

02. Systemic vulnerabilities exist in current autonomous agent architectures.

03. Existing defenses are insufficient against multi-turn, multi-vector attacks.

Abstract

As autonomous agents (e.g., OpenClaw) increasingly operate with deep system-level privileges to execute complex tasks, they introduce severe, unmitigated security risks. Current vulnerability analyses overwhelmingly focus on single-turn, stateless behaviors, overlooking the expanded attack surface inherent in stateful, multi-turn interactions and dynamic tool invocations. In this paper, we propose a novel, multi-dimensional evasion framework targeting LLM-based agent systems. We introduce three stealthy attack vectors: (1) Temporal evasion, which fragments malicious payloads across sequential interaction turns; (2) Spatial evasion, which conceals payloads within complex external artifacts that evade standard LLM parsing mechanisms; and (3) Semantic evasion, which obscures malicious intent beneath benign contextual noise. To systematically quantify these threats, we construct A3S-Bench,…
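
The gap between single-turn, stateless analysis and stateful, multi-turn interaction can be illustrated with a toy sketch. This is not the paper's framework or benchmark: the marker string, the fragments, and both check functions are hypothetical, chosen only to show why a filter that inspects each turn in isolation can miss content that is split across turns, while a check over the accumulated conversation can see it.

```python
from typing import List

# Hypothetical placeholder for a pattern that a filter is meant to flag.
# It is an assumption for illustration and does not come from the paper.
BLOCKED_MARKER = "FORBIDDEN_TOKEN"


def stateless_check(turn: str) -> bool:
    """Return True if a single turn, inspected in isolation, contains the marker."""
    return BLOCKED_MARKER in turn


def stateful_check(turns: List[str]) -> bool:
    """Return True if the marker appears once all turns so far are concatenated."""
    return BLOCKED_MARKER in "".join(turns)


# The marker is split across three turns, so no single turn contains it.
fragments = ["FORBID", "DEN_", "TOKEN"]

# Per-turn inspection: each fragment is checked on its own.
per_turn_flags = [stateless_check(turn) for turn in fragments]

# Conversation-level inspection: the fragments are checked together.
cumulative_flag = stateful_check(fragments)

print("per-turn flags:", per_turn_flags)
print("cumulative flag:", cumulative_flag)
```

Plain string concatenation is a deliberate simplification. A real agent would have to reason over tool calls, files, and paraphrased content rather than literal substrings, so this sketch only motivates keeping state across turns and does not represent an actual defense.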
