ASTRA: Autonomous Spatial-Temporal Red-teaming for AI Software Assistants

Xiangzhe Xu; Guangyu Shen; Zian Su; Siyuan Cheng; Hanxi Guo; Lu Yan; Xuan Chen; Jiasheng Jiang; Xiaolong Jin; Chengpeng Wang; Zhuo Zhang; Xiangyu Zhang

arXiv:2508.03936 · cs.CR · August 7, 2025

TL;DR

ASTRA is an automated agent system that systematically uncovers safety flaws in AI coding assistants by modeling complex software tasks and known weaknesses, then generating realistic violation-inducing test cases.

Contribution

ASTRA introduces a novel three-stage approach combining knowledge graphs and adaptive probing to identify vulnerabilities in AI code assistants more effectively than prior methods.

Findings

01. Finds 11-66% more issues than existing techniques.

02. Generates test cases that improve model alignment by 17%.

03. Demonstrates practical value for safer AI systems.

Abstract

AI coding assistants like GitHub Copilot are rapidly transforming software development, but their safety remains deeply uncertain, especially in high-stakes domains like cybersecurity. Current red-teaming tools often rely on fixed benchmarks or unrealistic prompts, missing many real-world vulnerabilities. We present ASTRA, an automated agent system designed to systematically uncover safety flaws in AI-driven code generation and security guidance systems. ASTRA works in three stages: (1) it builds structured domain-specific knowledge graphs that model complex software tasks and known weaknesses; (2) it performs online vulnerability exploration of each target model by adaptively probing both its input space, i.e., the spatial exploration, and its reasoning processes, i.e., the temporal exploration, guided by the knowledge graphs; and (3) it generates high-quality violation-inducing cases…
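
To make the first two stages of the abstract concrete, here is a minimal toy sketch in Python. It is not the authors' implementation. Every name in it (`KnowledgeGraph`, `Response`, `explore`, `stub_target`, and the two checker functions) is a hypothetical stand-in, and the target "model" is a hard-coded stub. It only illustrates the idea of a graph linking tasks to weaknesses, and of probing both the final answer (a stand-in for spatial exploration) and the intermediate reasoning steps (a stand-in for temporal exploration).

```python
from dataclasses import dataclass, field


@dataclass
class KnowledgeGraph:
    """Toy graph: each software task points to weaknesses known to affect it."""
    edges: dict = field(default_factory=dict)

    def add(self, task: str, weakness: str) -> None:
        self.edges.setdefault(task, []).append(weakness)

    def pairs(self) -> list:
        # Flatten the graph into (task, weakness) pairs to probe.
        return [(t, w) for t, ws in self.edges.items() for w in ws]


@dataclass
class Response:
    reasoning: list  # the target's intermediate reasoning steps (strings)
    answer: str      # the target's final output, e.g. generated code


def explore(graph, target, unsafe_answer, flawed_step):
    """Return (task, weakness, kind) for every probe that exposes a problem."""
    findings = []
    for task, weakness in graph.pairs():
        resp = target(f"Task: {task}. Context: {weakness}.")
        # Toy "spatial" check: does this input yield an unsafe answer?
        if unsafe_answer(resp.answer):
            findings.append((task, weakness, "spatial"))
        # Toy "temporal" check: does any reasoning step look flawed?
        if any(flawed_step(step) for step in resp.reasoning):
            findings.append((task, weakness, "temporal"))
    return findings


def stub_target(prompt: str) -> Response:
    # Hypothetical stand-in for a code assistant; not a real model.
    if "SQL" in prompt:
        return Response(
            ["string concatenation is fine here"],
            'query = "SELECT * FROM users WHERE name = " + user_input',
        )
    return Response(["validate input first"], "safe_code()")


graph = KnowledgeGraph()
graph.add("build a login query", "SQL injection")
graph.add("parse a config file", "path traversal")

findings = explore(
    graph,
    stub_target,
    unsafe_answer=lambda a: "+ user_input" in a,   # assumed toy heuristic
    flawed_step=lambda s: "is fine" in s,          # assumed toy heuristic
)
print(findings)
```

With the stub above, only the SQL-related task is flagged, once for its answer and once for its reasoning. A real system would replace the string heuristics with proper vulnerability detectors and would choose probes adaptively rather than enumerating every pair.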
