A.S.E: A Repository-Level Benchmark for Evaluating Security in AI-Generated Code

Keke Lian; Bin Wang; Lei Zhang; Libo Chen; Junjie Wang; Ziming Zhao; Yujiu Yang; Miaoqian Lin; Haotong Duan; Haoran Zhao; Shuang Liao; Mingda Guo; Jiazheng Quan; Yilu Zhong; Chenhao He; Zichuan Chen; Jie Wu; Haoling Li; Zhaoxuan Li; Jiongchi Yu; Hui Li; Dong Zhang

arXiv:2508.18106 · cs.SE · September 19, 2025

TL;DR

This paper introduces A.S.E, a comprehensive benchmark for evaluating the security of AI-generated code at the repository level, addressing the gap in real-world applicability of existing benchmarks.

Contribution

The paper presents A.S.E, a novel repository-level benchmark that better reflects real-world AI programming scenarios for security evaluation of generated code.

Findings

1. Current LLMs struggle with secure coding in repository scenarios.
2. Larger reasoning budgets do not improve code security.
3. Repository complexity challenges LLM performance.

Abstract

The increasing adoption of large language models (LLMs) in software engineering necessitates rigorous security evaluation of their generated code. However, existing benchmarks often lack relevance to real-world AI-assisted programming scenarios, making them inadequate for assessing the practical security risks associated with AI-generated code in production environments. To address this gap, we introduce A.S.E (AI Code Generation Security Evaluation), a repository-level evaluation benchmark designed to closely mirror real-world AI programming tasks, offering a comprehensive and reliable framework for assessing the security of AI-generated code. Our evaluation of leading LLMs on A.S.E reveals several key findings. In particular, current LLMs still struggle with secure coding. The complexity in repository-level scenarios presents challenges for LLMs that typically perform well on…


Code & Models

Datasets

tencent/A.S.E
dataset · 19 dl
