RESCUE: Retrieval Augmented Secure Code Generation

Jiahao Shi; Tianyi Zhang

arXiv:2510.18204·cs.CR·March 17, 2026

Open Access · 3 Reviews

TL;DR

RESCUE is a novel retrieval-augmented framework that enhances secure code generation by constructing a hierarchical security knowledge base and employing multi-faceted retrieval, significantly improving security performance across multiple benchmarks.

Contribution

It introduces a hybrid knowledge base construction method and a hierarchical retrieval method, addressing the noise of raw security documents and the security semantics implicit in task descriptions when generating secure code with LLMs.

Findings

01

RESCUE improves SecurePass@1 by an average of 4.8 points.

02

Achieves state-of-the-art performance on security benchmarks.

03

Validated through extensive ablation studies.
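
As a rough illustration of the SecurePass@1 metric cited in the findings, the sketch below scores a task only when its first generated sample is both functionally correct and free of detected vulnerabilities, which matches the reviewers' description of the metric as jointly evaluating functionality and security. The record type `SampleResult`, its fields, and the percentage scaling are assumptions made for this example; the paper's exact definition and evaluation harness may differ.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class SampleResult:
    """Outcome of the first generated sample for one task (hypothetical record)."""
    task_id: str
    passes_tests: bool  # assumed: functional correctness, e.g. unit tests pass
    is_secure: bool     # assumed: no finding reported by a security analyzer


def secure_pass_at_1(results: Iterable[SampleResult]) -> float:
    """Percentage of tasks whose first sample is both functional and secure."""
    results = list(results)
    if not results:
        raise ValueError("no results to score")
    hits = sum(1 for r in results if r.passes_tests and r.is_secure)
    return 100.0 * hits / len(results)


results = [
    SampleResult("t1", passes_tests=True, is_secure=True),
    SampleResult("t2", passes_tests=True, is_secure=False),   # works, but vulnerable
    SampleResult("t3", passes_tests=False, is_secure=True),   # secure, but broken
    SampleResult("t4", passes_tests=True, is_secure=True),
]
print(f"SecurePass@1 = {secure_pass_at_1(results):.1f}")  # prints "SecurePass@1 = 50.0"
```

Under this reading, a sample that is secure but fails its tests, or passes its tests but is vulnerable, contributes nothing to the score, so a method cannot raise the metric by trading one property for the other.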

Abstract

Despite recent advances, Large Language Models (LLMs) still generate vulnerable code. Retrieval-Augmented Generation (RAG) has the potential to enhance LLMs for secure code generation by incorporating external security knowledge. However, the conventional RAG design struggles with the noise of raw security-related documents, and existing retrieval methods overlook the significant security semantics implicitly embedded in task descriptions. To address these issues, we propose RESCUE, a new RAG framework for secure code generation with two key innovations. First, we propose a hybrid knowledge base construction method that combines LLM-assisted cluster-then-summarize distillation with program slicing, producing both high-level security guidelines and concise, security-focused code examples. Second, we design a hierarchical multi-faceted retrieval that traverses the constructed…

Peer Reviews

Decision·ICLR 2026 Poster

Reviewer 01 · Rating 6 · Confidence 3

Strengths

- Core approach is technically sound with clear motivation
- Comprehensive experiments (4 benchmarks, 6 LLMs, 5 baselines)
- Ablation studies validating the main components
- The paper is generally well organized and easy to understand

Weaknesses

- The novelty of the contribution seems to lie in combining different components, such as the cluster-and-summarize knowledge base with hierarchical retrieval. The method involves too many moving parts, such as API pattern and vulnerability cause analysis. The gains do not seem significant relative to the complexity of the system, which would limit the adoption of such an approach for secure code generation.
- The method involves many different hyperparameters like hop limit, thresholds for api and v

Reviewer 02 · Rating 8 · Confidence 4

Strengths

- RESCUE consistently outperforms all existing methods across multiple benchmarks in terms of SecurePass@1, a comprehensive metric that jointly evaluates both functionality and security. This demonstrates the framework's ability to generate code that is not only correct but also resistant to common vulnerabilities, representing a significant advancement over prior approaches that often sacrifice one dimension for the other.
- The paper proposes a novel and systematic method for building a refine

Weaknesses

- The evaluation framework relies on static security analysis tools, which are known to potentially generate false positives and false negatives
- RESCUE introduces some additional time cost to achieve its security improvements. Although the paper suggests the overhead is acceptable and can be further reduced through engineering optimizations, the additional costs are mostly unclear

Reviewer 03 · Rating 6 · Confidence 4

Strengths

- This paper addresses an important question
- The results show substantial improvement over the baselines
- The paper designs a comprehensive retrieval approach for related vulnerabilities

Weaknesses

Generally, the paper tries to address an important security problem, but there are some concerns about the evaluation setting.

1.1 Evaluation Benchmark. The main concern is that the evaluation benchmarks are programming-contest benchmarks (HE, BCB, LCB). These benchmarks are mainly self-contained and mostly function-level. Avoiding vulnerabilities in these benchmarks is not convincing, and this paper can be substantially improved after including real-world code-generation/-completio

Taxonomy

Topics: Advanced Malware Detection Techniques · Software Engineering Research · Digital and Cyber Forensics