Exploring the Security Threats of Retriever Backdoors in Retrieval-Augmented Code Generation
Tian Li, Bo Lin, Shangwen Wang, Yusong Tan

TL;DR
This paper reveals that retriever backdoors in Retrieval-Augmented Code Generation pose a serious security threat, as they can be stealthily injected and exploited to produce vulnerable code at scale, bypassing current defenses.
Contribution
The authors introduce VenomRACG, a novel stealthy attack method, and demonstrate its effectiveness in exposing practical vulnerabilities in retrieval-augmented code generation systems.
Findings
Injecting code amounting to as little as 0.05% of the knowledge base is enough to manipulate retriever results.
Backdoored retrievers can cause models to generate vulnerable code in over 40% of cases.
Current defenses are ineffective against the proposed stealthy backdoor attacks.
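To give a sense of scale for the 0.05% figure above, the following is a minimal arithmetic sketch, not the authors' procedure. It assumes the injection budget is counted in knowledge-base entries, which the summary does not specify. The function name `poisoned_entry_count` and the knowledge-base sizes are hypothetical and chosen only for illustration.

```python
import math
from fractions import Fraction

# 0.05% expressed exactly, to avoid floating-point rounding surprises.
POISON_RATE = Fraction(5, 10_000)

def poisoned_entry_count(kb_size: int, rate: Fraction = POISON_RATE) -> int:
    """Return how many injected entries correspond to `rate` of a knowledge base.

    Assumption: the budget is measured in entries (snippets), rounded up
    so that at least one entry is injected for any non-empty knowledge base.
    """
    if kb_size < 0:
        raise ValueError("kb_size must be non-negative")
    if not 0 <= rate <= 1:
        raise ValueError("rate must be between 0 and 1")
    return math.ceil(kb_size * rate)

# Hypothetical knowledge-base sizes, for illustration only.
for size in (1_000, 100_000, 1_000_000):
    print(f"{size:>9} entries -> {poisoned_entry_count(size)} injected")
```

Under these assumptions, a knowledge base of 100,000 entries would need about 50 injected entries to reach the 0.05% level. That illustrates how small the attacker's footprint could be relative to the corpus.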
Abstract
Retrieval-Augmented Code Generation (RACG) is increasingly adopted to enhance Large Language Models for software development, yet its security implications remain dangerously underexplored. This paper conducts the first systematic exploration of a critical and stealthy threat: backdoor attacks targeting the retriever component, which represents a significant supply-chain vulnerability. It is infeasible to assess this threat realistically, as existing attack methods are either too ineffective to pose a real danger or are easily detected by state-of-the-art defense mechanisms spanning both latent-space analysis and token-level inspection, which achieve consistently high detection rates. To overcome this barrier and enable a realistic analysis, we first developed VenomRACG, a new class of potent and stealthy attack that serves as a vehicle for our investigation. Its design makes poisoned…
Taxonomy
Topics: Software Engineering Research · Adversarial Robustness in Machine Learning · Scientific Computing and Data Management
