Inference-Time Safety For Code LLMs Via Retrieval-Augmented Revision
Manisha Mukherjee, Vincent J. Hellendoorn

TL;DR
This paper introduces a retrieval-augmented revision method for code LLMs that enhances safety and trustworthiness by surfacing relevant security risks and guiding code revision during inference, without retraining.
Contribution
The paper proposes an inference-time safety mechanism that uses retrieval-augmented generation to improve the security and interpretability of LLM-generated code.
Findings
Improves security of generated code compared to prompting alone
Introduces no new vulnerabilities, according to static analysis
Enhances robustness to evolving security standards
Abstract
Large Language Models (LLMs) are increasingly deployed for code generation in high-stakes software development, yet their limited transparency in security reasoning and brittleness to evolving vulnerability patterns raise critical trustworthiness concerns. Models trained on static datasets cannot readily adapt to newly discovered vulnerabilities or changing security standards without retraining, leading to the repeated generation of unsafe code. We present a principled approach to trustworthy code generation by design that operates as an inference-time safety mechanism. Our approach employs retrieval-augmented generation to surface relevant security risks in generated code and retrieve related security discussions from a curated Stack Overflow knowledge base, which are then used to guide an LLM during code revision. This design emphasizes three aspects relevant to trustworthiness: (1)…
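The retrieve-then-revise loop described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the three-entry `KNOWLEDGE_BASE` stands in for the curated Stack Overflow knowledge base, bag-of-words cosine similarity stands in for whatever retriever the authors use, and `build_revision_prompt` is a hypothetical helper whose wording is invented. The sketch stops at building the prompt; sending it to an LLM is left out.

```python
import re
from collections import Counter
from math import sqrt

# Hypothetical stand-in for a curated knowledge base of security discussions.
KNOWLEDGE_BASE = [
    "Building SQL queries with string formatting allows SQL injection; "
    "use parameterized queries instead.",
    "Calling subprocess with shell=True on user input allows command injection; "
    "pass an argument list.",
    "Hashing passwords with md5 is insecure; use a slow salted hash such as bcrypt.",
]


def tokenize(text):
    """Lowercase the text and count its word-like tokens."""
    return Counter(re.findall(r"[a-z0-9_]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(code, k=1):
    """Return the k knowledge-base entries most similar to the generated code."""
    query = tokenize(code)
    ranked = sorted(
        KNOWLEDGE_BASE,
        key=lambda entry: cosine(query, tokenize(entry)),
        reverse=True,
    )
    return ranked[:k]


def build_revision_prompt(code, notes):
    """Assemble a revision prompt that shows the retrieved notes with the code."""
    bullet_list = "\n".join(f"- {note}" for note in notes)
    return (
        "Revise the code below to address these security notes.\n"
        f"Security notes:\n{bullet_list}\n"
        f"Code:\n{code}\n"
    )


# Example: code an LLM might have generated in a first pass.
generated = 'subprocess.run("ls " + user_input, shell=True)'
notes = retrieve(generated, k=1)
prompt = build_revision_prompt(generated, notes)
print(prompt)
```

Because the retrieved notes appear verbatim in the prompt, a reader can see which security discussion motivated a revision, and updating the knowledge base changes the guidance without retraining the model.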
Taxonomy
Topics: Software Engineering Research · Adversarial Robustness in Machine Learning · Scientific Computing and Data Management
