TL;DR
RedCoder is an automated red-teaming agent that engages code generation models in multi-turn conversations to elicit vulnerable code. It is built by running a multi-agent game that yields prototype conversations and reusable attack strategies, then fine-tuning an LLM to carry out dynamic adversarial conversations.
Contribution
This paper introduces RedCoder, a multi-turn red-teaming approach that automates the elicitation of vulnerable code from Code LLMs through interactive conversations and the reuse of attack strategies.
Findings
RedCoder outperforms prior red-teaming methods at inducing Code LLMs to generate vulnerable code.
It effectively automates multi-turn interactions for security testing.
The approach is scalable and adaptable across different Code LLMs.
Abstract
Large Language Models (LLMs) for code generation (i.e., Code LLMs) have demonstrated impressive capabilities in AI-assisted software development and testing. However, recent studies have shown that these models are prone to generating vulnerable or even malicious code under adversarial settings. Existing red-teaming approaches rely on extensive human effort, limiting their scalability and practicality, and generally overlook the interactive nature of real-world AI-assisted programming, which often unfolds over multiple turns. To bridge these gaps, we present RedCoder, a red-teaming agent that engages victim models in multi-turn conversation to elicit vulnerable code. The pipeline to construct RedCoder begins with a multi-agent gaming process that simulates adversarial interactions, yielding a set of prototype conversations and an arsenal of reusable attack strategies. We then fine-tune…
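The multi-turn interaction described in the abstract can be pictured with a minimal sketch. This is not the authors' implementation: the function `run_conversation`, the toy attacker, victim, and detector, and the two strategy labels are all hypothetical stand-ins chosen for illustration. The sketch shows only the general shape of the loop, assuming an attacker that picks its next message from the conversation history and a list of strategies, a victim that replies with code, and a checker that flags a reply as vulnerable.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

Turn = Tuple[str, str]  # (attacker_message, victim_reply)


@dataclass
class RedTeamResult:
    success: bool
    turns: List[Turn] = field(default_factory=list)


def run_conversation(
    attacker: Callable[[List[Turn], List[str]], str],
    victim: Callable[[List[Turn], str], str],
    is_vulnerable: Callable[[str], bool],
    strategies: List[str],
    max_turns: int = 5,
) -> RedTeamResult:
    """Run one multi-turn exchange; stop early once a reply is flagged."""
    history: List[Turn] = []
    for _ in range(max_turns):
        message = attacker(history, strategies)
        reply = victim(history, message)
        history.append((message, reply))
        if is_vulnerable(reply):
            return RedTeamResult(True, history)
    return RedTeamResult(False, history)


# --- Toy stand-ins (hypothetical; a real setup would call actual models) ---

def toy_attacker(history: List[Turn], strategies: List[str]) -> str:
    # Cycle through the strategy list, one strategy per turn.
    strategy = strategies[len(history) % len(strategies)]
    return f"[{strategy}] request #{len(history) + 1}"


def toy_victim(history: List[Turn], message: str) -> str:
    # Returns safe code for the first two turns, then an unsafe variant.
    if len(history) < 2:
        return "def parse(x):\n    return int(x)"
    return "def parse(x):\n    return eval(x)"


def toy_detector(reply: str) -> bool:
    # Crude placeholder check; a real evaluator would be far more thorough.
    return "eval(" in reply


strategies = ["benign framing", "incremental refinement"]  # hypothetical labels
result = run_conversation(toy_attacker, toy_victim, toy_detector, strategies)
print(result.success, len(result.turns))
```

In this toy setup the loop ends as soon as the detector flags a reply, so the recorded turns double as a prototype conversation that could be kept for later reuse.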
