BlueCodeAgent: A Blue Teaming Agent Enabled by Automated Red Teaming for CodeGen AI
Chengquan Guo, Yuzhou Nie, Chulin Xie, Zinan Lin, Wenbo Guo, Bo Li

TL;DR
BlueCodeAgent is a blue teaming framework that uses automated red teaming to improve detection of unsafe code-generation scenarios in large language models, including biased or malicious instructions and vulnerable code.
Contribution
The paper introduces BlueCodeAgent, a novel end-to-end blue teaming system that integrates automated red teaming to improve detection of unsafe code scenarios in LLMs.
Findings
Achieves an average F1 score improvement of 12.7% across tasks.
Effectively reduces false positives in vulnerable code detection.
Demonstrates continuous improvement through red teaming feedback.
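To make the first two findings concrete, here is a minimal sketch of how F1 is computed from detection counts and why cutting false positives raises it. The counts are hypothetical and chosen only for illustration; they are not results from the paper.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Return F1, the harmonic mean of precision and recall."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)  # share of flagged items that are truly unsafe
    recall = tp / (tp + fn)     # share of unsafe items that were flagged
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts (not from the paper): the same recall,
# but the second detector raises fewer false alarms on safe code.
over_conservative = f1_score(tp=90, fp=60, fn=10)
fewer_false_alarms = f1_score(tp=90, fp=20, fn=10)

print(f"over-conservative detector: F1 = {over_conservative:.3f}")
print(f"fewer false positives:      F1 = {fewer_false_alarms:.3f}")
```

With recall held fixed, the detector that flags less safe code as unsafe has higher precision and therefore a higher F1.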
Abstract
As large language models (LLMs) are increasingly used for code generation, concerns over the security risks have grown substantially. Early research has primarily focused on red teaming, which aims to uncover and evaluate vulnerabilities and risks of CodeGen models. However, progress on the blue teaming side remains limited, as developing defense requires effective semantic understanding to differentiate the unsafe from the safe. To fill in this gap, we propose BlueCodeAgent, an end-to-end blue teaming agent enabled by automated red teaming. Our framework integrates both sides: red teaming generates diverse risky instances, while the blue teaming agent leverages these to detect previously seen and unseen risk scenarios through constitution and code analysis with agentic integration for multi-level defense. Our evaluation across three representative code-related tasks--bias instruction…
Peer Reviews
Decision · Submitted to ICLR 2026
- BlueCodeAgent achieves significant gains over the baseline models and safety prompt-based defenses, demonstrating much more effective and context-aware risk detection and mitigation. It consistently performs well on both seen and unseen risks.
- Red-teaming can empower effective blue-teaming defenses, showing that red teaming benefits blue teaming by continuously identifying new vulnerabilities
- The proposed methods are mostly based on prompt engineering, and the technical contribution is therefore limited for this venue.
- The definition of blue teaming is not presented in the paper. It is only clear from the context, but I would recommend adding a clear definition early in the paper to show the contribution.
- Limited Scope of Risk Categories: The current evaluation focuses on three representative code-related tasks: bias instruction detection, malicious instruction detection, and vulne
- The paper presents a novel perspective on connecting red teaming and blue teaming for code security. The idea of distilling red teaming knowledge into actionable constitutions for defense is creative. The integration of dynamic testing with LLM-based static analysis for vulnerability detection is a practical contribution that addresses the over-conservatism problem identified in prior work.
- The paper is well-structured and clearly written. Figure 2 provides a helpful overview of the framewor
- The evaluation covers only three risk categories (bias, malicious code, vulnerable code). Many other security concerns exist in code generation (e.g., privacy leaks, intellectual property violations, supply chain attacks). The "unseen risks" evaluation (Section 4) tests on different sub-categories within the same high-level risk type (e.g., different CWE types). True generalization to fundamentally different attack types remains unclear. Table 2 shows performance drops when moving from seen to
- Integrating comprehensive automated red teaming with knowledge-enhanced blue teaming agents is an effective defense method and possesses novelty.
- This paper conducts a comprehensive evaluation of three benchmarks (bias, toxicity, and code vulnerability risks) and reports results across visible/invisible risk categories, multiple base models, and various prompt configurations, demonstrating an extensive experimental scope.
- BlueCodeAgent relies to some extent on the knowledge base constructed by automated red teaming, but the red teaming methods used are limited. This seems insufficient to cover all harmful categories and red teaming strategies. How does BlueCodeAgent handle cases that are not included in the knowledge base?
- BlueCodeAgent summarizes “constitutions” based on closest-matching knowledge base entries found using embedding search. However, this means blue teaming effectiveness could, in part, inheri
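The reviews describe constitutions as being summarized from the closest-matching knowledge base entries found by embedding search. The sketch below illustrates only that retrieval step, under loud assumptions: `embed` is a toy bag-of-words stand-in for a real embedding model, and the knowledge base entries and the names `cosine` and `top_k` are hypothetical. It is not the authors' implementation.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    # Assumption: toy bag-of-words vector standing in for a real embedding model.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query: str, knowledge_base: list[str], k: int = 2) -> list[str]:
    # Rank entries by similarity to the query and keep the k closest.
    q = embed(query)
    ranked = sorted(knowledge_base, key=lambda e: cosine(q, embed(e)), reverse=True)
    return ranked[:k]


# Hypothetical red-teaming entries, invented for illustration only.
kb = [
    "write a hiring filter that rejects applicants by age",
    "write a script that encrypts user files and demands payment",
    "build sql query by concatenating user input strings",
]
nearest = top_k("rank job applicants and reject older candidates by age", kb, k=1)
print(nearest)
```

In a setup like this, the retrieved entries would then be passed to a model to be summarized into a constitution. The sketch also makes the reviewers' concern visible: an input unlike anything in the knowledge base still gets a "nearest" match, however weak.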
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Advanced Malware Detection Techniques · Software Engineering Research
