GUARD: Dual-Agent based Backdoor Defense on Chain-of-Thought in Neural Code Generation
Naizhu Jin, Zhong Li, Tian Zhang, Qingkai Zeng

TL;DR
GUARD is a dual-agent framework that detects and repairs backdoor attacks in Chain-of-Thought models used for neural code generation, enhancing security without compromising performance.
Contribution
It introduces a novel dual-agent defense system specifically designed to identify and mitigate backdoor attacks in Chain-of-Thought models for code generation.
Findings
Effectively detects backdoor triggers in CoT models
Successfully mitigates backdoor attacks while maintaining code quality
Outperforms existing defenses in experimental evaluations
Abstract
With the widespread application of large language models in code generation, recent studies demonstrate that employing additional Chain-of-Thought (CoT) generation models can significantly enhance code generation performance by providing explicit reasoning steps. However, as external components, CoT models are particularly vulnerable to backdoor attacks, which existing defense mechanisms often fail to detect effectively. To address this challenge, we propose GUARD, a novel dual-agent defense framework specifically designed to counter CoT backdoor attacks in neural code generation. GUARD integrates two core components: GUARD-Judge, which identifies suspicious CoT steps and potential triggers through comprehensive analysis, and GUARD-Repair, which employs a retrieval-augmented generation approach to regenerate secure CoT steps for identified anomalies. Experimental results show that GUARD…
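The abstract describes a detect-then-repair pipeline: one agent flags suspicious CoT steps and likely triggers, the other rewrites those steps using retrieved trusted material. The sketch below is a toy illustration of that control flow under strong assumptions, not GUARD's implementation. The functions `judge` and `repair`, the trusted corpus, and both heuristics are hypothetical. The abstract does not specify what GUARD-Judge's "comprehensive analysis" involves, so here the judge simply flags steps containing tokens never seen in a trusted reference corpus. GUARD-Repair uses retrieval-augmented generation; here the generation step is replaced by copying the most similar trusted step (Jaccard similarity over tokens).

```python
import re
from dataclasses import dataclass

# Identifier-like tokens; a crude stand-in for real tokenization.
TOKEN_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def tokenize(text: str) -> set[str]:
    """Return the set of lower-cased identifier-like tokens in a CoT step."""
    return {t.lower() for t in TOKEN_RE.findall(text)}


@dataclass
class Verdict:
    index: int          # position of the flagged step in the CoT
    step: str           # original (suspicious) step text
    triggers: set[str]  # tokens treated as potential triggers


def judge(steps: list[str], clean_vocab: set[str]) -> list[Verdict]:
    """Hypothetical judge: flag any step containing tokens absent from trusted CoTs."""
    verdicts = []
    for i, step in enumerate(steps):
        unseen = tokenize(step) - clean_vocab
        if unseen:
            verdicts.append(Verdict(i, step, unseen))
    return verdicts


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity of two token sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def repair(steps: list[str], verdicts: list[Verdict],
           clean_corpus: list[str], query: str) -> list[str]:
    """Hypothetical repair: swap each flagged step for the most similar trusted step.

    Retrieval context = task query tokens + the flagged step's tokens minus its
    suspected triggers. A real RAG repairer would feed the retrieved steps to a
    generator instead of copying them verbatim.
    """
    repaired = list(steps)
    for v in verdicts:
        context = tokenize(query) | (tokenize(v.step) - v.triggers)
        repaired[v.index] = max(clean_corpus, key=lambda c: jaccard(context, tokenize(c)))
    return repaired


# Toy demo data (entirely made up for illustration).
clean_corpus = [
    "Iterate over the list and keep a running sum",
    "Return the sum after the loop ends",
    "Check whether the list is empty first",
]
clean_vocab = set().union(*(tokenize(s) for s in clean_corpus))
query = "Write a function that returns the sum of a list"
cot = [
    "Check whether the list is empty first",
    "Iterate over the list and keep a running sum then call exec on it",  # poisoned
    "Return the sum after the loop ends",
]

verdicts = judge(cot, clean_vocab)
repaired = repair(cot, verdicts, clean_corpus, query)
print(verdicts[0].index, sorted(verdicts[0].triggers))  # 1 ['call', 'exec', 'it', 'on', 'then']
print(repaired[1])  # Iterate over the list and keep a running sum
```

The out-of-vocabulary heuristic is deliberately simple: it also flags benign words such as "then" that happen to be missing from the trusted corpus, which is exactly the kind of false positive a more thorough judge would need to avoid. Keeping detection and repair as separate functions mirrors the two-agent split, so either side can be swapped for a model-based component without changing the other.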
