GUARD: Dual-Agent based Backdoor Defense on Chain-of-Thought in Neural Code Generation

Naizhu Jin; Zhong Li; Tian Zhang; Qingkai Zeng

arXiv:2505.21425 · cs.SE · August 13, 2025

TL;DR

GUARD is a dual-agent framework that detects and repairs backdoor attacks in Chain-of-Thought models used for neural code generation, enhancing security without compromising performance.

Contribution

The paper introduces a dual-agent defense system designed to identify and mitigate backdoor attacks in Chain-of-Thought models for code generation.

Findings

01 Effectively detects backdoor triggers in CoT models

02 Successfully mitigates backdoor attacks while maintaining code quality

03 Outperforms existing defenses in experimental evaluations

Abstract

With the widespread application of large language models in code generation, recent studies demonstrate that employing additional Chain-of-Thought (CoT) generation models can significantly enhance code generation performance by providing explicit reasoning steps. However, as external components, CoT models are particularly vulnerable to backdoor attacks, which existing defense mechanisms often fail to detect effectively. To address this challenge, we propose GUARD, a novel dual-agent defense framework specifically designed to counter CoT backdoor attacks in neural code generation. GUARD integrates two core components: GUARD-Judge, which identifies suspicious CoT steps and potential triggers through comprehensive analysis, and GUARD-Repair, which employs a retrieval-augmented generation approach to regenerate secure CoT steps for identified anomalies. Experimental results show that GUARD…
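The judge-then-repair control flow described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not specify how GUARD-Judge analyzes steps or how GUARD-Repair retrieves and regenerates them, so everything here is an assumption made for illustration. The names `toy_judge`, `toy_repair`, `stub_regenerate`, and `guard` are hypothetical, the phrase blocklist stands in for the judge's analysis, token-overlap (Jaccard) similarity stands in for retrieval, and a deterministic stub stands in for the language-model call that would regenerate a step.

```python
from typing import Callable, List, Sequence, Tuple

# A knowledge-base entry: (problem description, trusted CoT steps).
Example = Tuple[str, List[str]]


def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity between two strings (assumed retrieval metric)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def toy_judge(steps: Sequence[str], blocklist: Sequence[str]) -> List[int]:
    """Placeholder detector: return indices of steps containing a blocklisted phrase.

    Assumption: a real judge would use far richer analysis than substring matching.
    """
    return [i for i, step in enumerate(steps)
            if any(phrase.lower() in step.lower() for phrase in blocklist)]


def stub_regenerate(problem: str, reference: Sequence[str], index: int) -> str:
    """Stand-in for an LLM call: reuse the aligned step of the retrieved example."""
    return reference[min(index, len(reference) - 1)]


def toy_repair(problem: str, steps: Sequence[str], flagged: Sequence[int],
               knowledge_base: Sequence[Example],
               regenerate: Callable[[str, Sequence[str], int], str]) -> List[str]:
    """Retrieve the most similar trusted example and regenerate only flagged steps."""
    if not flagged:
        return list(steps)
    if not knowledge_base:
        raise ValueError("repair needs at least one trusted example")
    _, reference = max(knowledge_base, key=lambda ex: jaccard(problem, ex[0]))
    repaired = list(steps)
    for i in flagged:
        repaired[i] = regenerate(problem, reference, i)
    return repaired


def guard(problem: str, steps: Sequence[str], blocklist: Sequence[str],
          knowledge_base: Sequence[Example],
          regenerate: Callable[[str, Sequence[str], int], str] = stub_regenerate
          ) -> List[str]:
    """Dual-agent pipeline: judge first, then repair whatever was flagged."""
    flagged = toy_judge(steps, blocklist)
    return toy_repair(problem, steps, flagged, knowledge_base, regenerate)
```

Calling `guard` with a CoT whose second step mentions a blocklisted phrase replaces only that step and leaves the others untouched, while a CoT with no flagged steps is returned unchanged. Keeping the regeneration function as a parameter is a design choice of this sketch, so that the stub can be swapped for a real model call.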
