Defending Code Language Models against Backdoor Attacks with Deceptive Cross-Entropy Loss

Guang Yang; Yu Zhou; Xiang Chen; Xiangyu Zhang; Terry Yue Zhuo; David Lo; Taolue Chen

arXiv:2407.08956 · cs.CR · May 20, 2025 · 2 cites

TL;DR

This paper introduces DeCE, a novel loss function that uses deceptive distributions and label smoothing to prevent overfitting to backdoor triggers in code language models, thereby enhancing their security.

Contribution

The paper proposes DeCE, a new loss function that effectively defends code language models against backdoor attacks by addressing overfitting issues caused by cross-entropy.

Findings

01 DeCE significantly reduces backdoor vulnerability in CLMs.

02 Overfitting to backdoor triggers is linked to unbounded cross-entropy loss.

03 DeCE's bounded gradient limits overfitting and improves security.
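
The contrast between an unbounded cross-entropy loss and a bounded alternative can be illustrated with a small sketch. This page does not give the DeCE formula, so the blending rule below is an assumption made for illustration, not the authors' implementation: the predicted distribution is mixed with a label-smoothed target using a weight `alpha ** epoch`, and the loss is taken against the smoothed target. The names `smooth_label`, `deceptive_ce`, `alpha`, `eps`, and `epoch` are hypothetical.

```python
import math


def smooth_label(target, num_classes, eps=0.1):
    """One-hot target mixed with a uniform distribution (label smoothing)."""
    return [(1.0 - eps) * (1.0 if i == target else 0.0) + eps / num_classes
            for i in range(num_classes)]


def cross_entropy(probs, target):
    """Standard cross-entropy; grows without bound as probs[target] -> 0."""
    return -math.log(probs[target])


def deceptive_ce(probs, target, epoch, alpha=0.99, eps=0.1):
    """Assumed blended loss (illustrative only, not the paper's exact form).

    The prediction is mixed with the smoothed label before the log, so every
    blended probability is at least (1 - alpha**epoch) * y_i > 0 for epoch >= 1.
    """
    y = smooth_label(target, len(probs), eps)
    w = alpha ** epoch  # weight on the model's own prediction
    blended = [w * p + (1.0 - w) * yi for p, yi in zip(probs, y)]
    return -sum(yi * math.log(bi) for yi, bi in zip(y, blended))


def deceptive_ce_upper_bound(target, num_classes, epoch, alpha=0.99, eps=0.1):
    """Worst-case value of deceptive_ce over all predictions, for epoch >= 1."""
    y = smooth_label(target, num_classes, eps)
    w = alpha ** epoch
    return -sum(yi * math.log((1.0 - w) * yi) for yi in y)


if __name__ == "__main__":
    # A prediction that puts almost no mass on the labelled class, as might
    # happen for a poisoned sample the model has not yet fit.
    probs = [1e-12, 1.0 - 1e-12]
    print("cross-entropy:", cross_entropy(probs, target=0))
    print("blended loss :", deceptive_ce(probs, target=0, epoch=1))
    print("upper bound  :", deceptive_ce_upper_bound(0, 2, epoch=1))
```

Under these assumptions, a sample the model assigns near-zero probability yields a very large cross-entropy value, while the blended loss stays below a fixed bound that depends only on `alpha`, `eps`, and `epoch`. That bound is the property the findings above connect to reduced overfitting on trigger-bearing samples.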

Abstract

Code Language Models (CLMs), particularly those leveraging deep learning, have achieved significant success in the code intelligence domain. However, the issue of security, particularly backdoor attacks, is often overlooked in this process. Previous research has focused on designing backdoor attacks for CLMs, but effective defenses have not been adequately addressed. In particular, existing defense methods from natural language processing, when directly applied to CLMs, are not effective enough and lack generality: they work well in some models and scenarios but fail in others, and thus fall short of consistently mitigating backdoor attacks. To bridge this gap, we first confirm the phenomenon of "early learning" as a general occurrence during the training of CLMs. This phenomenon refers to a model initially focusing on the main features of the training data, though it may become more sensitive to…

Code & Models

Repositories

zrw00/graceful
pytorch

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Security and Verification in Computing · Advancements in Semiconductor Devices and Circuit Design