A Survey on Adversarial Machine Learning for Code Data: Realistic Threats, Countermeasures, and Interpretations
Yulong Yang, Haoran Fan, Chenhao Lin, Qian Li, Zhengyu Zhao, Chao Shen, Xiaohong Guan

TL;DR
This survey comprehensively analyzes security vulnerabilities of Code Language Models (CLMs), categorizing attack types, examining countermeasures, and highlighting future research directions to enhance their reliability in real-world applications.
Contribution
It provides what the authors describe as the most comprehensive review to date of adversarial attacks on CLMs, categorizes threats according to the CIA triad (confidentiality, integrity, availability), and introduces novel perspectives on explainable AI and interconnected risks.
Findings
Identified 79 relevant papers on adversarial ML for CLMs.
Categorized attacks into poisoning, evasion, and privacy types.
Discussed countermeasures and future research challenges.
Abstract
Code Language Models (CLMs) have achieved tremendous progress in source code understanding and generation, leading to a significant increase in research interest in applying CLMs to real-world software engineering tasks in recent years. However, in realistic scenarios, CLMs are exposed to potentially malicious adversaries, which puts the confidentiality, integrity, and availability of CLM systems at risk. Despite these risks, a comprehensive analysis of the security vulnerabilities of CLMs in highly adversarial environments has been lacking. To close this research gap, we categorize existing attack techniques into three types based on the CIA triad: poisoning attacks (integrity and availability infringement), evasion attacks (integrity infringement), and privacy attacks (confidentiality infringement). We have collected the most comprehensive set of papers so far (79) related to…
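The three-way categorization in the abstract can be written down as a small lookup table. The sketch below is only an illustration of that mapping; the names `CIA`, `ATTACK_TAXONOMY`, and `attacks_violating` are hypothetical and do not come from the paper.

```python
from enum import Flag, auto
from typing import List


class CIA(Flag):
    """Security properties of the CIA triad (illustrative names)."""
    CONFIDENTIALITY = auto()
    INTEGRITY = auto()
    AVAILABILITY = auto()


# Mapping stated in the abstract: attack type -> properties it infringes.
ATTACK_TAXONOMY = {
    "poisoning": CIA.INTEGRITY | CIA.AVAILABILITY,
    "evasion": CIA.INTEGRITY,
    "privacy": CIA.CONFIDENTIALITY,
}


def attacks_violating(prop: CIA) -> List[str]:
    """Return the attack types (sorted by name) that infringe `prop`."""
    return sorted(
        name for name, violated in ATTACK_TAXONOMY.items() if prop in violated
    )


if __name__ == "__main__":
    for prop in (CIA.CONFIDENTIALITY, CIA.INTEGRITY, CIA.AVAILABILITY):
        print(prop.name, "->", attacks_violating(prop))
```

Read this way, integrity is the only property targeted by more than one attack type in the abstract's scheme, since both poisoning and evasion attacks infringe it.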
Taxonomy
Topics: Advanced Malware Detection Techniques · Adversarial Robustness in Machine Learning · Cryptographic Implementations and Security
