A Survey on Adversarial Machine Learning for Code Data: Realistic Threats, Countermeasures, and Interpretations

Yulong Yang; Haoran Fan; Chenhao Lin; Qian Li; Zhengyu Zhao; Chao Shen; Xiaohong Guan

arXiv:2411.07597·cs.CR·November 13, 2024

TL;DR

This survey comprehensively analyzes security vulnerabilities of Code Language Models (CLMs), categorizing attack types, examining countermeasures, and highlighting future research directions to enhance their reliability in real-world applications.

Contribution

It provides the most extensive review of adversarial attacks on CLMs, categorizes threats based on the CIA triad, and introduces novel perspectives on explainable AI and interconnected risks.

Findings

01 Identified 79 relevant papers on adversarial ML for CLMs.

02 Categorized attacks into poisoning, evasion, and privacy types.

03 Discussed countermeasures and future research challenges.

Abstract

Code Language Models (CLMs) have achieved tremendous progress in source code understanding and generation, leading in recent years to a significant increase in research interest in applying CLMs to real-world software engineering tasks. However, in realistic scenarios, CLMs are exposed to potentially malicious adversaries, which puts the confidentiality, integrity, and availability of CLM systems at risk. Despite these risks, a comprehensive analysis of the security vulnerabilities of CLMs in extremely adversarial environments has been lacking. To close this research gap, we categorize existing attack techniques into three types based on the CIA triad: poisoning attacks (integrity and availability infringement), evasion attacks (integrity infringement), and privacy attacks (confidentiality infringement). We have collected what is so far the most comprehensive set of papers (79) related to…
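The three-way categorization in the abstract can be written down as a small lookup table. The following is a minimal sketch of that mapping only. The names `Property`, `Attack`, `INFRINGES`, and `attacks_against` are hypothetical and chosen for illustration; they do not come from the survey.

```python
from enum import Enum, Flag, auto


class Property(Flag):
    """The three security properties of the CIA triad."""
    CONFIDENTIALITY = auto()
    INTEGRITY = auto()
    AVAILABILITY = auto()


class Attack(Enum):
    """Attack types named in the abstract (identifiers are illustrative)."""
    POISONING = "poisoning"
    EVASION = "evasion"
    PRIVACY = "privacy"


# Mapping taken from the abstract's parenthetical descriptions.
INFRINGES = {
    Attack.POISONING: Property.INTEGRITY | Property.AVAILABILITY,
    Attack.EVASION: Property.INTEGRITY,
    Attack.PRIVACY: Property.CONFIDENTIALITY,
}


def attacks_against(prop: Property) -> list[Attack]:
    """Return the attack types that infringe the given property."""
    return [attack for attack, props in INFRINGES.items() if prop in props]


if __name__ == "__main__":
    for prop in (Property.CONFIDENTIALITY, Property.INTEGRITY, Property.AVAILABILITY):
        names = ", ".join(a.value for a in attacks_against(prop))
        print(f"{prop.name.lower()}: {names}")
```

Reading the table by property rather than by attack shows that, under this categorization, integrity is targeted by two attack types (poisoning and evasion), while confidentiality and availability are each targeted by one.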

Taxonomy

Topics: Advanced Malware Detection Techniques · Adversarial Robustness in Machine Learning · Cryptographic Implementations and Security