Explainability-Guided Adversarial Attacks on Transformer-Based Malware Detectors Using Control Flow Graphs
Andrew Wheeler, Kshitiz Aryal, Maanak Gupta

TL;DR
This paper reveals vulnerabilities in transformer-based malware detectors that analyze control flow graphs, demonstrating how explainability tools can be exploited to craft effective adversarial evasion attacks.
Contribution
It introduces a white-box attack method using explainability techniques to identify and perturb influential graph components, exposing security weaknesses in transformer-based malware detection.
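As a rough illustration of what "using explainability to identify influential components" can mean, the sketch below ranks the tokens of a call sequence by occlusion: mask one token at a time and record how much the detector score drops. Occlusion is a technique swapped in here for illustration; this summary does not say which explainability method the paper uses. The scorer `toy_malware_score`, the `<mask>` token, and the API names are hypothetical stand-ins, not the paper's RoBERTa model or data.

```python
from typing import Callable, List, Tuple


def toy_malware_score(tokens: List[str]) -> float:
    """Hypothetical stand-in detector: fraction of tokens in a 'suspicious' set."""
    suspicious = {"VirtualAllocEx", "WriteProcessMemory", "CreateRemoteThread"}
    if not tokens:
        return 0.0
    return sum(t in suspicious for t in tokens) / len(tokens)


def occlusion_importance(
    tokens: List[str],
    score_fn: Callable[[List[str]], float],
    mask: str = "<mask>",  # assumed placeholder token
) -> List[Tuple[int, float]]:
    """Return (index, score drop) pairs, largest drop first.

    A large drop means the detector relied heavily on that token,
    so it is a candidate position for perturbation.
    """
    base = score_fn(tokens)
    drops = []
    for i in range(len(tokens)):
        occluded = tokens[:i] + [mask] + tokens[i + 1:]
        drops.append((i, base - score_fn(occluded)))
    # sorted() is stable, so ties keep their original left-to-right order
    return sorted(drops, key=lambda pair: pair[1], reverse=True)


sequence = ["CreateFileW", "VirtualAllocEx", "ReadFile", "WriteProcessMemory"]
ranked = occlusion_importance(sequence, toy_malware_score)
```

In a white-box setting, gradient-based attributions could take the place of occlusion; the ranking step stays the same either way.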
Findings
The attack reliably induces misclassification in malware detectors.
Explainability tools can be exploited to find critical attack surfaces.
Transformer-based detectors are vulnerable despite high accuracy.
Abstract
Transformer-based malware detection systems operating on graph modalities such as control flow graphs (CFGs) achieve strong performance by modeling structural relationships in program behavior. However, their robustness to adversarial evasion attacks remains underexplored. This paper examines the vulnerability of a RoBERTa-based malware detector that linearizes CFGs into sequences of function calls, a design choice that enables transformer modeling but may introduce token-level sensitivities and ordering artifacts exploitable by adversaries. By evaluating evasion strategies within this graph-to-sequence framework, we provide insight into the practical robustness of transformer-based malware detectors beyond aggregate detection accuracy. This paper proposes a white-box adversarial evasion attack that leverages explainability mechanisms to identify and perturb the most influential graph…
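To make the graph-to-sequence step concrete, here is a minimal sketch of one way a CFG could be flattened into a sequence of function calls. It assumes a depth-first preorder walk over basic blocks; the paper's actual linearization scheme is not given in this summary, and the block labels, API names, and the helper `linearize_cfg` are all hypothetical.

```python
from typing import Dict, List


def linearize_cfg(
    successors: Dict[str, List[str]],
    calls: Dict[str, List[str]],
    entry: str,
) -> List[str]:
    """Flatten a CFG into a call sequence via iterative depth-first preorder.

    successors maps a basic block to its successor blocks;
    calls maps a basic block to the function calls it contains.
    """
    sequence: List[str] = []
    seen = set()
    stack = [entry]
    while stack:
        block = stack.pop()
        if block in seen:
            continue  # each block contributes its calls only once
        seen.add(block)
        sequence.extend(calls.get(block, []))
        # Push in reverse so the first-listed successor is visited first
        stack.extend(reversed(successors.get(block, [])))
    return sequence


cfg_successors = {"B0": ["B1", "B2"], "B1": ["B3"], "B2": ["B3"], "B3": []}
cfg_calls = {
    "B0": ["GetProcAddress"],
    "B1": ["CreateFileW", "WriteFile"],
    "B2": ["RegOpenKeyExW"],
    "B3": ["ExitProcess"],
}
tokens = linearize_cfg(cfg_successors, cfg_calls, "B0")
```

The result depends on the traversal order: swapping the two successors of `B0` produces a different sequence for the same graph, which is one plausible reading of the "ordering artifacts" the abstract mentions.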
