Generating End-to-End Adversarial Examples for Malware Classifiers Using Explainability
Ishai Rosenberg, Shai Meir, Jonathan Berrebi, Ilay Gordon, Guillaume Sicard, Eli David

TL;DR
This paper demonstrates how explainable machine learning techniques can be exploited by adversaries to craft more effective malware evasion attacks, revealing vulnerabilities in interpretable classifiers.
Contribution
It introduces a novel two-step adversarial attack method leveraging explainability to identify and modify important features, and explores transferability of explainability across models.
Findings
Explainability algorithms can be exploited for adversarial attacks.
Feature importance can guide targeted modifications.
Transferability of explainability aids black-box attacks.
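One way to picture the transferability finding is to compare which features two models consider most important for the same sample. The sketch below is an illustration only, not the authors' measurement: the importance vectors, the `top_k_overlap` metric, and the choice of `k` are all hypothetical.

```python
import numpy as np

def top_k_features(importance, k):
    """Indices of the k features with the largest absolute importance."""
    return set(np.argsort(-np.abs(importance))[:k].tolist())

def top_k_overlap(imp_a, imp_b, k):
    """Fraction of the top-k features shared by two importance vectors."""
    return len(top_k_features(imp_a, k) & top_k_features(imp_b, k)) / k

# Hypothetical per-sample importances from a surrogate model (attacker-owned)
# and a target model (black box); the values are made up for illustration.
surrogate_imp = np.array([0.9, 0.1, -0.7, 0.05, 0.3])
target_imp = np.array([0.8, 0.2, -0.6, 0.5, 0.1])

overlap = top_k_overlap(surrogate_imp, target_imp, k=3)
print(f"top-3 overlap: {overlap:.2f}")
```

A high overlap would suggest that importances computed on a surrogate are a reasonable guide to what matters for the target, which is the intuition behind using explainability in a black-box setting.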
Abstract
In recent years, the topic of explainable machine learning (ML) has been extensively researched. Up until now, this research has focused on the use cases of regular ML users, such as debugging an ML model. This paper takes a different posture and shows that adversaries can leverage explainable ML to bypass malware classifiers that use multiple feature types. Previous adversarial attacks against such classifiers only add new features and do not modify existing ones, to avoid harming the modified malware executable's functionality. Current attacks use a single algorithm that both selects which features to modify and modifies them blindly, treating all features the same. In this paper, we present a different approach. We split the adversarial example generation task into two parts: First, we find the importance of all features for a specific sample using explainability algorithms, and then we conduct a feature-specific…
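The two-part split described in the abstract can be sketched on a toy model. This is a minimal illustration under stated assumptions, not the paper's implementation: the classifier is a hypothetical logistic regression, the per-sample attribution `w * x` stands in for an explainability algorithm, and the `modifiers` table of per-feature edits is invented for the example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def score(w, b, x):
    """Malicious probability under a toy logistic-regression classifier."""
    return float(sigmoid(np.dot(w, x) + b))

def feature_importance(w, x):
    """Step 1: per-sample attribution. For a linear model, w_i * x_i is the
    contribution of feature i to the logit relative to an all-zero baseline."""
    return w * x

def two_step_attack(w, b, x, modifiers, threshold=0.5):
    """Step 2: visit features from most to least malicious-leaning and apply
    each feature's own modifier until the score drops below the threshold."""
    x_adv = x.copy()
    order = np.argsort(-feature_importance(w, x))
    changed = []
    for i in order:
        if score(w, b, x_adv) < threshold:
            break
        i = int(i)
        if i in modifiers:  # features without a modifier are left untouched
            x_adv[i] = modifiers[i](x_adv[i])
            changed.append(i)
    return x_adv, changed

# Hypothetical toy data: 4 features; feature 3 has no safe modification.
w = np.array([2.0, 1.0, -0.5, 3.0])
b = -1.0
x = np.array([1.0, 1.0, 1.0, 1.0])
modifiers = {0: lambda v: 0.0, 1: lambda v: 0.0, 2: lambda v: v + 4.0}

x_adv, changed = two_step_attack(w, b, x, modifiers)
print(score(w, b, x), score(w, b, x_adv), changed)
```

Keeping the ranking step separate from the modification step is what allows each feature type to have its own edit rule, rather than one algorithm treating all features alike.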
