Beyond the Wrapper: Identifying Artifact Reliance in Static Malware Classifiers using TRUSTEE
Riyazuddin Mohammed, Lan Zhang

TL;DR
This paper introduces a framework built on TRUSTEE to identify artifact reliance in static malware classifiers. It shows that these classifiers are sensitive to dataset biases and rely on artifacts such as packing rather than on truly malicious behavior.
Contribution
The authors propose a novel two-part interpretability framework for diagnosing and understanding artifact reliance in static malware classifiers, as a step toward improving model robustness.
Findings
The top-ranked features are packing artifacts and PE (Portable Executable) metadata, not malicious semantics.
Malware classifiers are highly sensitive to dataset composition.
The framework enables reproducible diagnosis of classifier biases.
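The second finding concerns dataset composition. A controlled composition experiment needs training sets whose packed/unpacked mix is fixed per class, so that packing alone can be made more or less predictive of the label. The following is a minimal stdlib sketch of such a sampler under stated assumptions: the pool layout, the function name `compose_dataset`, and its parameters are hypothetical, and the paper's actual ratios and sampling procedure are not specified here.

```python
import random


def compose_dataset(pools, malware_packed_frac, benign_packed_frac,
                    n_malware, n_benign, seed=0):
    """Sample a labelled dataset with a fixed packed/unpacked mix per class.

    Hypothetical helper. pools maps (cls, packing) -> list of sample IDs, where
    cls is "malware" or "benign" and packing is "packed" or "unpacked".
    Returns a shuffled list of (sample_id, label) pairs, label 1 = malware, 0 = benign.
    """
    rng = random.Random(seed)
    dataset = []
    for cls, label, n, frac in (("malware", 1, n_malware, malware_packed_frac),
                                ("benign", 0, n_benign, benign_packed_frac)):
        n_packed = round(n * frac)
        for packing, k in (("packed", n_packed), ("unpacked", n - n_packed)):
            pool = pools[(cls, packing)]
            if k > len(pool):
                raise ValueError(f"need {k} {packing} {cls} samples, pool has {len(pool)}")
            # Sample without replacement within each pool.
            dataset.extend((sid, label) for sid in rng.sample(pool, k))
    rng.shuffle(dataset)
    return dataset


# Usage with placeholder IDs: vary how often malware is packed while benign
# packing stays at 10%, so packing alone becomes more or less predictive.
pools = {(c, p): [f"{c}-{p}-{i}" for i in range(100)]
         for c in ("malware", "benign") for p in ("packed", "unpacked")}
for frac in (0.1, 0.5, 0.9):
    ds = compose_dataset(pools, frac, 0.1, n_malware=50, n_benign=50)
    n_packed_mal = sum(1 for sid, _ in ds if sid.startswith("malware-packed"))
    print(f"malware_packed_frac={frac}: {n_packed_mal}/50 malware samples packed")
```

Training the same classifier on each composition and comparing what it learns is what separates a genuine behavioral signal from a packing shortcut.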
Abstract
Modern cybersecurity relies heavily on static machine-learning-based malware classifiers. However, packing and other non-semantic transformations applied to executable files limit their reliability. Because maliciousness is strongly associated with packing, malware classifiers often learn these spurious artifacts rather than the true behavior of the binary. Moreover, these classifiers are black boxes, which makes it difficult to understand what they learn. To address this issue, we propose a two-part framework that applies the post-hoc explainable-AI (XAI) interpretability tool TRUSTEE, followed by a manual analysis of the top features. We conducted several controlled experiments, varying the dataset composition ratios to understand their impact on the results. The top-ranked features across all experiments, identified by TRUSTEE, were predominantly packing artifacts, portable…
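The first stage of the framework explains a black-box classifier by extracting an interpretable surrogate from it. TRUSTEE's own procedure is more elaborate than this; the following is only a minimal stdlib sketch of the core surrogate idea: query the black box, fit a shallow decision tree to its answers, report fidelity (agreement with the black box), and rank features by weighted Gini impurity decrease. The feature names, `hypothetical_black_box`, and the random query set are illustrative assumptions, not the paper's features, model, or data, and this is not TRUSTEE's API.

```python
import random
from collections import Counter

# Hypothetical static PE features; not the paper's actual feature set.
FEATURES = ["is_packed", "section_entropy", "num_imports", "calls_crypto_api"]


def hypothetical_black_box(x):
    """Stand-in for an opaque classifier (in practice, a trained model's predict)."""
    is_packed, entropy, num_imports, crypto = x
    return 1 if (is_packed and entropy >= 7) or (crypto and num_imports < 20) else 0


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values()) if n else 0.0


def build_tree(X, y, depth, max_depth, n_total, importance):
    """Greedy CART-style tree fit to the black box's labels.

    Leaves are class labels; internal nodes are (feature, threshold, left, right).
    Adds each split's sample-weighted impurity decrease to importance[feature].
    """
    majority = Counter(y).most_common(1)[0][0]
    if depth == max_depth or len(set(y)) == 1:
        return majority
    parent, best = gini(y), None
    for f in range(len(X[0])):
        # Skip the largest value so both children are non-empty.
        for t in sorted({row[f] for row in X})[:-1]:
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            gain = parent - (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or gain > best[0]:
                best = (gain, f, t)
    if best is None or best[0] <= 0:
        return majority
    gain, f, t = best
    importance[f] += gain * len(y) / n_total
    li = [i for i, row in enumerate(X) if row[f] <= t]
    ri = [i for i, row in enumerate(X) if row[f] > t]
    return (f, t,
            build_tree([X[i] for i in li], [y[i] for i in li], depth + 1, max_depth, n_total, importance),
            build_tree([X[i] for i in ri], [y[i] for i in ri], depth + 1, max_depth, n_total, importance))


def predict(node, x):
    """Walk the surrogate tree down to a leaf label."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if x[f] <= t else right
    return node


def explain(model, X, max_depth=3):
    """Query the model, fit a surrogate, and return (tree, fidelity, feature ranking)."""
    y = [model(x) for x in X]  # the black box is only queried, never inspected
    importance = [0.0] * len(X[0])
    tree = build_tree(X, y, 0, max_depth, len(X), importance)
    fidelity = sum(predict(tree, x) == yi for x, yi in zip(X, y)) / len(X)
    total = sum(importance) or 1.0
    ranking = sorted(zip(FEATURES, (v / total for v in importance)), key=lambda p: -p[1])
    return tree, fidelity, ranking


# Usage on a random, synthetic query set (placeholder data, not a real corpus).
rng = random.Random(0)
X = [(rng.randint(0, 1), rng.randint(0, 8), rng.randint(0, 40), rng.randint(0, 1))
     for _ in range(500)]
tree, fidelity, ranking = explain(hypothetical_black_box, X)
print(f"fidelity={fidelity:.3f}")
for name, score in ranking:
    print(f"{name:18s} {score:.3f}")
```

A surrogate's ranking is only as trustworthy as its fidelity: low agreement means the tree is not really explaining the model. Even at high fidelity, a top-ranked feature says nothing about *why* it is predictive, which is why the framework's second, manual stage (inspecting what the top features actually encode) remains necessary.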
