Best-Effort Adversarial Approximation of Black-Box Malware Classifiers
Abdullah Ali, Birhanu Eshete

TL;DR
This paper presents a method for approximating black-box malware classifiers from a limited input set, using feature representation mapping and cross-domain transferability to train a local substitute that achieves high accuracy and high prediction agreement with the target model.
Contribution
It introduces a best-effort adversarial approximation approach for the setting where the adversary sees only the predicted label for each input and has limited data, and it shows that a locally trained substitute can still mimic a black-box malware classifier.
Findings
The substitute reaches 92% accuracy when approximating a CNN-based classifier.
Its predictions agree with the target model's on nearly 90% of inputs.
The approach remains effective with a limited input set and non-overlapping training sets.
Abstract
An adversary who aims to steal a black-box model repeatedly queries the model via a prediction API to learn a function that approximates its decision boundary. Adversarial approximation is non-trivial because of the enormous combinations of model architectures, parameters, and features to explore. In this context, the adversary resorts to a best-effort strategy that yields the closest approximation. This paper explores best-effort adversarial approximation of a black-box malware classifier in the most challenging setting, where the adversary's knowledge is limited to a prediction label for a given input. Beginning with a limited input set for the black-box classifier, we leverage feature representation mapping and cross-domain transferability to approximate a black-box malware classifier by locally training a substitute. Our approach approximates the target model with different feature…
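The label-only setting described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: `black_box_predict` is a toy stand-in for the target's prediction API, `map_features` is a hypothetical feature mapping into the substitute's own representation, and the substitute is a simple perceptron rather than the models the authors use. The sketch queries the target for labels on a limited input set, trains the substitute locally, and measures prediction agreement on separate, non-overlapping inputs.

```python
import random


def black_box_predict(sample):
    """Hypothetical target classifier: exposes only a label (1 = malware, 0 = benign)."""
    # Toy decision rule standing in for the real model; the adversary never sees it.
    return 1 if 2.0 * sample[0] - sample[1] > 0.5 else 0


def map_features(sample):
    """Hypothetical mapping from the target's input to the substitute's feature space."""
    return [sample[0], sample[1], sample[0] * sample[1], 1.0]


def score(weights, sample):
    """Linear score of the substitute on the mapped features."""
    return sum(w * f for w, f in zip(weights, map_features(sample)))


def substitute_predict(weights, sample):
    return 1 if score(weights, sample) > 0 else 0


def train_substitute(inputs, labels, epochs=50, learning_rate=0.1):
    """Train a perceptron substitute using only labels returned by the black box."""
    weights = [0.0] * len(map_features(inputs[0]))
    for _ in range(epochs):
        for sample, label in zip(inputs, labels):
            error = label - substitute_predict(weights, sample)
            if error != 0:
                feats = map_features(sample)
                weights = [w + learning_rate * error * f for w, f in zip(weights, feats)]
    return weights


def agreement(weights, samples):
    """Fraction of samples on which the substitute and the target give the same label."""
    matches = sum(substitute_predict(weights, s) == black_box_predict(s) for s in samples)
    return matches / len(samples)


rng = random.Random(0)
# Limited query set and a separate evaluation set drawn independently of it.
query_set = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(200)]
eval_set = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(200)]

labels = [black_box_predict(x) for x in query_set]  # the only information obtained
weights = train_substitute(query_set, labels)
rate = agreement(weights, eval_set)
print(f"prediction agreement on held-out inputs: {rate:.2f}")
```

Agreement is computed against the target's own predictions rather than ground-truth labels, which is what distinguishes it from plain accuracy: a substitute can agree closely with the target even where the target itself is wrong.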
