Demystifying the Role of Rule-based Detection in AI Systems for Windows Malware Detection
Andrea Ponte, Luca Demetrio, Luca Oneto, Ivan Tesfai Ogbu, Battista Biggio, Fabio Roli

TL;DR
This paper investigates how including signature-based detection in the training pipeline of a machine learning model affects Windows malware detection. Training the machine learning component only on samples not already flagged by signatures improves robustness to adversarial EXEmples and temporal data drift, at the cost of a fixed lower bound on false positives.
Contribution
It examines how signature-based detection influences the training of machine learning models for malware detection when signatures are part of the training pipeline, and shows that this integration improves robustness to adversarial EXEmples.
Findings
Improved robustness to adversarial EXEmples
Enhanced resistance to temporal data drift
Trade-off: a fixed lower bound on false positives
Abstract
Malware detection increasingly relies on AI systems that integrate signature-based detection with machine learning. However, these components are typically developed and combined in isolation, missing opportunities to reduce data complexity and strengthen defenses against adversarial EXEmples, carefully crafted programs designed to evade detection. Hence, in this work we investigate the influence that signature-based detection exerts on model training when it is included in the training pipeline. Specifically, we compare models trained on a comprehensive dataset with an AI system whose machine learning component is trained solely on samples not already flagged by signatures. Our results demonstrate improved robustness to both adversarial EXEmples and temporal data drift, although this comes at the cost of a fixed lower bound on false positives, driven by suboptimal rule…
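The setup described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the byte-pattern rules in SIGNATURES, the helper names, and the toy samples are all hypothetical, chosen only to show (a) how the training set shrinks to samples the signatures miss and (b) why benign files that match a rule put a floor on the system's false positive rate that no model can remove.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical byte-pattern "signatures" (assumption: real systems use
# richer rule languages; plain substring matching keeps the sketch small).
SIGNATURES: List[bytes] = [b"EVIL_STUB", b"DROP_PAYLOAD"]


def signature_hit(sample: bytes) -> bool:
    """Return True if any signature pattern occurs in the sample."""
    return any(sig in sample for sig in SIGNATURES)


def split_for_training(
    samples: Sequence[bytes], labels: Sequence[int]
) -> List[Tuple[bytes, int]]:
    """Keep only samples NOT flagged by signatures for the ML training set."""
    return [(s, y) for s, y in zip(samples, labels) if not signature_hit(s)]


def system_predict(sample: bytes, model: Callable[[bytes], int]) -> int:
    """Signatures decide first; the model only sees what they miss."""
    if signature_hit(sample):
        return 1  # flagged as malware regardless of the model
    return model(sample)


def false_positive_rate(
    samples: Sequence[bytes],
    labels: Sequence[int],
    predict: Callable[[bytes], int],
) -> float:
    """Fraction of benign samples (label 0) that are flagged as malware."""
    benign = [s for s, y in zip(samples, labels) if y == 0]
    return sum(predict(s) for s in benign) / len(benign)


# Toy data (hypothetical): label 1 = malware, 0 = benign.
samples = [
    b"hello EVIL_STUB world",          # malware caught by a signature
    b"plain benign tool",              # benign
    b"installer DROP_PAYLOAD helper",  # benign, but matches a rule
    b"packed unknown malware",         # malware missed by signatures
    b"benign editor",                  # benign
    b"benign game",                    # benign
]
labels = [1, 0, 0, 1, 0, 0]

# The ML component is trained only on what the signatures did not flag.
train_set = split_for_training(samples, labels)

# Even a model that never flags anything cannot undo signature hits on
# benign files, so this is the floor on the system's false positive rate.
floor_fpr = false_positive_rate(
    samples, labels, lambda s: system_predict(s, lambda _: 0)
)
```

In this toy data one of the four benign samples matches a rule, so floor_fpr is 0.25 whatever model is plugged into system_predict. Lowering that floor would require fixing the rule itself, not retraining the model.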
