Demystifying the Role of Rule-based Detection in AI Systems for Windows Malware Detection

Andrea Ponte; Luca Demetrio; Luca Oneto; Ivan Tesfai Ogbu; Battista Biggio; Fabio Roli

arXiv:2508.09652·cs.CR·August 14, 2025

TL;DR

This paper investigates how placing signature-based detection inside the training pipeline of a machine learning malware detector affects robustness. Training the model only on samples not already flagged by signatures improves resistance to adversarial EXEmples and temporal data drift, at the cost of a fixed lower bound on false positives.

Contribution

The paper examines how signature-based detection influences the training of machine learning models for Windows malware detection, and shows that including signatures in the training pipeline improves robustness against adversarial EXEmples.

Findings

01. Improved robustness to adversarial EXEmples

02. Enhanced resistance to temporal data drift

03. Trade-off with increased false positives

Abstract

Malware detection increasingly relies on AI systems that integrate signature-based detection with machine learning. However, these components are typically developed and combined in isolation, missing opportunities to reduce data complexity and strengthen defenses against adversarial EXEmples, carefully crafted programs designed to evade detection. Hence, in this work we investigate the influence that signature-based detection exerts on model training when it is included inside the training pipeline. Specifically, we compare models trained on a comprehensive dataset with an AI system whose machine learning component is trained solely on samples not already flagged by signatures. Our results demonstrate improved robustness to both adversarial EXEmples and temporal data drift, although this comes at the cost of a fixed lower bound on false positives, driven by suboptimal rule…
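
The training setup described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the byte-pattern signatures, the `high_byte_ratio` feature, and the threshold classifier are all hypothetical stand-ins chosen to keep the example self-contained. The sketch only shows the general idea that the model is fit on samples the signatures did not flag, and that the final verdict is the signature verdict OR the model verdict.

```python
from typing import Callable, List, Tuple

Sample = Tuple[bytes, int]  # (raw bytes, label); 1 = malware, 0 = benign

# Hypothetical byte-pattern signatures; real rule sets are far richer.
SIGNATURES: List[bytes] = [b"EVIL_STUB", b"\xde\xad\xbe\xef"]


def flagged_by_signature(data: bytes) -> bool:
    """Return True if any signature pattern occurs in the sample."""
    return any(sig in data for sig in SIGNATURES)


def split_by_signatures(samples: List[Sample]) -> Tuple[List[Sample], List[Sample]]:
    """Separate samples the rules already catch from the residual set."""
    flagged = [s for s in samples if flagged_by_signature(s[0])]
    residual = [s for s in samples if not flagged_by_signature(s[0])]
    return flagged, residual


def high_byte_ratio(data: bytes) -> float:
    """Toy feature (an assumption): fraction of bytes >= 0x80."""
    return sum(b >= 0x80 for b in data) / max(len(data), 1)


def train_threshold_model(samples: List[Sample]) -> Callable[[bytes], bool]:
    """Toy stand-in for the ML component: midpoint between class means."""
    mal = [high_byte_ratio(d) for d, y in samples if y == 1]
    ben = [high_byte_ratio(d) for d, y in samples if y == 0]
    if not mal or not ben:
        raise ValueError("residual training set needs both classes")
    threshold = (sum(mal) / len(mal) + sum(ben) / len(ben)) / 2
    return lambda data: high_byte_ratio(data) > threshold


def build_system(samples: List[Sample]) -> Callable[[bytes], bool]:
    """Train the model only on unflagged samples, then combine with the rules."""
    _, residual = split_by_signatures(samples)
    model = train_threshold_model(residual)
    return lambda data: flagged_by_signature(data) or model(data)


train: List[Sample] = [
    (b"EVIL_STUB" + b"hello", 1),   # caught by a signature, excluded from training
    (b"\x90\x90\x90\x90ab", 1),     # unflagged malware
    (b"plain text file", 0),        # benign
    (b"config\x80", 0),             # benign
]
detect = build_system(train)

print(detect(b"\xff\xff\xffa"))           # True: model verdict
print(detect(b"readme"))                  # False
print(detect(b"docs mention EVIL_STUB"))  # True: rule fires on benign-looking input
```

The last call hints at why the abstract reports a fixed lower bound on false positives: a benign file that matches a rule is flagged regardless of what the model would have predicted, so no amount of model training can remove that error.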
