Detecting Concept Drift in Evolving Malware Families Using Rule-Based Classifier Representations
Tomáš Kalný, Martin Jureček, Mark Stamp

TL;DR
This paper introduces a structural method that uses decision tree rulesets to detect concept drift in malware classification, correlating rule-based drift metrics with accuracy degradation and data distribution shift.
Contribution
It presents a novel approach that combines rule representations with multiple comparison metrics to detect concept drift in evolving malware families.
Findings
Fixed two-month windowing with Pearson correlation is most reliable.
Metrics are complementary; no single method dominates.
Approach correlates well with accuracy degradation and data distribution shifts.
Abstract
This work proposes a structural approach to concept drift detection in malware classification using decision tree rulesets. Classifiers are trained across temporal windows on the EMBER2024 dataset, and drift is quantified by comparing extracted rule representations using feature importance, prediction agreement, activation stability, and coverage metrics. These metrics are correlated with both accuracy degradation and data distribution shift as complementary drift indicators. The approach is evaluated across six malware families using fixed-interval and clustering-based windowing in family-vs-benign and family-vs-family settings, and compared against RIPPER and Transcendent baselines. Results show that fixed two-month windowing with feature-level Pearson correlation is the most reliable configuration, being the only one where all family pairs produce positive drift-accuracy…
