Detecting Concept Drift in Evolving Malware Families Using Rule-Based Classifier Representations

Tomáš Kalný; Martin Jureček; Mark Stamp

arXiv:2604.22629 · cs.CR · April 27, 2026

TL;DR

This paper introduces a structural method using decision tree rulesets to detect concept drift in malware classification, correlating rule-based metrics with accuracy and data shifts.

Contribution

The paper combines decision tree rule representations with multiple comparison metrics to detect concept drift in evolving malware families.

Findings

01

Fixed two-month windowing with Pearson correlation is most reliable.

02

The metrics are complementary; no single method dominates.

03

The rule-based drift metrics correlate with both accuracy degradation and data distribution shifts.
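To make the fixed two-month windowing in the first finding concrete, here is a minimal sketch of bucketing samples into consecutive two-month windows. The function names, the `(sample_id, first_seen)` tuple layout, and the choice of `origin` are assumptions for illustration only; the paper's actual preprocessing of EMBER2024 may differ.

```python
from collections import defaultdict
from datetime import date


def two_month_window(d: date, origin: date) -> int:
    """Index of the fixed two-month window containing d.

    Windows are counted from the calendar month of `origin`
    (hypothetical convention, not taken from the paper).
    """
    months = (d.year - origin.year) * 12 + (d.month - origin.month)
    return months // 2


def group_by_window(samples, origin: date) -> dict:
    """Group (sample_id, first_seen) pairs by two-month window index."""
    windows = defaultdict(list)
    for sample_id, first_seen in samples:
        windows[two_month_window(first_seen, origin)].append(sample_id)
    return dict(windows)


# Toy, made-up sample identifiers and dates.
samples = [
    ("a", date(2024, 1, 5)),
    ("b", date(2024, 2, 20)),
    ("c", date(2024, 3, 1)),
    ("d", date(2024, 6, 30)),
]
windows = group_by_window(samples, origin=date(2024, 1, 1))
```

Each window's samples would then train one classifier, and classifiers from different windows are compared to quantify drift.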

Abstract

This work proposes a structural approach to concept drift detection in malware classification using decision tree rulesets. Classifiers are trained across temporal windows on the EMBER2024 dataset, and drift is quantified by comparing extracted rule representations using feature importance, prediction agreement, activation stability, and coverage metrics. These metrics are correlated with both accuracy degradation and data distribution shift as complementary drift indicators. The approach is evaluated across six malware families using fixed-interval and clustering-based windowing in family-vs-benign and family-vs-family settings, and compared against RIPPER and Transcendent baselines. Results show that fixed two-month windowing with feature-level Pearson correlation is the most reliable configuration, being the only one where all family pairs produce positive drift-accuracy…
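As an illustration of two of the metrics named in the abstract, the sketch below compares feature-importance vectors from two windows with Pearson correlation and measures prediction agreement between two classifiers on the same samples. The function names and the `1 - r` drift score are assumptions made here for illustration; the abstract does not give the paper's exact definitions.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(vx * vy)


def importance_drift(imp_a, imp_b):
    """Assumed drift score: 1 - r between per-feature importance vectors.

    0 means the two windows rank features identically; larger values
    mean the classifiers rely on different features.
    """
    return 1.0 - pearson(imp_a, imp_b)


def prediction_agreement(preds_a, preds_b):
    """Fraction of samples on which two classifiers predict the same label."""
    if len(preds_a) != len(preds_b) or not preds_a:
        raise ValueError("need two non-empty prediction lists of equal length")
    return sum(a == b for a, b in zip(preds_a, preds_b)) / len(preds_a)
```

The same `pearson` helper could be applied one level up, correlating a sequence of per-window drift scores with the corresponding accuracy drops, which is the kind of drift-accuracy relationship the abstract reports.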
