Adaptive Malware Detection using Sequential Feature Selection: A Dueling Double Deep Q-Network (D3QN) Framework for Intelligent Classification
Naseem Khan, Aref Y. Al-Tamimi, Amine Bermak, Issa M. Khalil

TL;DR
This paper introduces a reinforcement learning framework using D3QN for adaptive sequential feature selection in malware detection, significantly reducing computational costs while maintaining high accuracy.
Contribution
It presents a novel D3QN-based approach that dynamically selects features for malware classification, outperforming static methods in efficiency and accuracy.
Findings
Achieves 99.22% accuracy on Microsoft Big2015 and 98.83% on BODMAS while using only 61 and 56 features on average (96.6% and 97.6% fewer features).
Reduces computational cost by up to 42.5x compared to traditional methods.
Learns structured, non-random feature selection policies.
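The reduction percentages can be checked from the feature counts given in the abstract (1,795 and 2,381 total features; 61 and 56 used on average). The snippet below is a minimal sketch of that arithmetic. The helper name `reduction_stats` is hypothetical, and the total-to-used feature ratio is only a rough proxy for cost, since this summary does not state how computational cost is measured.

```python
from typing import Tuple


def reduction_stats(total_features: int, avg_used: float) -> Tuple[float, float]:
    """Return (percent of features avoided, total-to-used feature ratio)."""
    reduction_pct = 100.0 * (1.0 - avg_used / total_features)
    ratio = total_features / avg_used
    return reduction_pct, ratio


# Feature counts as reported in the abstract.
big2015 = reduction_stats(1795, 61)
bodmas = reduction_stats(2381, 56)

print(f"Big2015: {big2015[0]:.1f}% reduction, {big2015[1]:.1f}x fewer features")
print(f"BODMAS:  {bodmas[0]:.1f}% reduction, {bodmas[1]:.1f}x fewer features")
```

The BODMAS ratio comes out at roughly 42.5, which matches the 42.5x figure above, though that correspondence is an observation rather than something the summary states.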
Abstract
Traditional malware detection methods exhibit computational inefficiency due to exhaustive feature extraction requirements, creating accuracy-efficiency trade-offs that limit real-time deployment. We formulate malware classification as a Markov Decision Process with episodic feature acquisition and propose a Dueling Double Deep Q-Network (D3QN) framework for adaptive sequential feature selection. The agent learns to dynamically select informative features per sample before terminating with classification decisions, optimizing both detection accuracy and computational cost through reinforcement learning. We evaluate our approach on Microsoft Big2015 (9-class, 1,795 features) and BODMAS (binary, 2,381 features) datasets. D3QN achieves 99.22% and 98.83% accuracy while utilizing only 61 and 56 features on average, representing 96.6% and 97.6% dimensionality reduction. This yields…
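To make the formulation concrete, here is a minimal Python sketch of an episodic feature-acquisition MDP together with the standard dueling aggregation and Double DQN target. It is not the authors' implementation: the class `FeatureAcquisitionEnv`, the state encoding (masked features plus a binary mask), the per-feature cost of 0.01, and the +1/-1 classification reward are all assumptions chosen for illustration. Only the two formulas in `dueling_q` and `double_dqn_target` follow the usual published definitions of those techniques.

```python
import numpy as np


class FeatureAcquisitionEnv:
    """Episodic MDP: acquire features one at a time, then classify (hypothetical sketch)."""

    def __init__(self, x, y, n_classes, feature_cost=0.01):
        self.x = np.asarray(x, dtype=np.float32)  # full feature vector of one sample
        self.y = int(y)                           # true class label
        self.n_features = self.x.shape[0]
        self.n_classes = n_classes
        self.n_actions = self.n_features + n_classes
        self.feature_cost = feature_cost          # assumed per-feature penalty
        self.mask = np.zeros(self.n_features, dtype=np.float32)

    def reset(self):
        self.mask[:] = 0.0
        return self._state()

    def _state(self):
        # Unacquired features are zeroed; the mask tells the agent which are known.
        return np.concatenate([self.x * self.mask, self.mask])

    def step(self, action):
        if action < self.n_features:
            # Acquisition action: reveal one feature and pay its cost.
            # Re-selecting a known feature still costs (an assumed design choice).
            self.mask[action] = 1.0
            return self._state(), -self.feature_cost, False
        # Classification action: terminate with +1 if correct, -1 otherwise.
        predicted = action - self.n_features
        reward = 1.0 if predicted == self.y else -1.0
        return self._state(), reward, True


def dueling_q(value, advantage):
    """Dueling aggregation: Q(s, a) = V(s) + A(s, a) - mean_a A(s, a)."""
    return value + advantage - advantage.mean(axis=-1, keepdims=True)


def double_dqn_target(reward, done, gamma, q_online_next, q_target_next):
    """Double DQN: the online net picks the action, the target net scores it."""
    best = np.argmax(q_online_next, axis=-1)
    next_q = np.take_along_axis(q_target_next, best[..., None], axis=-1).squeeze(-1)
    return reward + gamma * (1.0 - done) * next_q


# Usage on a toy 3-feature, 2-class sample (made-up values).
env = FeatureAcquisitionEnv(x=[0.5, -1.2, 3.0], y=1, n_classes=2)
state = env.reset()
state, r1, done1 = env.step(2)                   # acquire feature 2
state, r2, done2 = env.step(env.n_features + 1)  # classify as class 1 and stop
```

Folding the classification decisions into the action space lets a single Q-network trade off "look at one more feature" against "stop and predict", which is what allows the number of features used to vary per sample.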
