CAFE-GB: Scalable and Stable Feature Selection for Malware Detection via Chunk-wise Aggregated Gradient Boosting
Ajvad Haneef K, Karan Kuwar Singh, Madhu Kumar S D

TL;DR
This paper introduces CAFE-GB, a scalable and robust feature selection framework for high-dimensional malware datasets that maintains detection performance while significantly reducing the feature space.
Contribution
CAFE-GB is a novel chunk-wise gradient boosting-based feature selection method that ensures stability and scalability for large-scale malware detection.
Findings
Achieves over 95% feature reduction with no significant performance loss
Produces stable and interpretable feature rankings across datasets
Reduces computational overhead in malware classification pipelines
Abstract
High-dimensional malware datasets often exhibit feature redundancy, instability, and scalability limitations, which hinder the effectiveness and interpretability of machine learning-based malware detection systems. Although feature selection is commonly employed to mitigate these issues, many existing approaches lack robustness when applied to large-scale and heterogeneous malware data. To address this gap, this paper proposes CAFE-GB (Chunk-wise Aggregated Feature Estimation using Gradient Boosting), a scalable feature selection framework designed to produce stable and globally consistent feature rankings for high-dimensional malware detection. CAFE-GB partitions training data into overlapping chunks, estimates local feature importance using gradient boosting models, and aggregates these estimates to derive a robust global ranking. Feature budget selection is performed separately…
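The chunk-wise procedure described in the abstract (overlapping chunks, local importance estimates, global aggregation) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. The chunk size, the overlap, the per-chunk normalization, and the use of a plain mean for aggregation are all assumptions, since the abstract does not specify them. The function names (`overlapping_chunks`, `aggregate_importances`, `rank_features`, `mean_gap_importance`) are hypothetical. To keep the sketch dependency-free, the local estimator is a pluggable callable, and a simple class-mean gap stands in for a gradient boosting model here.

```python
from statistics import mean


def overlapping_chunks(n_samples, chunk_size, overlap):
    """Yield (start, end) index ranges covering n_samples with a fixed overlap."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    start = 0
    while True:
        end = min(start + chunk_size, n_samples)
        yield start, end
        if end == n_samples:
            break
        start += step


def aggregate_importances(X, y, importance_fn, chunk_size, overlap):
    """Average normalized per-chunk importance vectors into one global score per feature."""
    per_chunk = []
    for start, end in overlapping_chunks(len(X), chunk_size, overlap):
        local = importance_fn(X[start:end], y[start:end])
        total = sum(local) or 1.0  # assumption: normalize so chunks are comparable
        per_chunk.append([s / total for s in local])
    n_features = len(per_chunk[0])
    # Assumption: plain mean across chunks; the paper may aggregate differently.
    return [mean(chunk[j] for chunk in per_chunk) for j in range(n_features)]


def rank_features(global_scores):
    """Return feature indices ordered from most to least important."""
    return sorted(range(len(global_scores)), key=lambda j: -global_scores[j])


def mean_gap_importance(X_chunk, y_chunk):
    """Stand-in local estimator (NOT gradient boosting): per-feature gap between class means.

    With scikit-learn installed, one could instead return
    GradientBoostingClassifier().fit(X_chunk, y_chunk).feature_importances_.
    """
    pos = [row for row, label in zip(X_chunk, y_chunk) if label == 1]
    neg = [row for row, label in zip(X_chunk, y_chunk) if label == 0]
    n_features = len(X_chunk[0])
    if not pos or not neg:
        return [0.0] * n_features
    return [abs(mean(r[j] for r in pos) - mean(r[j] for r in neg))
            for j in range(n_features)]


# Toy data: feature 0 equals the label, feature 1 is constant, feature 2 is weakly related.
X = [[float(i % 2), 0.5, float(i % 3)] for i in range(10)]
y = [i % 2 for i in range(10)]

scores = aggregate_importances(X, y, mean_gap_importance, chunk_size=4, overlap=2)
ranking = rank_features(scores)
print(ranking)
```

Keeping the local estimator as a callable means the chunking and aggregation logic stays the same whichever gradient boosting library is plugged in. How many top-ranked features to retain is left to the caller, because the abstract's description of feature budget selection is cut off.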
Taxonomy
Topics: Advanced Malware Detection Techniques · Anomaly Detection Techniques and Applications · Network Security and Intrusion Detection
