CAFE-GB: Scalable and Stable Feature Selection for Malware Detection via Chunk-wise Aggregated Gradient Boosting

Ajvad Haneef K; Karan Kuwar Singh; Madhu Kumar S D

arXiv:2601.15754 · cs.CR · January 23, 2026

Open Access

TL;DR

This paper introduces CAFE-GB, a scalable and robust feature selection framework for high-dimensional malware datasets that maintains detection performance while significantly reducing feature space.

Contribution

CAFE-GB is a novel chunk-wise gradient boosting-based feature selection method that ensures stability and scalability for large-scale malware detection.

Findings

01 Achieves over 95% feature reduction with no significant performance loss

02 Produces stable and interpretable feature rankings across datasets

03 Reduces computational overhead in malware classification pipelines

Abstract

High-dimensional malware datasets often exhibit feature redundancy, instability, and scalability limitations, which hinder the effectiveness and interpretability of machine learning-based malware detection systems. Although feature selection is commonly employed to mitigate these issues, many existing approaches lack robustness when applied to large-scale and heterogeneous malware data. To address this gap, this paper proposes CAFE-GB (Chunk-wise Aggregated Feature Estimation using Gradient Boosting), a scalable feature selection framework designed to produce stable and globally consistent feature rankings for high-dimensional malware detection. CAFE-GB partitions training data into overlapping chunks, estimates local feature importance using gradient boosting models, and aggregates these estimates to derive a robust global ranking. Feature budget selection is performed separately…
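
The chunk-wise procedure described in the abstract (overlapping chunks, a gradient boosting model per chunk, aggregation into one global ranking) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes scikit-learn's `GradientBoostingClassifier` as the booster, impurity-based `feature_importances_` as the local estimate, and a plain mean as the aggregation rule, none of which the abstract specifies. The function names, chunk size, overlap, and model settings are hypothetical choices for illustration.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier


def overlapping_chunks(n_samples, chunk_size, overlap):
    """Yield index arrays; each chunk shares `overlap` rows with the previous one."""
    step = chunk_size - overlap
    if step <= 0:
        raise ValueError("overlap must be smaller than chunk_size")
    start = 0
    while True:
        end = min(start + chunk_size, n_samples)
        yield np.arange(start, end)
        if end == n_samples:
            break
        start += step


def chunkwise_importance(X, y, chunk_size=200, overlap=50, seed=0):
    """Fit one booster per chunk and average the per-chunk importances.

    Assumption: mean aggregation. The paper may use a different rule.
    Returns (scores, ranking) with ranking sorted from most to least important.
    """
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(X))  # shuffle so chunks are not order-dependent
    per_chunk = []
    for idx in overlapping_chunks(len(X), chunk_size, overlap):
        rows = order[idx]
        if len(np.unique(y[rows])) < 2:
            continue  # a single-class chunk cannot train a classifier
        model = GradientBoostingClassifier(
            n_estimators=50, max_depth=3, random_state=seed
        )
        model.fit(X[rows], y[rows])
        per_chunk.append(model.feature_importances_)
    if not per_chunk:
        raise ValueError("no chunk contained both classes")
    scores = np.mean(per_chunk, axis=0)
    ranking = np.argsort(scores)[::-1]
    return scores, ranking


if __name__ == "__main__":
    # Synthetic stand-in for a malware feature matrix: only columns 3 and 7
    # carry signal, the remaining 18 columns are noise.
    rng = np.random.default_rng(42)
    X = rng.normal(size=(600, 20))
    y = (X[:, 3] + X[:, 7] > 0).astype(int)
    scores, ranking = chunkwise_importance(X, y)
    print("top features:", ranking[:5])
```

Selecting a feature budget would then amount to keeping the first k entries of `ranking`; how k is chosen is left open here because the abstract is truncated at that point.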

Taxonomy

Topics: Advanced Malware Detection Techniques · Anomaly Detection Techniques and Applications · Network Security and Intrusion Detection