MH-FSF: A Unified Framework for Overcoming Benchmarking and Reproducibility Limitations in Feature Selection Evaluation

Vanderson Rocha; Diego Kreutz; Gabriel Canto; Hendrio Bragança; Eduardo Feitosa

arXiv:2507.10591 · cs.LG · July 16, 2025

TL;DR

The paper introduces MH-FSF, a comprehensive platform for evaluating feature selection methods on Android malware datasets, addressing reproducibility issues and enabling systematic comparison of diverse techniques.

Contribution

It presents a modular, extensible framework that implements 17 feature selection methods (11 classical, 6 domain-specific) and evaluates them on 10 publicly available Android malware datasets, promoting methodological consistency in malware detection research.
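
To illustrate what a "modular, extensible" evaluation design can look like, here is a minimal sketch under stated assumptions. It is not MH-FSF's actual API: the registry (`METHODS`, `register`), the `evaluate_all` loop, and both toy selectors are hypothetical, and neither selector is claimed to be one of the paper's 17 methods. Each method registers under a name and receives identical inputs (feature matrix, labels, number of features to keep), so adding a method never changes the evaluation loop, and every method runs on every dataset with the same settings.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical data layout: X has one row per app and one column per feature
# (e.g., binary permission/API-call flags); y holds labels (1 = malware, 0 = benign).
Matrix = List[List[float]]
Selector = Callable[[Matrix, Sequence[int], int], List[int]]

# Hypothetical registry: each method registers under a name, so adding a new
# method does not require touching the evaluation loop.
METHODS: Dict[str, Selector] = {}


def register(name: str) -> Callable[[Selector], Selector]:
    def wrap(fn: Selector) -> Selector:
        METHODS[name] = fn
        return fn
    return wrap


def top_k(scores: List[float], k: int) -> List[int]:
    """Indices of the k highest scores; ties keep the lower index first."""
    return sorted(range(len(scores)), key=lambda j: -scores[j])[:k]


@register("variance")
def variance_filter(X: Matrix, y: Sequence[int], k: int) -> List[int]:
    """Unsupervised filter: keep the k features with the highest variance."""
    n = len(X)
    scores = []
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        mean = sum(col) / n
        scores.append(sum((v - mean) ** 2 for v in col) / n)
    return top_k(scores, k)


@register("class_mean_gap")
def class_mean_gap(X: Matrix, y: Sequence[int], k: int) -> List[int]:
    """Supervised filter: keep the k features whose mean differs most by class."""
    pos = [row for row, label in zip(X, y) if label == 1]
    neg = [row for row, label in zip(X, y) if label == 0]
    scores = [
        abs(sum(r[j] for r in pos) / len(pos) - sum(r[j] for r in neg) / len(neg))
        for j in range(len(X[0]))
    ]
    return top_k(scores, k)


def evaluate_all(datasets: Dict[str, Tuple[Matrix, Sequence[int]]], k: int):
    """Run every registered method on every dataset with the same k."""
    return {
        (ds_name, m_name): select(X, y, k)
        for ds_name, (X, y) in datasets.items()
        for m_name, select in METHODS.items()
    }


# Toy dataset: 4 apps x 3 binary features.
X = [[1, 0, 1], [1, 1, 0], [0, 0, 1], [0, 1, 1]]
y = [1, 1, 0, 0]
for key, cols in evaluate_all({"toy": (X, y)}, k=2).items():
    print(key, cols)
# ('toy', 'variance') [0, 1]
# ('toy', 'class_mean_gap') [0, 2]
```

On the toy data the two filters agree on feature 0 but disagree on the second pick, which is the kind of divergence a shared harness makes visible. A fuller harness would also train a classifier on each selected subset and report metrics; that step is omitted here for brevity.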

Findings

01 Method performance varies across balanced and imbalanced datasets, underscoring the importance of data preprocessing.
02 A unified platform facilitates the comparison and reproduction of feature selection methods.
03 The results highlight the need for feature selection criteria that account for dataset imbalance.
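
A small example shows why imbalance-aware criteria matter. The sketch below uses standard metrics (accuracy, balanced accuracy, and the Matthews correlation coefficient); which criteria MH-FSF itself reports is not stated in this summary, so treat the metric choice as an assumption. With 95 benign and 5 malware samples, a detector that labels everything benign reaches 95% accuracy while detecting no malware at all.

```python
import math
from typing import Sequence, Tuple


def confusion(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, tn, fp, fn), treating label 1 as malware (positive class)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    tp, tn, fp, fn = confusion(y_true, y_pred)
    return (tp + tn) / (tp + tn + fp + fn)


def balanced_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Mean of the true-positive rate and the true-negative rate."""
    tp, tn, fp, fn = confusion(y_true, y_pred)
    tpr = tp / (tp + fn) if tp + fn else 0.0
    tnr = tn / (tn + fp) if tn + fp else 0.0
    return (tpr + tnr) / 2


def mcc(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Matthews correlation coefficient; 0.0 when undefined (a common convention)."""
    tp, tn, fp, fn = confusion(y_true, y_pred)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Imbalanced toy set: 95 benign (0), 5 malware (1); the detector predicts all benign.
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 100

print(accuracy(y_true, y_pred))           # 0.95
print(balanced_accuracy(y_true, y_pred))  # 0.5
print(mcc(y_true, y_pred))                # 0.0
```

Accuracy rewards the majority class, while balanced accuracy and MCC expose that the detector has no discriminative power; this is why comparisons of feature subsets on imbalanced malware datasets benefit from such criteria.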

Abstract

Feature selection is vital for building effective predictive models, as it reduces dimensionality and emphasizes key features. However, current research often suffers from limited benchmarking and reliance on proprietary datasets. This severely hinders reproducibility and can negatively impact overall performance. To address these limitations, we introduce the MH-FSF framework, a comprehensive, modular, and extensible platform designed to facilitate the reproduction and implementation of feature selection methods. Developed through collaborative research, MH-FSF provides implementations of 17 methods (11 classical, 6 domain-specific) and enables systematic evaluation on 10 publicly available Android malware datasets. Our results reveal performance variations across both balanced and imbalanced datasets, highlighting the critical need for data preprocessing and selection criteria that…
