Feature subset selection for Big Data via Chaotic Binary Differential Evolution under Apache Spark
Yelleti Vivek, Vadlamani Ravi, P. Radhakrishna

TL;DR
This paper introduces a scalable, parallel chaotic binary differential evolution algorithm for feature subset selection in Big Data that optimizes AUC and subset size, and reports better solution quality than P-BDE-iS.
Contribution
It proposes a novel chaotic binary differential evolution method with a scalable island-based parallelization for high-dimensional feature selection.
Findings
The proposed P-CBDE-iS outperforms P-BDE-iS in solution quality.
The parallel approach achieves significant speedup.
The method effectively handles high-dimensional datasets.
Abstract
Feature subset selection (FSS) using a wrapper approach is essentially a combinatorial optimization problem with two objectives: the cardinality of the selected feature subset, which should be minimized, and the corresponding area under the ROC curve (AUC), which should be maximized. In this study, we propose a novel multiplicative single objective function involving cardinality and AUC. The randomness involved in Binary Differential Evolution (BDE) may yield less diverse solutions, causing the search to get trapped in local minima. Hence, we embed Logistic and Tent chaotic maps into BDE and name the result Chaotic Binary Differential Evolution (CBDE). Designing a scalable solution to FSS is critical when dealing with high-dimensional and voluminous datasets. Hence, we propose a scalable island (iS) based parallelization approach where the data is divided into multiple…
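To make the two chaotic maps named in the abstract concrete, here is a minimal Python sketch of the textbook logistic map and a skew tent map, used to generate numbers in [0, 1] that could stand in for uniform random draws. The abstract does not say how the maps are wired into BDE, so the function names, the parameter values (r = 4.0, breakpoint a = 0.7), and the thresholding step are all illustrative assumptions, not the authors' implementation.

```python
def logistic_map(x, r=4.0):
    """One step of the logistic map x -> r * x * (1 - x); r = 4.0 is an assumed setting."""
    return r * x * (1.0 - x)


def tent_map(x, a=0.7):
    """One step of a skew tent map with breakpoint a.

    a = 0.7 is an assumed value. The symmetric case a = 0.5 is avoided here
    because, in binary floating point, its iterates collapse to 0.
    """
    return x / a if x < a else (1.0 - x) / (1.0 - a)


def chaotic_sequence(step, x0, n):
    """Iterate `step` n times from seed x0 and return the visited values."""
    values = []
    x = x0
    for _ in range(n):
        x = step(x)
        values.append(x)
    return values


def chaotic_bits(step, x0, n, threshold=0.5):
    """Threshold a chaotic sequence into a binary mask (hypothetical use:
    bit i = 1 means feature i is selected)."""
    return [1 if v > threshold else 0 for v in chaotic_sequence(step, x0, n)]
```

A call such as `chaotic_bits(logistic_map, 0.3, 20)` returns a 20-bit mask; in a wrapper setting each such mask would be scored by training a classifier on the selected features, a step not shown here.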
Taxonomy
Topics: Machine Learning and Data Classification · Metaheuristic Optimization Algorithms Research · Face and Expression Recognition
Methods: Logistic Regression
