Scalable Feature Subset Selection for Big Data using Parallel Hybrid Evolutionary Algorithm based Wrapper in Apache Spark
Yelleti Vivek, Vadlamani Ravi, Pisipati Radhakrishna

TL;DR
This paper introduces scalable parallel hybrid evolutionary algorithms in Apache Spark for feature subset selection on large datasets, improving search efficiency and avoiding premature convergence.
Contribution
It proposes novel hybrid parallel EAs that combine differential evolution (DE) and threshold accepting (TA) under Spark, demonstrating significant improvements over baseline methods in large-scale feature selection.
Findings
PB-TADE outperforms PB-DE and PB-DETA by a statistically significant margin.
The proposed algorithms achieve higher AUC with smaller feature subsets.
Significant speedup is observed on large datasets.
Abstract
Owing to the emergence of large datasets, applying current sequential wrapper-based feature subset selection (FSS) algorithms increases the complexity. This limitation motivated us to propose a wrapper for FSS based on parallel and distributed hybrid evolutionary algorithms (EAs) under the Apache Spark environment. The hybrid EAs are based on Binary Differential Evolution (BDE) and Binary Threshold Accepting (BTA), a point-based EA, which is invoked to enhance the search capability and avoid premature convergence of the parallel BDE (PB-DE). Thus, we designed the hybrid variants (i) parallel binary differential evolution and threshold accepting (PB-DETA), where DE and TA work in tandem in every iteration, and (ii) parallel binary threshold accepting and differential evolution (PB-TADE), where TA and DE work in tandem in every iteration under the Apache Spark environment. Both PB-DETA and PB-TADE are…
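To make the idea of a DE and TA hybrid concrete, the following is a minimal single-machine sketch in Python. It is not the authors' implementation and does not use Spark. Several parts are assumptions made for illustration only: the toy `fitness` function (standing in for a classifier-based wrapper score such as AUC), the XOR-style binary mutation, the one-bit-flip TA neighbourhood, the threshold decay schedule, and all function and parameter names. The ordering shown (a DE generation followed by a TA step in each iteration) is likewise only one possible reading of "in tandem".

```python
import random

def fitness(mask, relevant):
    """Toy wrapper score (assumption): reward relevant features kept,
    penalise subset size. A real wrapper would train a classifier."""
    hits = sum(1 for i in relevant if mask[i])
    return hits - 0.1 * sum(mask)

def ta_step(mask, score_fn, threshold, rng):
    """Threshold accepting on one solution: flip one random bit and keep
    the move unless the score drops by more than `threshold`."""
    cand = mask[:]
    j = rng.randrange(len(cand))
    cand[j] = 1 - cand[j]
    if score_fn(cand) >= score_fn(mask) - threshold:
        return cand
    return mask

def de_step(pop, score_fn, cr, rng):
    """One generation of a simple binary DE (XOR-style mutation, assumed)."""
    n = len(pop[0])
    new_pop = []
    for i, target in enumerate(pop):
        others = [p for k, p in enumerate(pop) if k != i]
        a, b, c = rng.sample(others, 3)
        # Mutant: a's bit, flipped wherever b and c disagree.
        mutant = [a[d] ^ (b[d] ^ c[d]) for d in range(n)]
        jrand = rng.randrange(n)  # guarantees at least one mutant bit
        trial = [mutant[d] if (rng.random() < cr or d == jrand) else target[d]
                 for d in range(n)]
        # Greedy selection: keep the trial only if it is at least as good.
        new_pop.append(trial if score_fn(trial) >= score_fn(target) else target)
    return new_pop

def hybrid_search(n_features, relevant, pop_size=10, iters=30, seed=0):
    """Alternate a DE generation and a TA step in every iteration."""
    rng = random.Random(seed)
    score = lambda m: fitness(m, relevant)
    pop = [[rng.randint(0, 1) for _ in range(n_features)]
           for _ in range(pop_size)]
    threshold = 0.5
    for _ in range(iters):
        pop = de_step(pop, score, cr=0.5, rng=rng)
        pop = [ta_step(m, score, threshold, rng) for m in pop]
        threshold *= 0.9  # tighten acceptance over time (assumed schedule)
    return max(pop, key=score)

if __name__ == "__main__":
    best = hybrid_search(12, relevant={0, 3, 7})
    print(best, fitness(best, {0, 3, 7}))
```

In a distributed setting, the per-individual fitness evaluations inside each step are the natural unit to parallelise, since each one is independent of the others within a generation.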
Taxonomy
Topics: Metaheuristic Optimization Algorithms Research · Evolutionary Algorithms and Applications · Face and Expression Recognition
Methods: Logistic Regression
