Scalable mRMR feature selection to handle high dimensional datasets: Vertical partitioning based Iterative MapReduce framework
Yelleti Vivek, P.S.V.S. Sai Prasad

TL;DR
This paper introduces VMR_mRMR, a vertical partitioning-based MapReduce framework for mRMR feature selection that is reported to scale to high-dimensional datasets and to outperform existing MapReduce approaches.
Contribution
It presents a novel vertical partitioning-based MapReduce approach with memorization to improve scalability and performance in mRMR feature selection.
Findings
VMR_mRMR significantly outperforms existing approaches.
Achieves better computational gain (C.G.) than the compared approaches.
Effective in high-dimensional datasets.
Abstract
When building machine learning models, feature selection (FS) stands out as an essential preprocessing step used to handle the uncertainty and vagueness in the data. Recently, the minimum Redundancy and Maximum Relevance (mRMR) approach has proven effective at obtaining a non-redundant feature subset. Owing to the generation of voluminous datasets, it is essential to design scalable solutions using distributed/parallel paradigms. MapReduce has proven to be one of the best approaches to designing fault-tolerant and scalable solutions. This work analyses the existing MapReduce approaches for mRMR feature selection and identifies their limitations. In the current study, we propose VMR_mRMR, an efficient vertical partitioning-based approach that uses memorization, thereby overcoming the limitations of the existing approaches. The experimental analysis shows that VMR_mRMR…
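The abstract names the ingredients (mRMR scoring, vertical partitioning, reuse of earlier computations) without showing how they fit together. The Python sketch below illustrates one way they could combine, on a single machine with no MapReduce runtime. It is not the authors' VMR_mRMR implementation: the function names `mutual_information` and `mrmr_vertical`, the difference form of the mRMR score (relevance minus mean redundancy), the plug-in mutual-information estimator for discrete data, and the reading of "memorization" as caching running redundancy sums are all assumptions made for illustration.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Plug-in mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    # p(x,y) * log(p(x,y) / (p(x) * p(y))), written in terms of raw counts
    return sum((c / n) * math.log(c * n / (px[x] * py[y]))
               for (x, y), c in joint.items())


def mrmr_vertical(partitions, target, k):
    """Greedy mRMR over vertically partitioned data (illustrative sketch).

    partitions: list of dicts {feature_name: column}; each dict stands in for
    one vertical partition (a subset of features, all rows). Feature names are
    assumed to be unique across partitions.
    """
    # Relevance I(feature; target) is computed once per feature and reused.
    relevance = [{f: mutual_information(col, target) for f, col in part.items()}
                 for part in partitions]
    # Cached running sums of I(feature; selected features) (assumed caching scheme).
    redundancy = [{f: 0.0 for f in part} for part in partitions]
    selected, last_col = [], None
    k = min(k, sum(len(part) for part in partitions))
    while len(selected) < k:
        candidates = []
        # "Map" step: each partition scores only its own features.
        for part, rel, red in zip(partitions, relevance, redundancy):
            best = None
            for f in sorted(part):
                if f in selected:
                    continue
                if last_col is not None:
                    # Only the most recently selected column needs a new MI term.
                    red[f] += mutual_information(part[f], last_col)
                score = rel[f] - (red[f] / len(selected) if selected else 0.0)
                if best is None or score > best[0]:
                    best = (score, f, part[f])
            if best is not None:
                candidates.append(best)
        # "Reduce" step: pick the global best among the per-partition winners.
        _, name, last_col = max(candidates, key=lambda c: c[0])
        selected.append(name)
    return selected


# Toy data: the target is determined jointly by a and c; b duplicates a.
target = [0, 1, 2, 3, 0, 1, 2, 3]
a = [0, 0, 1, 1, 0, 0, 1, 1]
c = [0, 1, 0, 1, 0, 1, 0, 1]
print(mrmr_vertical([{"a": a, "b": list(a)}, {"c": c}], target, 2))  # ['a', 'c']
```

In this toy run the duplicated column `b` is as relevant as `a`, but its redundancy with the already selected `a` cancels that relevance, so the complementary column `c` is chosen second. Because each partition keeps its own cached sums, an iteration only needs the newly selected column to be shared with every partition, which is the kind of saving a vertical layout makes possible under these assumptions.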
Taxonomy
Topics: Gene expression and cancer classification · Face and Expression Recognition · Data Mining Algorithms and Applications
