Scalable mRMR feature selection to handle high dimensional datasets: Vertical partitioning based Iterative MapReduce framework

Yelleti Vivek; P.S.V.S. Sai Prasad

arXiv:2208.09901·cs.DC·July 25, 2024

TL;DR

This paper introduces VMR_mRMR, a scalable vertical partitioning MapReduce framework for feature selection that outperforms existing methods in handling high-dimensional datasets efficiently.

Contribution

It presents a novel vertical partitioning-based MapReduce approach with memorization to improve scalability and performance in mRMR feature selection.

Findings

01

VMR_mRMR significantly outperforms existing approaches.

02

Achieves better computational gain (C.G).

03

Effective in high-dimensional datasets.

Abstract

When building machine learning models, feature selection (FS) is an essential preprocessing step used to handle uncertainty and vagueness in the data. Recently, the minimum Redundancy and Maximum Relevance (mRMR) approach has proven effective at obtaining a non-redundant feature subset. Because datasets are growing ever more voluminous, scalable solutions built on distributed/parallel paradigms are essential, and MapReduce has proven to be one of the best approaches for designing fault-tolerant, scalable solutions. This work analyses the existing MapReduce approaches to mRMR feature selection and identifies their limitations. We propose VMR_mRMR, an efficient vertical partitioning-based approach that uses a memorization approach, thereby overcoming the limitations of the extant approaches. The experimental analysis shows that VMR_mRMR…
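To make the mRMR idea in the abstract concrete, here is a minimal single-machine sketch in Python. It is not the paper's VMR_mRMR implementation and involves no MapReduce. It assumes discrete features, mutual information as the relevance and redundancy measure, and the "difference" form of the mRMR score (relevance minus mean redundancy), none of which the abstract specifies. The names `mutual_information`, `mrmr_select`, and `redundancy_sum` are hypothetical. The data is held column by column, loosely mirroring vertical partitioning, and the running redundancy sums are kept between iterations so that each feature pair is evaluated only once, which is one plausible reading of the memorization the abstract mentions.

```python
from collections import Counter
from math import log2


def mutual_information(x, y):
    """Mutual information (in bits) between two equal-length discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum(
        (c / n) * log2(c * n / (px[a] * py[b]))
        for (a, b), c in pxy.items()
    )


def mrmr_select(columns, target, k):
    """Greedy mRMR over a column-oriented dataset (illustrative sketch).

    columns: dict mapping feature name -> list of discrete values
    target:  list of class labels, same length as every column
    k:       number of features to select
    """
    # Relevance I(f; class) is computed once per feature.
    relevance = {f: mutual_information(col, target) for f, col in columns.items()}
    # Assumption: keep a running sum of I(f; s) over the selected features s,
    # so earlier redundancy terms are reused instead of recomputed.
    redundancy_sum = {f: 0.0 for f in columns}
    selected = []
    remaining = set(columns)

    while remaining and len(selected) < k:
        def score(f):
            penalty = redundancy_sum[f] / len(selected) if selected else 0.0
            return relevance[f] - penalty

        # sorted() makes tie-breaking deterministic (alphabetical order).
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
        # Only the redundancy against the newly selected feature is new work.
        for f in remaining:
            redundancy_sum[f] += mutual_information(columns[f], columns[best])

    return selected


if __name__ == "__main__":
    y = [0, 0, 0, 0, 1, 1, 1, 1]
    cols = {
        "a": [0, 0, 0, 1, 1, 1, 1, 1],  # fairly relevant
        "b": [0, 0, 0, 1, 1, 1, 1, 1],  # exact duplicate of "a"
        "c": [0, 1, 0, 1, 0, 1, 1, 1],  # weakly relevant, less redundant with "a"
    }
    print(mrmr_select(cols, y, 2))
```

In this toy dataset the duplicate feature "b" is as relevant as "a" but fully redundant with it, so once "a" is selected the second pick falls to "c". In a distributed setting each column (or group of columns) could live on a different worker, with only the newly selected column being shared per iteration; that mapping is an assumption and not a description of the paper's design.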

Taxonomy

Topics: Gene expression and cancer classification · Face and Expression Recognition · Data Mining Algorithms and Applications