# Random subspace-based ensemble classifier for high-dimensional data using SPARK

**Authors:** Venkaiah Chowdary Bhimineni, Rajiv Senapati, Razieh Sheikhpour

PMC · DOI: 10.1371/journal.pone.0342408 · 2026-03-11

## TL;DR

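This paper introduces ISSBEC, a random-subspace ensemble classifier for high-dimensional data implemented in the Spark framework. The authors report improved accuracy and robustness compared with state-of-the-art methods, and present the approach as a scalable one.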

## Contribution

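The paper proposes an improved subspace-based ensemble classifier (ISSBEC) that combines a feature-fusion-based random subspace (FF-RSS), mixed-space enhancement (MSE), and multiple base classifiers. It is preceded by improved deep fuzzy clustering (IDFC) for data partitioning and SVM-based feature selection with feature fusion.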

## Key findings

- ISSBEC improves accuracy and robustness in high-dimensional classification.
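- The framework uses improved deep fuzzy clustering (IDFC) on the master node for data partitioning and support vector machine-modified recursive feature elimination (SVM-MRFE) on the slave node for feature selection.
- In the reported experiments, ISSBEC outperforms the state-of-the-art methods it is compared against on the chosen performance metrics.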

## Abstract

High-dimensional data classification remains challenging for machine learning models due to sparsity and overfitting caused by the ‘curse of dimensionality’. As the number of features increases, data points become sparse, hindering generalization in classification and leading to higher computational costs and reduced accuracy. To address these issues, we propose an ensemble classifier based on random subspaces implemented in the Spark framework. The proposed framework comprises three key stages. First, the high-dimensional data is normalised through min-max normalisation. Second, the master node partitions the data using improved deep fuzzy clustering (IDFC), while the slave node applies support vector machine-modified recursive feature elimination (SVM-MRFE) for efficient feature selection, followed by feature fusion. Finally, we introduce an improved subspace-based ensemble classifier (ISSBEC) that comprises a feature-fusion-based random subspace (FF-RSS), mixed-space enhancement (MSE), and multiple base classifiers. The efficacy of the ISSBEC classifier was evaluated using a set of performance metrics and compared with state-of-the-art methods. Experimental results demonstrate that the proposed approach improves both accuracy and robustness, offering a scalable solution to the limitations of high-dimensional datasets.
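To make two of the ideas named in the abstract concrete, the following is a minimal single-machine sketch of min-max normalisation followed by a plain random subspace ensemble with majority voting. It is not the authors' ISSBEC: it omits Spark, IDFC, SVM-MRFE, feature fusion, and mixed-space enhancement. The base learner (a nearest-centroid classifier), all function names, the parameters `n_models` and `subspace_size`, and the toy data are assumptions chosen only for illustration.

```python
import random
from collections import Counter


def min_max_normalise(rows):
    """Scale every feature column to [0, 1]; constant columns map to 0.0."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    spans = [max(c) - min(c) for c in columns]
    return [
        [(v - lo) / sp if sp else 0.0 for v, lo, sp in zip(row, lows, spans)]
        for row in rows
    ]


def fit_centroids(rows, labels, features):
    """Hypothetical base learner: per-class mean over the chosen features."""
    sums, counts = {}, Counter(labels)
    for row, label in zip(rows, labels):
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, f in enumerate(features):
            acc[i] += row[f]
    return {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}


def predict_centroid(centroids, features, row):
    """Return the class whose centroid is closest within the subspace."""
    def dist(label):
        return sum((row[f] - c) ** 2 for f, c in zip(features, centroids[label]))
    return min(sorted(centroids), key=dist)


def fit_random_subspace(rows, labels, n_models=15, subspace_size=2, seed=0):
    """Train one base learner per randomly drawn feature subset."""
    rng = random.Random(seed)
    n_features = len(rows[0])
    ensemble = []
    for _ in range(n_models):
        features = sorted(rng.sample(range(n_features), subspace_size))
        ensemble.append((features, fit_centroids(rows, labels, features)))
    return ensemble


def predict_ensemble(ensemble, row):
    """Combine the base learners' predictions by majority vote."""
    votes = Counter(predict_centroid(c, f, row) for f, c in ensemble)
    return votes.most_common(1)[0][0]


# Toy data (made up): two well-separated classes over four features.
raw = [
    [1, 10, 100, 5], [2, 12, 110, 6], [1.5, 11, 105, 5.5],
    [8, 40, 400, 20], [9, 42, 410, 21], [8.5, 41, 405, 20.5],
]
labels = ["a", "a", "a", "b", "b", "b"]
scaled = min_max_normalise(raw)
ensemble = fit_random_subspace(scaled, labels)
predictions = [predict_ensemble(ensemble, row) for row in scaled]
```

Each base learner sees only a random subset of the features, which is the core of the random subspace method. The paper's FF-RSS stage instead builds subspaces from fused, selected features, so this sketch should be read as a baseline illustration only.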

## Full-text entities

- **Diseases:** ISSBEC (MESH:D019292), Cancer (MESH:D009369), HD (MESH:D006816), ML (MESH:D007859)
- **Chemicals:** PONE-D-25-47457R1 (-)

## Figures

50 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12978757/full.md

---
Source: https://tomesphere.com/paper/PMC12978757