Positive region preserved random sampling: an efficient feature selection method for massive data

Hexiang Bai; Deyu Li; Jiye Liang; Yanhui Zhai

arXiv:2507.01998·cs.LG·July 4, 2025

TL;DR

This paper introduces a novel feature selection method for massive data that preserves positive regions, enabling efficient and effective identification of relevant features with high discriminatory power on large datasets.

Contribution

The paper proposes a new sampling-based feature selection approach built on positive region preservation, improving both computational efficiency and the estimation of a feature set's discriminatory ability on large-scale data.
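For readers new to rough set theory, the positive region that this approach preserves can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code. The function name positive_region and the toy data are hypothetical, and the sketch assumes a complete, discrete decision table: each entry of rows holds categorical feature values, and labels holds the decision attribute.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Set, Tuple


def positive_region(rows: Sequence[Sequence[Hashable]],
                    labels: Sequence[Hashable],
                    features: Sequence[int]) -> Set[int]:
    """Return indices of objects in the positive region POS_B(D).

    An object belongs to POS_B(D) when every object that shares its values
    on the features B (its equivalence class) has the same decision label.
    """
    # Group object indices into equivalence classes keyed by their B-values.
    classes: Dict[Tuple, List[int]] = defaultdict(list)
    for i, row in enumerate(rows):
        classes[tuple(row[f] for f in features)].append(i)

    pos: Set[int] = set()
    for members in classes.values():
        # A class is consistent if all of its members share one label.
        if len({labels[i] for i in members}) == 1:
            pos.update(members)
    return pos
```

A feature subset B preserves the positive region when POS_B(D) equals POS_C(D) for the full feature set C. On the toy table rows = [(0, 1), (0, 1), (1, 0), (1, 1)] with labels ["a", "b", "a", "a"], the full set gives {2, 3}. Keeping only feature 0 still gives {2, 3}, while keeping only feature 1 shrinks the region to {2}.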

Findings

1. High discriminatory ability of selected features
2. Fast computation on personal computers
3. Effective feature subset selection for massive data

Abstract

Selecting relevant features is an important and necessary step for intelligent machines to maximize their chances of success. However, intelligent machines generally lack sufficient computing resources when faced with huge volumes of data. This paper develops a new method based on sampling techniques and rough set theory to address the challenge of feature selection for massive data. To this end, this paper proposes using the ratio of discernible object pairs to all object pairs that should be distinguished to measure the discriminatory ability of a feature set. Based on this measure, a new feature selection method is proposed. This method constructs positive region preserved samples from massive data to find a feature subset with high discriminatory ability. Compared with other methods, the proposed method has two advantages. First, it is able to select a feature subset that can preserve…
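The pair-ratio measure described in the abstract can be sketched in Python as follows. This rests on two assumptions not confirmed by the abstract. First, the pairs "that should be distinguished" are taken to be all unordered object pairs with different decision labels. Second, a pair counts as discernible by a feature set B when the two objects differ on at least one feature in B. The names discernibility_ratio, greedy_select and pairs are hypothetical. The greedy forward search is a common rough-set heuristic used here only for illustration. The paper's method instead works on positive region preserved samples, which this sketch does not reproduce.

```python
from collections import Counter, defaultdict
from typing import Hashable, List, Sequence


def pairs(k: int) -> int:
    """Number of unordered pairs among k objects."""
    return k * (k - 1) // 2


def discernibility_ratio(rows: Sequence[Sequence[Hashable]],
                         labels: Sequence[Hashable],
                         features: Sequence[int]) -> float:
    """Share of differently labelled pairs that the features B can tell apart.

    Assumption: the pairs "that should be distinguished" are all unordered
    pairs with different decision labels.
    """
    # Pairs that should be distinguished = all pairs minus same-label pairs.
    should = pairs(len(labels)) - sum(pairs(c) for c in Counter(labels).values())
    if should == 0:
        return 1.0  # nothing needs to be distinguished
    # Group objects into B-equivalence classes and count labels inside each.
    classes = defaultdict(Counter)
    for row, y in zip(rows, labels):
        classes[tuple(row[f] for f in features)][y] += 1
    # Differently labelled pairs inside one class are NOT discernible by B.
    missed = 0
    for counts in classes.values():
        missed += pairs(sum(counts.values())) - sum(pairs(c) for c in counts.values())
    return (should - missed) / should


def greedy_select(rows: Sequence[Sequence[Hashable]],
                  labels: Sequence[Hashable],
                  n_features: int) -> List[int]:
    """Hypothetical greedy forward search that stops once the subset reaches
    the discriminatory ability of the full feature set."""
    target = discernibility_ratio(rows, labels, range(n_features))
    selected: List[int] = []
    remaining = list(range(n_features))
    current = discernibility_ratio(rows, labels, selected)
    while current < target and remaining:
        # Add the feature that raises the ratio the most (ties: lowest index).
        best = max(remaining,
                   key=lambda f: discernibility_ratio(rows, labels, selected + [f]))
        selected.append(best)
        remaining.remove(best)
        current = discernibility_ratio(rows, labels, selected)
    return selected
```

Counting pairs from per-class label frequencies keeps each evaluation linear in the number of objects, rather than enumerating all O(n²) pairs. On the toy table rows = [(0, 1), (0, 1), (1, 0), (1, 1)] with labels ["a", "b", "a", "a"], three pairs carry different labels. The full feature set separates two of them, because objects 0 and 1 are identical, so the ratio is 2/3. Feature 0 alone already reaches 2/3, and greedy_select returns [0].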