Positive region preserved random sampling: an efficient feature selection method for massive data
Hexiang Bai, Deyu Li, Jiye Liang, Yanhui Zhai

TL;DR
This paper introduces a novel feature selection method for massive data that preserves positive regions, enabling efficient and effective identification of relevant features with high discriminatory power on large datasets.
Contribution
It proposes a new sampling-based feature selection approach built on positive region preservation, improving both the efficiency of selection and the estimation of discriminatory ability on large-scale data.
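The positive region is the standard rough-set notion that the method preserves. It is the set of objects whose equivalence class under a feature subset falls entirely within one decision class. The Python sketch below computes it on a small decision table. The function names `positive_region` and `dependency_degree` and the list-of-tuples table layout are illustrative assumptions. The sketch does not reproduce the paper's sampling procedure.

```python
from collections import defaultdict


def positive_region(rows, labels, features) -> list:
    """Indices of objects whose equivalence class under `features` has a single label."""
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        # Objects with identical values on `features` are indiscernible.
        blocks[tuple(row[f] for f in features)].append(i)
    pos = []
    for members in blocks.values():
        # A class belongs to the positive region only if it is label-consistent.
        if len({labels[i] for i in members}) == 1:
            pos.extend(members)
    return sorted(pos)


def dependency_degree(rows, labels, features) -> float:
    """Fraction of objects in the positive region (the classical rough-set gamma)."""
    return len(positive_region(rows, labels, features)) / len(rows)


rows = [(0, 1, 0), (0, 1, 1), (1, 0, 0), (1, 1, 1)]
labels = ["a", "a", "b", "b"]
print(positive_region(rows, labels, [1]))    # [2]
print(dependency_degree(rows, labels, [0]))  # 1.0
```

Here feature 1 leaves objects 0, 1 and 3 in one mixed class, so only object 2 is in the positive region. Feature 0 alone already separates the two labels.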
Findings
High discriminatory ability of selected features
Fast computation on personal computers
Effective feature subset selection for massive data
Abstract
Selecting relevant features is an important and necessary step for intelligent machines to maximize their chances of success. However, intelligent machines generally lack sufficient computing resources when faced with huge volumes of data. This paper develops a new method based on sampling techniques and rough set theory to address the challenge of feature selection for massive data. To this end, this paper proposes using the ratio of discernible object pairs to all object pairs that should be distinguished to measure the discriminatory ability of a feature set. Based on this measure, a new feature selection method is proposed. This method constructs positive region preserved samples from massive data to find a feature subset with high discriminatory ability. Compared with other methods, the proposed method has two advantages. First, it is able to select a feature subset that can preserve…
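The discriminatory-ability measure described in the abstract can be sketched as follows. This sketch assumes that "object pairs that should be distinguished" are pairs with different decision labels. It also assumes that a pair is discernible by a feature set when the two objects differ on at least one feature in it. Rather than enumerating all O(n²) pairs, the sketch counts the missed pairs inside each equivalence class. The greedy forward search in `greedy_select` is a generic illustration of how such a measure can drive selection. It is not the paper's positive-region-preserved sampling method, and all function names here are hypothetical.

```python
from collections import Counter, defaultdict


def pair_count(n: int) -> int:
    """Number of unordered pairs among n objects."""
    return n * (n - 1) // 2


def discriminatory_ability(rows, labels, features) -> float:
    """Ratio of discernible pairs to pairs with different labels (assumed definition)."""
    features = list(features)
    # Pairs that should be distinguished: all pairs minus same-label pairs.
    should = pair_count(len(labels)) - sum(pair_count(c) for c in Counter(labels).values())
    if should == 0:
        return 1.0
    # Group objects by their values on the chosen features (equivalence classes).
    blocks = defaultdict(Counter)
    for row, y in zip(rows, labels):
        blocks[tuple(row[f] for f in features)][y] += 1
    # Indiscernible pairs with different labels lie inside a single block.
    missed = 0
    for counts in blocks.values():
        missed += pair_count(sum(counts.values())) - sum(pair_count(c) for c in counts.values())
    return (should - missed) / should


def greedy_select(rows, labels, n_features: int) -> list:
    """Hypothetical forward search: add the best feature until the full set's score is reached."""
    target = discriminatory_ability(rows, labels, range(n_features))
    selected, remaining = [], list(range(n_features))
    while discriminatory_ability(rows, labels, selected) < target:
        best = max(remaining, key=lambda f: discriminatory_ability(rows, labels, selected + [f]))
        selected.append(best)
        remaining.remove(best)
    return selected


rows = [(0, 1, 0), (0, 1, 1), (1, 0, 0), (1, 1, 1)]
labels = ["a", "a", "b", "b"]
print(discriminatory_ability(rows, labels, [1]))  # 0.5
print(greedy_select(rows, labels, 3))             # [0]
```

In this toy table, four pairs carry different labels. Feature 1 leaves two of them indiscernible, which gives a score of 0.5. Feature 0 separates all four pairs, so the search stops after selecting it.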
