Novel GPU Boruta algorithms for feature selection from high-dimensional data
Xurui Li, Zhiguo Gan, Jiaming Zhang, Zheng Liu, Diannan Lu

TL;DR
This paper introduces two GPU-accelerated versions of the Boruta feature selection algorithm that substantially improve efficiency on large, high-dimensional datasets while maintaining selection accuracy.
Contribution
The study presents novel GPU-based implementations of Boruta that enable faster feature selection on large datasets, with accuracy comparable to that of the original method.
Findings
The GPU algorithms greatly improve computational efficiency.
Impurity-reduction-based Boruta can overestimate feature importance.
GPU Boruta is effective and cost-efficient for large-scale data analysis.
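The findings do not say why impurity reduction overestimates importance. One well-known property, offered here as an assumption rather than as the authors' explanation, is that impurity reduction is measured on the training samples. As a result, the best split on a feature that is unrelated to the label still yields a strictly positive Gini decrease. The minimal sketch below scores a single best split (a decision stump) in pure Python. `gini` and `best_split_gain` are hypothetical helper names. A real random forest would sum such weighted decreases over every node that splits on the feature.

```python
import random


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for c in labels:
        counts[c] = counts.get(c, 0) + 1
    return 1.0 - sum((k / n) ** 2 for k in counts.values())


def best_split_gain(x, y):
    """Largest weighted Gini decrease over all thresholds on feature x."""
    pairs = sorted(zip(x, y))
    ys = [label for _, label in pairs]
    n = len(ys)
    parent = gini(ys)
    best = 0.0
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate two equal feature values
        left, right = ys[:i], ys[i:]
        gain = parent - (len(left) * gini(left) + len(right) * gini(right)) / n
        best = max(best, gain)
    return best


if __name__ == "__main__":
    rng = random.Random(0)
    noise = [rng.random() for _ in range(200)]      # feature unrelated to labels
    labels = [rng.randint(0, 1) for _ in range(200)]
    # Positive even though the feature carries no information about the labels.
    print(best_split_gain(noise, labels))
```

Because any split that changes the class proportions of its children lowers the weighted Gini impurity, the in-sample gain can never be negative. A permutation-based score computed on held-out data does not share this floor.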
Abstract
Most feature selection algorithms, especially wrapper methods, run inefficiently on CPU-based platforms because of their high computational complexity, which makes them unsuitable for processing large-scale datasets. To address this challenge, the present study proposes two GPU-accelerated versions of the Boruta feature selection procedure: Boruta-Permut, which relies on permutation-based feature importance, and Boruta-TreeImp, which employs importance based on impurity reduction. To evaluate these methods, we conducted experiments on a self-constructed dataset and several publicly available datasets. The experimental results show that the proposed GPU-accelerated algorithms greatly improve computational efficiency while preserving feature selection accuracy comparable to that of the original Boruta algorithm. In our analysis we also observe that the impurity-reduction-based version…
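Both variants plug a different importance measure into the same Boruta procedure. Each real feature is compared against "shadow" copies whose values are shuffled, and a feature that beats the best shadow significantly more often than chance is kept. The following is a minimal CPU sketch of that loop with NumPy. It is not the authors' GPU implementation, and it rests on several assumptions. An ordinary least-squares model stands in for the random forest. Importance is raw in-sample permutation importance (increase in MSE). Rejected features are not dropped between rounds. No multiple-testing correction is applied. The names `boruta_sketch`, `permutation_importance`, `binom_tail_ge`, `n_iter` and `alpha` are hypothetical.

```python
import math

import numpy as np


def permutation_importance(X, y, rng):
    """In-sample permutation importance for an OLS model (stand-in for a random forest)."""
    A = np.column_stack([X, np.ones(len(X))])  # append an intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    base_mse = np.mean((A @ coef - y) ** 2)
    scores = np.empty(X.shape[1])
    for j in range(X.shape[1]):
        A_perm = A.copy()
        A_perm[:, j] = rng.permutation(A_perm[:, j])  # break feature j's link to y
        scores[j] = np.mean((A_perm @ coef - y) ** 2) - base_mse
    return scores


def binom_tail_ge(k, n):
    """P(H >= k) for H ~ Binomial(n, 0.5)."""
    return sum(math.comb(n, i) for i in range(k, n + 1)) / 2 ** n


def boruta_sketch(X, y, n_iter=30, alpha=0.05, seed=0):
    rng = np.random.default_rng(seed)
    n_features = X.shape[1]
    hits = np.zeros(n_features, dtype=int)
    for _ in range(n_iter):
        # Shadow features: each column shuffled independently, which keeps its
        # marginal distribution but destroys any association with y.
        shadows = np.column_stack(
            [rng.permutation(X[:, j]) for j in range(n_features)]
        )
        imp = permutation_importance(np.hstack([X, shadows]), y, rng)
        # A "hit" means the real feature beat the best shadow in this round.
        hits += (imp[:n_features] > imp[n_features:].max()).astype(int)
    decisions = []
    for h in hits:
        if binom_tail_ge(h, n_iter) < alpha:
            decisions.append("confirmed")  # significantly more hits than chance
        elif binom_tail_ge(n_iter - h, n_iter) < alpha:  # P(H <= h) by symmetry
            decisions.append("rejected")  # significantly fewer hits than chance
        else:
            decisions.append("tentative")
    return hits, decisions


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(200, 5))  # columns 0 and 1 informative, 2-4 pure noise
    y = 3 * X[:, 0] - 2 * X[:, 1] + 0.5 * rng.normal(size=200)
    hits, decisions = boruta_sketch(X, y)
    print(hits, decisions)  # columns 0 and 1 should be "confirmed"
```

Swapping `permutation_importance` for an impurity-based score would give the TreeImp-style variant under the same assumptions. Noise columns can land in any of the three categories on a given run, which is why the full Boruta procedure iterates until features are resolved or a round limit is reached.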
