Random Partitioning and Distribution-based Thresholding for Iterative Variable Screening in High Dimensions
Yu-Hsiang Cheng, Tzee-Ming Huang, Su-Yun Huang

TL;DR
This paper introduces a new iterative variable screening method that combines random partitioning with distribution-based thresholding. The method handles high-dimensional data with millions of predictors and shows superior performance in simulations and on real data.
Contribution
The paper proposes a new two-stage iterative screening algorithm that uses random partitioning and distribution-based thresholding rules to handle ultra-high-dimensional data efficiently.
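The two-stage random-partitioning idea can be sketched in Python. This is a minimal illustration under assumptions, not the authors' algorithm. The within-subset screen (keep the `keep` predictors with the largest absolute marginal correlation) and the aggregation rule (keep predictors that survive in a majority of repeated partitions) are hypothetical stand-ins. All function names and parameters (`random_partition`, `two_stage_screen`, `subset_size`, `keep`, `n_repeats`) are likewise invented for illustration.

```python
import math
import random


def abs_corr(x, y):
    """Absolute Pearson correlation of two equal-length sequences (0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy) / math.sqrt(sxx * syy)


def random_partition(p, subset_size, rng):
    """Shuffle predictor indices 0..p-1 and cut them into blocks of at most subset_size."""
    idx = list(range(p))
    rng.shuffle(idx)
    return [idx[i:i + subset_size] for i in range(0, p, subset_size)]


def screen_subset(columns, y, subset, keep):
    """Hypothetical within-subset screen: keep the `keep` indices with the largest |corr|."""
    ranked = sorted(subset, key=lambda j: abs_corr(columns[j], y), reverse=True)
    return ranked[:keep]


def two_stage_screen(columns, y, subset_size, keep, n_repeats=5, seed=0):
    """Hypothetical two-stage screen.

    Stage 1: randomly partition the predictors and screen each subset separately,
    repeating the partition n_repeats times.
    Stage 2: keep the predictors that survived in a majority of the repeats.
    """
    rng = random.Random(seed)
    p = len(columns)
    votes = [0] * p
    for _ in range(n_repeats):
        for subset in random_partition(p, subset_size, rng):
            for j in screen_subset(columns, y, subset, keep):
                votes[j] += 1
    return sorted(j for j in range(p) if votes[j] > n_repeats / 2)


# Toy example: only predictors 3 and 17 carry signal.
data_rng = random.Random(1)
n, p = 100, 200
columns = [[data_rng.gauss(0, 1) for _ in range(n)] for _ in range(p)]
y = [3 * columns[3][i] - 2 * columns[17][i] + data_rng.gauss(0, 0.5) for i in range(n)]
selected = two_stage_screen(columns, y, subset_size=20, keep=2)
print(selected)
```

Each screening call only ever looks at `subset_size` predictors, and voting across repeated partitions makes the result less dependent on any single random grouping. In this toy setup a few noise predictors with unusually large marginal correlations can also survive the vote; the majority rule is one illustrative choice among many.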
Findings
Outperforms existing variable screening methods in simulations.
Scales to millions of predictors.
Performs well in real-data applications.
Abstract
In big data analysis, a simple task such as linear regression can become very challenging as the number of variables grows. As a result, variable screening is inevitable in many scientific studies. In recent years, randomized algorithms have become a new trend and play an increasingly important role in large-scale data analysis. In this article, we combine the ideas of variable screening and random partitioning to propose a new iterative variable screening method. For a moderate number of predictors, we propose a basic algorithm that adopts a distribution-based thresholding rule. For a very large number of predictors, we further propose a two-stage procedure. This two-stage procedure first performs random partitioning to divide the predictors into subsets of manageable size for variable screening, where a parameter governing that size can be an arbitrarily small positive…
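The abstract does not spell out the distribution-based thresholding rule, so the following Python sketch uses one plausible stand-in, labeled as an assumption. In each round it Fisher-transforms the correlation between every unselected predictor and the current residual and treats the result as approximately standard normal under no association. It then keeps predictors that exceed a Bonferroni-corrected normal quantile and projects them out of the response before the next round. The function `iterative_screen` and its parameters `alpha` and `max_iter` are hypothetical.

```python
import math
import random
from statistics import NormalDist


def center(v):
    """Subtract the mean from a sequence."""
    m = sum(v) / len(v)
    return [a - m for a in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def corr(x, y):
    """Pearson correlation of two equal-length sequences (0 if either is constant)."""
    xc, yc = center(x), center(y)
    denom = math.sqrt(dot(xc, xc) * dot(yc, yc))
    return dot(xc, yc) / denom if denom > 0 else 0.0


def iterative_screen(columns, y, alpha=0.05, max_iter=10):
    """Hypothetical iterative screen with a distribution-based threshold (not the paper's rule)."""
    n, p = len(y), len(columns)
    selected, basis = [], []  # basis holds orthonormal, centered selected columns
    resid = center(y)
    for _ in range(max_iter):
        k = len(selected)
        remaining = [j for j in range(p) if j not in selected]
        if not remaining or n - 3 - k <= 0:
            break
        # Two-sided Bonferroni cutoff on the approximate N(0, 1) null distribution.
        cutoff = NormalDist().inv_cdf(1 - alpha / (2 * len(remaining)))
        new = []
        for j in remaining:
            r = max(min(corr(columns[j], resid), 0.999999), -0.999999)
            z = math.sqrt(n - 3 - k) * math.atanh(r)  # Fisher z-statistic
            if abs(z) > cutoff:
                new.append(j)
        if not new:
            break  # no remaining predictor clears the threshold
        for j in new:
            # Gram-Schmidt: orthogonalize the new column against the current basis.
            v = center(columns[j])
            for q in basis:
                c = dot(v, q)
                v = [a - c * b for a, b in zip(v, q)]
            norm = math.sqrt(dot(v, v))
            if norm > 1e-10:
                q_new = [a / norm for a in v]
                basis.append(q_new)
                # Remove the new direction from the residual of y.
                c = dot(resid, q_new)
                resid = [a - c * b for a, b in zip(resid, q_new)]
        selected.extend(new)
    return sorted(selected)


# Toy example: predictors 5 and 40 carry the signal.
rng = random.Random(0)
n, p = 200, 300
columns = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(p)]
y = [2 * columns[5][i] + 1.5 * columns[40][i] + rng.gauss(0, 1) for i in range(n)]
selected = iterative_screen(columns, y)
print(selected)
```

The multiplicity-adjusted quantile matters here: with hundreds of predictors tested at once, an unadjusted normal cutoff would admit many noise variables. The loop stops as soon as no remaining predictor clears the cutoff.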
Taxonomy
Topics: Statistical Methods and Inference · Bayesian Methods and Mixture Models · Machine Learning and Algorithms
