Distributed Feature Screening via Componentwise Debiasing
Xingxiang Li, Runze Li, Zhiming Xia, Chen Xu

TL;DR
This paper introduces a distributed feature screening method for high-dimensional data that leverages componentwise debiasing and U-statistics, enabling efficient, accurate, and scalable feature selection in big data contexts.
Contribution
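It proposes a distributed screening framework that expresses a correlation measure as a function of several component parameters, estimates each component on separate data segments with a natural U-statistic, and aggregates the component estimates into a final correlation estimate used to screen features.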
Findings
Achieves screening accuracy comparable to centralized methods.
Enables scalable parallel computation for large datasets.
Demonstrates effectiveness through extensive numerical experiments.
Abstract
Feature screening is a powerful tool in the analysis of high-dimensional data. When the sample size and the number of features are both large, the implementation of classic screening methods can be numerically challenging. In this paper, we propose a distributed screening framework for the big data setup. In the spirit of "divide-and-conquer", the proposed framework expresses a correlation measure as a function of several component parameters, each of which can be distributively estimated using a natural U-statistic from data segments. With the component estimates aggregated, we obtain a final correlation estimate that can be readily used for screening features. This framework enables distributed storage and parallel computing and thus is computationally attractive. Due to the unbiased distributive estimation of the component parameters, the final aggregated estimate achieves a high…
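The componentwise idea described in the abstract can be sketched for one concrete case. The example below is a minimal illustration, not the authors' implementation: it assumes the correlation measure is Pearson correlation, that its components are the five moments E[X], E[Y], E[XY], E[X^2], E[Y^2], and that each component is estimated on a segment by its sample mean (a degree-1 U-statistic). All function names are hypothetical, and the paper's own choice of correlation measures and components may differ.

```python
import math


def segment_components(xs, ys):
    """Estimate the five assumed components on one data segment.

    Each estimate is a plain sample mean, i.e. a degree-1 U-statistic,
    so it is unbiased for the corresponding moment.
    """
    n = len(xs)
    return (
        sum(xs) / n,                              # E[X]
        sum(ys) / n,                              # E[Y]
        sum(x * y for x, y in zip(xs, ys)) / n,   # E[XY]
        sum(x * x for x in xs) / n,               # E[X^2]
        sum(y * y for y in ys) / n,               # E[Y^2]
    )


def aggregate(component_list):
    """Average the component estimates across equal-sized segments."""
    m = len(component_list)
    return tuple(sum(c[k] for c in component_list) / m for k in range(5))


def correlation_from_components(theta):
    """Plug the aggregated components into the Pearson correlation formula."""
    ex, ey, exy, exx, eyy = theta
    cov = exy - ex * ey
    return cov / math.sqrt((exx - ex ** 2) * (eyy - ey ** 2))


def distributed_correlation(xs, ys, n_segments):
    """Split the sample into equal segments, estimate components on each
    segment, aggregate them, and only then form the correlation.
    Observations left over after equal splitting are dropped in this sketch.
    """
    size = len(xs) // n_segments
    comps = [
        segment_components(xs[i * size:(i + 1) * size],
                           ys[i * size:(i + 1) * size])
        for i in range(n_segments)
    ]
    return correlation_from_components(aggregate(comps))


def screen_features(columns, y, n_segments, d):
    """Return the indices of the d features with the largest absolute
    aggregated correlation with y, strongest first."""
    scores = [abs(distributed_correlation(col, y, n_segments)) for col in columns]
    ranked = sorted(range(len(columns)), key=lambda j: scores[j], reverse=True)
    return ranked[:d]
```

In this sketch the per-segment work in `segment_components` is independent across segments, so it could run on separate machines, with only five numbers per feature sent back for aggregation. Because the nonlinear correlation formula is applied once to the aggregated components, rather than averaged over per-segment correlations, the result with equal-sized segments matches the correlation computed on the pooled sample.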
Taxonomy
Topics: Face and Expression Recognition · Machine Learning and Data Classification · Statistical Methods and Inference
