Distributed Feature Screening via Componentwise Debiasing

Xingxiang Li; Runze Li; Zhiming Xia; Chen Xu

arXiv:1903.03810 · stat.ME · March 12, 2019 · J. Mach. Learn. Res. · 27 cites

TL;DR

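This paper proposes a distributed feature screening framework for data in which both the sample size $N$ and the number of features $p$ are large. A correlation measure is written as a function of component parameters, each estimated without bias by a U-statistic on separate data segments, and the aggregated estimate is then used to screen features.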

Contribution

It proposes a distributed screening framework that estimates the component parameters of a correlation measure by U-statistics on data segments and aggregates them into a single correlation estimate, which supports distributed storage and parallel computing.

Findings

01. Achieves screening accuracy comparable to centralized methods.

02. Enables scalable parallel computation for large datasets.

03. Demonstrates effectiveness through extensive numerical experiments.

Abstract

Feature screening is a powerful tool in the analysis of high dimensional data. When the sample size $N$ and the number of features $p$ are both large, the implementation of classic screening methods can be numerically challenging. In this paper, we propose a distributed screening framework for big data setup. In the spirit of "divide-and-conquer", the proposed framework expresses a correlation measure as a function of several component parameters, each of which can be distributively estimated using a natural U-statistic from data segments. With the component estimates aggregated, we obtain a final correlation estimate that can be readily used for screening features. This framework enables distributed storage and parallel computing and thus is computationally attractive. Due to the unbiased distributive estimation of the component parameters, the final aggregated estimate achieves a high…
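
The divide-and-conquer idea described in the abstract can be illustrated with a minimal sketch. It assumes Pearson correlation as the correlation measure, whose component parameters are the first and second moments of each feature and the response; each of these is estimated on a segment by a sample mean, which is a degree-one U-statistic. The function names (`segment_components`, `aggregate`, `correlation_from_components`, `distributed_screen`) and the simulated data are hypothetical and are not taken from the paper, which may use other correlation measures and aggregation details.

```python
import numpy as np

def segment_components(X, y):
    """Estimate the moment components on one data segment.

    Each entry is a sample mean, so it is an unbiased estimate of the
    corresponding population moment (assumption: Pearson correlation)."""
    return {
        "n": X.shape[0],
        "x": X.mean(axis=0),                  # E[X_j]
        "xx": (X ** 2).mean(axis=0),          # E[X_j^2]
        "xy": (X * y[:, None]).mean(axis=0),  # E[X_j Y]
        "y": y.mean(),                        # E[Y]
        "yy": (y ** 2).mean(),                # E[Y^2]
    }

def aggregate(components):
    """Combine segment-level component estimates, weighting by segment size."""
    sizes = np.array([c["n"] for c in components], dtype=float)
    weights = sizes / sizes.sum()
    keys = ("x", "xx", "xy", "y", "yy")
    return {k: sum(w * c[k] for w, c in zip(weights, components)) for k in keys}

def correlation_from_components(agg):
    """Plug the aggregated components into the correlation formula."""
    cov = agg["xy"] - agg["x"] * agg["y"]
    var_x = agg["xx"] - agg["x"] ** 2
    var_y = agg["yy"] - agg["y"] ** 2
    return cov / np.sqrt(var_x * var_y)

def distributed_screen(segments, d):
    """Return the indices of the d features with the largest |correlation|."""
    # In a real deployment each segment would be processed on its own worker.
    comps = [segment_components(X_k, y_k) for X_k, y_k in segments]
    corr = correlation_from_components(aggregate(comps))
    return np.argsort(-np.abs(corr))[:d], corr

# Simulated example: only features 0 and 3 influence the response.
rng = np.random.default_rng(0)
N, p = 2000, 50
X = rng.standard_normal((N, p))
y = 2.0 * X[:, 0] - 1.5 * X[:, 3] + rng.standard_normal(N)

# Split the rows into 4 segments, as if stored on 4 machines.
segments = list(zip(np.array_split(X, 4), np.array_split(y, 4)))
top, corr = distributed_screen(segments, d=2)
print(sorted(top.tolist()))
```

The design choice in this sketch is to aggregate the components first and apply the nonlinear correlation formula only once, rather than averaging per-segment correlation estimates. Under these assumptions the aggregated Pearson estimate coincides with the one computed on the full data set, while each worker only needs to communicate a handful of summary statistics per feature.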

Taxonomy

Topics: Face and Expression Recognition · Machine Learning and Data Classification · Statistical Methods and Inference