TL;DR
The paper introduces FSCRE (Fast and Scalable Cellwise-Robust Ensemble), an ensemble method for high-dimensional data with cellwise contamination that improves both variable selection and prediction accuracy.
Contribution
The paper develops a multi-stage framework that combines robust data cleaning, cellwise-robust covariance estimation, and ensemble variable selection, filling a gap left by existing ensemble methods, which were not designed to handle cellwise contamination.
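To make "cellwise contamination" and the cleaning stage concrete, here is a minimal sketch of the general idea. It is not the FSCRE procedure itself. It assumes a simple stand-in technique: flag individual cells by a columnwise robust z-score (median and MAD), replace flagged cells with the column median, and compute an ordinary covariance on the cleaned matrix. The function names `flag_cells` and `clean_cells` and the cutoff of 3.0 are hypothetical choices for illustration only.

```python
import numpy as np

def flag_cells(X, cutoff=3.0):
    """Flag individual cells whose robust z-score exceeds `cutoff`.

    Assumption: a columnwise median/MAD rule stands in for the paper's
    cellwise-robust detection step, which is not specified here.
    """
    med = np.median(X, axis=0)
    # 1.4826 rescales the MAD to be consistent with the standard
    # deviation under normality.
    mad = 1.4826 * np.median(np.abs(X - med), axis=0)
    mad = np.where(mad > 0, mad, 1.0)  # guard against constant columns
    return np.abs((X - med) / mad) > cutoff

def clean_cells(X, cutoff=3.0):
    """Replace flagged cells with the column median; keep all rows."""
    mask = flag_cells(X, cutoff)
    med = np.median(X, axis=0)
    X_clean = np.where(mask, med, X)  # med broadcasts across rows
    return X_clean, mask

# Simulated data: two corrupted cells in otherwise clean rows.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
X[3, 1] = 50.0
X[10, 4] = -40.0

X_clean, mask = clean_cells(X)
# Covariance computed on the cleaned matrix (illustrative only).
cov = np.cov(X_clean, rowvar=False)
```

The point of the sketch is that only the corrupted cells are altered. A rowwise approach would discard or downweight rows 3 and 10 entirely, which becomes wasteful in high dimensions where most rows may contain at least one bad cell.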
Findings
FSCRE outperforms existing methods in variable selection accuracy.
The method demonstrates superior predictive performance in contaminated high-dimensional data.
Theoretical guarantees include invariance properties and local selection stability.
Abstract
The analysis of high-dimensional data, common in fields such as genomics, is complicated by the presence of cellwise contamination, where individual cells rather than entire rows are corrupted. This contamination poses a significant challenge to standard variable selection techniques. While recent ensemble methods have introduced deterministic frameworks that partition the predictor space to manage high collinearity, these architectures were not designed to handle cellwise contamination, leaving a critical methodological gap. To bridge this gap, we propose the Fast and Scalable Cellwise-Robust Ensemble (FSCRE) algorithm, a multi-stage framework integrating three key statistical stages. First, the algorithm establishes a robust foundation by deriving a cleaned data matrix and a reliable, cellwise-robust covariance structure. Variable selection then proceeds via a competitive ensemble: a…
