Fast and Scalable Cellwise-Robust Ensembles for High-Dimensional Data

Anthony Christidis; Jeyshinee Pyneeandee; Gabriela Cohen-Freue

arXiv:2603.20940 · stat.ME · March 31, 2026

TL;DR

The paper introduces FSCRE (Fast and Scalable Cellwise-Robust Ensemble), an ensemble method for high-dimensional data with cellwise contamination that improves variable selection and prediction accuracy.

Contribution

The paper develops a multi-stage framework that combines robust data cleaning, cellwise-robust covariance estimation, and ensemble variable selection. It fills a gap left by existing ensemble methods, which were not designed to handle cellwise contamination.
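The three stages named above can be illustrated with a deliberately simple NumPy sketch. Everything in it is an assumption for illustration and is not FSCRE: stage 1 uses a univariate median/MAD filter that replaces flagged cells with the column median, stage 2 rescales the Pearson correlations of the cleaned matrix by MAD scales, and stage 3 lets a fixed number of models take turns greedily claiming predictors by a covariance-based partial score, so that no predictor is shared between models. All function names (`clean_cells`, `robust_cov`, `competitive_partition`, and so on) and tuning values (`cutoff=3.0`, `n_models`, `max_size`) are hypothetical.

```python
import numpy as np


def robust_center_scale(A):
    """Column-wise median and MAD (x1.4826, consistent at the normal)."""
    med = np.median(A, axis=0)
    mad = 1.4826 * np.median(np.abs(A - med), axis=0)
    mad[mad == 0] = 1.0  # guard against constant columns
    return med, mad


def clean_cells(A, cutoff=3.0):
    """Toy stage 1 (hypothetical): flag cells with |robust z| > cutoff
    and impute them with the column median."""
    med, mad = robust_center_scale(A)
    flagged = np.abs((A - med) / mad) > cutoff
    return np.where(flagged, med, A), flagged


def robust_cov(A_clean):
    """Toy stage 2 (hypothetical): Pearson correlations of the cleaned
    matrix rescaled by robust (MAD) column scales."""
    _, mad = robust_center_scale(A_clean)
    return np.corrcoef(A_clean, rowvar=False) * np.outer(mad, mad)


def partial_score(S, sel, j, y_idx):
    """|cov(x_j, y | x_sel)| / sd(x_j | x_sel), computed from S alone."""
    cov_jy, var_j = S[j, y_idx], S[j, j]
    if sel:
        b = S[np.ix_(sel, [j, y_idx])]  # columns: Sigma_{sel,j}, Sigma_{sel,y}
        sol = np.linalg.solve(S[np.ix_(sel, sel)], b)
        cov_jy -= b[:, 0] @ sol[:, 1]
        var_j -= b[:, 0] @ sol[:, 0]
    return abs(cov_jy) / np.sqrt(max(var_j, 1e-12))


def competitive_partition(S, n_models=2, max_size=2):
    """Toy stage 3 (hypothetical): models take turns claiming their best
    remaining predictor, so the selected sets are disjoint."""
    y_idx = S.shape[0] - 1  # response is the last column of S
    available = set(range(y_idx))
    models = [[] for _ in range(n_models)]
    for _ in range(max_size):
        for sel in models:
            if not available:
                break
            best = max(sorted(available),
                       key=lambda j: partial_score(S, sel, j, y_idx))
            sel.append(best)
            available.remove(best)
    return models


def ensemble_coefficients(A_clean, S, models):
    """Average the covariance-based least-squares fits of the models."""
    p = S.shape[0] - 1
    coefs = np.zeros(p)
    for sel in models:
        coefs[sel] += np.linalg.solve(S[np.ix_(sel, sel)], S[sel, p]) / len(models)
    med, _ = robust_center_scale(A_clean)
    intercept = med[p] - med[:p] @ coefs
    return intercept, coefs


# Simulated example: 3 active predictors, ~5% of predictor cells corrupted.
rng = np.random.default_rng(0)
n, p = 200, 10
X = rng.normal(size=(n, p))
y = 2 * X[:, 0] - 1.5 * X[:, 1] + X[:, 2] + 0.5 * rng.normal(size=n)
X[rng.random((n, p)) < 0.05] = 10.0  # cellwise contamination of observed X

A_clean, flagged = clean_cells(np.column_stack([X, y]))
S = robust_cov(A_clean)
models = competitive_partition(S, n_models=2, max_size=2)
intercept, coefs = ensemble_coefficients(A_clean, S, models)
print(models, np.round(coefs, 2))
```

Because each predictor belongs to exactly one model, averaging the disjoint fits divides every coefficient by `n_models`; the sketch keeps that simple rule only to show how predictions from competing models can be pooled, not as a recommended aggregation.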

Findings

1. FSCRE outperforms existing methods in variable selection accuracy.
2. FSCRE achieves superior predictive performance on contaminated high-dimensional data.
3. The theoretical guarantees include invariance properties and local selection stability.

Abstract

The analysis of high-dimensional data, common in fields such as genomics, is complicated by the presence of cellwise contamination, where individual cells rather than entire rows are corrupted. This contamination poses a significant challenge to standard variable selection techniques. While recent ensemble methods have introduced deterministic frameworks that partition the predictor space to manage high collinearity, these architectures were not designed to handle cellwise contamination, leaving a critical methodological gap. To bridge this gap, we propose the Fast and Scalable Cellwise-Robust Ensemble (FSCRE) algorithm, a multi-stage framework integrating three key statistical stages. First, the algorithm establishes a robust foundation by deriving a cleaned data matrix and a reliable, cellwise-robust covariance structure. Variable selection then proceeds via a competitive ensemble: a…
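Why cellwise contamination is harder than rowwise contamination in high dimensions can be seen with a standard back-of-the-envelope calculation (an illustration, not a computation from the paper). Assume, for illustration only, that each cell is corrupted independently with probability ε. A row is then fully clean only if all p of its cells are, which happens with probability (1 − ε)^p, so methods that downweight or discard whole rows can lose most of the sample even when ε is small. The helper name `clean_row_fraction` is hypothetical.

```python
def clean_row_fraction(eps: float, p: int) -> float:
    """Expected fraction of rows with no contaminated cell when each of the
    p cells in a row is contaminated independently with probability eps."""
    return (1.0 - eps) ** p


for p in (10, 100, 1000):
    print(f"p={p:5d}: {clean_row_fraction(0.01, p):.3f} of rows fully clean")
```

With ε = 1%, roughly 90% of rows are fully clean at p = 10, about 37% at p = 100, and essentially none at p = 1000.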
