Cross-Validated Loss-Based Covariance Matrix Estimator Selection in High Dimensions

Philippe Boileau; Nima S. Hejazi; Mark J. van der Laan; Sandrine Dudoit

arXiv:2102.09715·stat.ME·November 12, 2024·J. Comput. Graph. Stat.

TL;DR

This paper introduces a cross-validated loss-based method for selecting among covariance matrix estimators in high-dimensional settings, backed by finite-sample risk bounds, asymptotic optimality results, and an open-source R package.

Contribution

It develops a general framework for estimator selection using cross-validation, with finite-sample risk bounds and asymptotic optimality results.

Findings

01 The proposed method outperforms traditional estimators in simulations.

02 Application to transcriptome data demonstrates practical effectiveness.

03 The authors provide an open-source R package implementing the method.

Abstract

The covariance matrix plays a fundamental role in many modern exploratory and inferential statistical procedures, including dimensionality reduction, hypothesis testing, and regression. In low-dimensional regimes, where the number of observations far exceeds the number of variables, the optimality of the sample covariance matrix as an estimator of this parameter is well-established. High-dimensional regimes do not admit such a convenience, however. As such, a variety of estimators have been derived to overcome the shortcomings of the sample covariance matrix in these settings. Yet, the question of selecting an optimal estimator from among the plethora available remains largely unaddressed. Using the framework of cross-validated loss-based estimation, we develop the theoretical underpinnings of just such an estimator selection procedure. In particular, we propose a general class of loss…
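The selection procedure described in the abstract can be illustrated with a minimal Python sketch. Several parts are assumptions made for illustration only: the abstract is truncated before it defines the loss class, so the sketch uses an observation-level Frobenius loss, the mean of ||x xᵀ − ψ||²_F over validation rows; the three candidate estimators and their tuning values are arbitrary; and every function name is hypothetical. This is not the API of the authors' R package.

```python
import numpy as np


def sample_cov(X):
    """Sample covariance matrix (rows are observations)."""
    return np.cov(X, rowvar=False)


def linear_shrinkage(X, alpha=0.5):
    """Shrink the sample covariance toward a scaled identity (alpha is illustrative)."""
    S = np.cov(X, rowvar=False)
    p = S.shape[0]
    return (1 - alpha) * S + alpha * (np.trace(S) / p) * np.eye(p)


def hard_threshold(X, t=0.2):
    """Zero out off-diagonal entries smaller than t in absolute value."""
    S = np.cov(X, rowvar=False)
    T = np.where(np.abs(S) >= t, S, 0.0)
    np.fill_diagonal(T, np.diag(S))
    return T


def frobenius_loss(X_valid, psi):
    """Mean over validation rows x of ||x x^T - psi||_F^2 (assumed loss).

    Uses ||x x^T - psi||_F^2 = (x^T x)^2 - 2 x^T psi x + ||psi||_F^2,
    which holds for symmetric psi.
    """
    sq_norms = np.sum(X_valid ** 2, axis=1)
    quad = np.einsum("ij,jk,ik->i", X_valid, psi, X_valid)
    return float(np.mean(sq_norms ** 2 - 2 * quad) + np.sum(psi ** 2))


def cv_select(X, candidates, n_folds=5, seed=0):
    """Return the candidate with the smallest V-fold cross-validated risk."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    folds = np.array_split(rng.permutation(n), n_folds)
    risks = {name: 0.0 for name in candidates}
    for valid_idx in folds:
        train_mask = np.ones(n, dtype=bool)
        train_mask[valid_idx] = False
        X_train = X[train_mask]
        # Center validation rows with the training mean to avoid leakage.
        X_valid = X[valid_idx] - X_train.mean(axis=0)
        for name, estimator in candidates.items():
            risks[name] += frobenius_loss(X_valid, estimator(X_train)) / n_folds
    best = min(risks, key=risks.get)
    return best, risks
```

A call might look like `cv_select(X, {"sample": sample_cov, "shrink": linear_shrinkage, "threshold": hard_threshold})`, which returns the name of the selected candidate together with each candidate's estimated risk. Which candidate wins depends on the data; the sketch makes no claim about the paper's results.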

Taxonomy

Topics: Single-cell and spatial transcriptomics · Gene expression and cancer classification · Statistical Methods and Inference