Sparse least trimmed squares regression for analyzing high-dimensional large data sets
Andreas Alfons, Christophe Croux, Sarah Gelper

TL;DR
This paper introduces a robust and sparse regression method that combines least trimmed squares with an L1 penalty. It is effective for high-dimensional data with outliers, as demonstrated on cancer gene expression data.
Contribution
It develops a novel sparse LTS estimator with a proven breakdown point and a fast algorithm, enhancing robustness and sparsity in high-dimensional regression.
Findings
Sparse LTS predicts more accurately than its competitors when the data contain outliers, in particular leverage points.
The method effectively handles leverage points in high-dimensional data.
Application to cancer data demonstrates practical utility.
Abstract
Sparse model estimation is a topic of high importance in modern data analysis due to the increasing availability of data sets with a large number of variables. Another common problem in applied statistics is the presence of outliers in the data. This paper combines robust regression and sparse model estimation. A robust and sparse estimator is introduced by adding an L1 penalty on the coefficient estimates to the well-known least trimmed squares (LTS) estimator. The breakdown point of this sparse LTS estimator is derived, and a fast algorithm for its computation is proposed. In addition, the sparse LTS is applied to protein and gene expression data of the NCI-60 cancer cell panel. Both a simulation study and the real data application show that the sparse LTS has better prediction performance than its competitors in the presence of leverage points.
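To make the idea in the abstract concrete, here is a minimal sketch of a sparse LTS style objective: the sum of the h smallest squared residuals plus an L1 penalty on the coefficients. The function name `sparse_lts_objective`, the parameter names `h` and `lam`, the scaling of the penalty by `h`, and the toy data are assumptions made for illustration; the abstract does not spell out the exact formula, and this is not the authors' fast algorithm.

```python
def sparse_lts_objective(X, y, beta, h, lam):
    """Trimmed sum of squared residuals plus an L1 penalty (illustrative sketch)."""
    # Squared residual of every observation under the coefficients beta.
    sq_res = [
        (yi - sum(xij * bj for xij, bj in zip(xi, beta))) ** 2
        for xi, yi in zip(X, y)
    ]
    # Trimming: keep only the h smallest squared residuals.
    trimmed = sum(sorted(sq_res)[:h])
    # L1 penalty on the coefficients; scaling by h is an assumed convention.
    penalty = h * lam * sum(abs(bj) for bj in beta)
    return trimmed + penalty


# Hypothetical toy data: the last observation is a gross outlier.
X = [[1.0], [2.0], [3.0], [4.0]]
y = [1.0, 2.0, 3.0, 100.0]
beta = [1.0]

clean = sparse_lts_objective(X, y, beta, h=3, lam=0.5)  # outlier trimmed away
full = sparse_lts_objective(X, y, beta, h=4, lam=0.5)   # no trimming
```

With `h=3` the outlying observation is excluded from the fit criterion, so the objective is driven only by the penalty term. With `h=4` (no trimming) the single outlier dominates the value, which is the sensitivity that trimming is meant to remove.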
Taxonomy
Topics: Sparse and Compressive Sensing Techniques · Advanced Statistical Methods and Models · Statistical Methods and Inference
