Robust selection of predictors and conditional outlier detection in a perturbed large-dimensional regression context
Matteo Farnè, Angelos Vouldis

TL;DR
This paper introduces ROBOUT, a fast method for detecting outliers in a response variable, conditional on a set of predictors, in large-dimensional regression datasets. It remains effective when the data contain element-wise sparse variables and multivariate outliers among the predictors.
Contribution
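The paper proposes ROBOUT, a robust conditional outlier detection method that combines three stages: robust selection of the relevant predictors (Huber or quantile loss), robust regression on the selected predictors (LTS, GS or MM), and a criterion to identify outliers. In perturbed large-dimensional datasets it outperforms existing integrated methods such as SPARSE-LTS and RLARS.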
Findings
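ROBOUT identifies observations with outlying conditional variance even when the dataset contains element-wise sparse variables and the predictors contain multivariate outliers.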
It outperforms existing methods like SPARSE-LTS and RLARS in simulations.
Application to banking data demonstrates practical utility.
Abstract
This paper presents a fast methodology, called ROBOUT, to identify outliers in a response variable conditional on a set of linearly related predictors, retrieved from a large granular dataset. ROBOUT is shown to be effective and particularly versatile compared to existing methods in the presence of a number of data idiosyncratic features. ROBOUT is able to identify observations with outlying conditional variance when the dataset contains element-wise sparse variables, and the set of predictors contains multivariate outliers. Existing integrated methodologies like SPARSE-LTS and RLARS are systematically sub-optimal under those conditions. ROBOUT entails a robust selection stage of the statistically relevant predictors (by using a Huber or a quantile loss), the estimation of a robust regression model based on the selected predictors (by LTS, GS or MM), and a criterion to identify…
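The three stages named in the abstract can be sketched roughly as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' ROBOUT implementation. The selection stage is assumed here to be an L1-penalised Huber regression solved by proximal gradient; the regression stage uses LTS with random starts and concentration steps; and, because the identification criterion is cut off in the abstract above, the flagging rule (MAD-standardised residual above 2.5) is only a placeholder. All function names, the values of `lam`, `delta` and `cutoff`, and the simulated data are hypothetical.

```python
import numpy as np

def soft_threshold(v, t):
    # Proximal operator of the L1 penalty.
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def huber_lasso(X, y, lam, delta=1.345, n_iter=500):
    # Stage 1 (assumed form): minimise mean Huber loss + lam * ||beta||_1
    # by proximal gradient; the intercept b0 is not penalised.
    n, p = X.shape
    beta, b0 = np.zeros(p), np.median(y)
    # Step size 1/L, with L bounded by (||X||_2^2 + n) / n.
    step = n / (np.linalg.norm(X, 2) ** 2 + n)
    for _ in range(n_iter):
        # Huber score: residuals clipped to [-delta, delta].
        psi = np.clip(y - b0 - X @ beta, -delta, delta)
        b0 += step * psi.mean()
        beta = soft_threshold(beta + step * (X.T @ psi) / n, step * lam)
    return b0, beta

def lts_fit(X, y, h, n_starts=50, n_csteps=20, seed=0):
    # Stage 2: least trimmed squares via random starts and concentration steps.
    rng = np.random.default_rng(seed)
    n, p = X.shape
    Z = np.column_stack([np.ones(n), X])
    best_obj, best_coef = np.inf, None
    for _ in range(n_starts):
        idx = rng.choice(n, size=p + 1, replace=False)
        for _ in range(n_csteps):
            coef = np.linalg.lstsq(Z[idx], y[idx], rcond=None)[0]
            # Keep the h observations with the smallest squared residuals.
            idx = np.argsort((y - Z @ coef) ** 2)[:h]
        coef = np.linalg.lstsq(Z[idx], y[idx], rcond=None)[0]
        obj = np.sort((y - Z @ coef) ** 2)[:h].sum()
        if obj < best_obj:
            best_obj, best_coef = obj, coef
    return best_coef  # [intercept, slopes...]

def flag_outliers(resid, cutoff=2.5):
    # Stage 3 (placeholder rule): MAD-standardised residuals above a cutoff.
    med = np.median(resid)
    scale = 1.4826 * np.median(np.abs(resid - med))
    return np.abs(resid - med) / scale > cutoff

# Hypothetical simulated data: 3 relevant predictors out of 50,
# with the first 10 responses shifted to act as conditional outliers.
rng = np.random.default_rng(0)
n, p = 200, 50
X = rng.standard_normal((n, p))
y = 3 * X[:, 0] - 2 * X[:, 1] + 2 * X[:, 2] + 0.5 * rng.standard_normal(n)
y[:10] += 15.0

_, beta = huber_lasso(X, y, lam=0.3)
selected = np.flatnonzero(beta)
coef = lts_fit(X[:, selected], y, h=int(0.75 * n))
resid = y - coef[0] - X[:, selected] @ coef[1:]
flags = flag_outliers(resid)
print("selected predictors:", selected)
print("flagged observations:", np.flatnonzero(flags))
```

The sketch contaminates only the response; it does not reproduce the element-wise sparse variables or the multivariate predictor outliers that the abstract describes, so it illustrates the pipeline structure rather than the paper's simulation settings.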
Taxonomy
Topics: Advanced Statistical Methods and Models · Advanced Statistical Process Monitoring · Statistical Methods and Inference
