A sub-sampling algorithm preventing outliers
L. Deldossi, E. Pesce, C. Tommasi

TL;DR
This paper introduces a subsampling algorithm that avoids selecting outliers and high-leverage points, making linear model estimates from large datasets more robust. Subsample selection is guided by the D- and I-optimality criteria.
Contribution
It proposes two exchange procedures for subsampling. The unsupervised version selects a nearly D-optimal subset while excluding high-leverage points; the supervised version additionally excludes outliers in the responses.
Findings
The algorithms reduce the influence of outliers and high-leverage points on the estimated linear model.
The selected subsamples remain nearly D-optimal, which supports accurate estimation of linear models on large datasets.
Both procedures are generalized to I-optimality, which targets prediction accuracy rather than parameter estimation.
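To make the I-optimality finding concrete, here is a minimal sketch of the quantity an I-optimal subsample tries to keep small: the average prediction variance over a set of prediction points, up to the error variance. The function name `avg_prediction_variance` and the arrays `X_sub` and `X_pred` are illustrative assumptions, not names from the paper.

```python
import numpy as np

def avg_prediction_variance(X_sub, X_pred):
    """Mean of x_j^T (X_sub^T X_sub)^{-1} x_j over the rows x_j of X_pred.

    This is the I-criterion up to the factor sigma^2 (smaller is better).
    X_sub: (n, p) model matrix of the subsample (assumed full column rank).
    X_pred: (m, p) model matrix of the points where predictions are wanted.
    """
    M = X_sub.T @ X_sub                  # information matrix of the subsample
    Z = np.linalg.solve(M, X_pred.T)     # columns are M^{-1} x_j, shape (p, m)
    per_point = np.sum(X_pred.T * Z, axis=0)  # x_j^T M^{-1} x_j for each j
    return float(np.mean(per_point))

# Tiny example: with an identity information matrix the variance at x_j
# is just the squared norm of x_j.
X_sub = np.eye(2)
X_pred = np.array([[1.0, 0.0], [0.0, 2.0]])
print(avg_prediction_variance(X_sub, X_pred))  # (1 + 4) / 2 = 2.5
```

Comparing this value across candidate subsamples gives a rough sense of which one is better suited to prediction at the chosen points.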
Abstract
Nowadays, in many different fields, massive data are available and for several reasons, it might be convenient to analyze just a subset of the data. The application of the D-optimality criterion can be helpful to optimally select a subsample of observations. However, it is well known that D-optimal support points lie on the boundary of the design space and if they go hand in hand with extreme response values, they can have a severe influence on the estimated linear model (leverage points with high influence). To overcome this problem, firstly, we propose an unsupervised exchange procedure that enables us to select a nearly D-optimal subset of observations without high leverage values. Then, we provide a supervised version of this exchange procedure, where besides high leverage points also the outliers in the responses (that are not associated to high leverage points) are avoided. This…
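The unsupervised idea described in the abstract can be illustrated with a simplified sketch. This is not the authors' procedure: it screens out high-leverage rows with an assumed rule-of-thumb cutoff (`lev_factor * p / N`, default 2p/N) and then runs a basic one-swap-at-a-time exchange that only accepts swaps increasing log det(XᵀX). All function and parameter names are hypothetical.

```python
import numpy as np

def leverage_scores(X):
    """Diagonal of the hat matrix: h_ii = x_i^T (X^T X)^{-1} x_i."""
    M_inv = np.linalg.inv(X.T @ X)
    return np.einsum("ij,jk,ik->i", X, M_inv, X)

def log_det_info(X):
    """log det(X^T X), the D-criterion on the log scale."""
    sign, logdet = np.linalg.slogdet(X.T @ X)
    return logdet if sign > 0 else -np.inf

def d_opt_subsample(X, n, lev_factor=2.0, max_iter=200, seed=0):
    """Greedy exchange for a nearly D-optimal subsample of size n.

    Rows whose full-data leverage exceeds lev_factor * p / N are never
    selected (the cutoff is an assumption made for this sketch).
    """
    rng = np.random.default_rng(seed)
    N, p = X.shape
    h = leverage_scores(X)
    allowed = np.flatnonzero(h <= lev_factor * p / N)
    if allowed.size < n:
        raise ValueError("not enough low-leverage rows for the requested size")
    sample = rng.choice(allowed, size=n, replace=False)
    for _ in range(max_iter):
        Xs = X[sample]
        M_inv = np.linalg.inv(Xs.T @ Xs)
        # Variance function d(x) = x^T M^{-1} x under the current subsample.
        d_all = np.einsum("ij,jk,ik->i", X, M_inv, X)
        in_sample = np.zeros(N, dtype=bool)
        in_sample[sample] = True
        outside = allowed[~in_sample[allowed]]
        if outside.size == 0:
            break
        pos_out = int(np.argmin(d_all[sample]))          # least informative member
        cand_in = int(outside[np.argmax(d_all[outside])])  # most informative outsider
        trial = sample.copy()
        trial[pos_out] = cand_in
        if log_det_info(X[trial]) > log_det_info(Xs) + 1e-12:
            sample = trial   # accept only strict improvements
        else:
            break            # no improving swap of this type: stop
    return sample
```

A typical call would be `idx = d_opt_subsample(X, n=100)` followed by fitting the linear model on `X[idx]` and the matching responses. A supervised variant would also need a rule for screening response outliers, which this sketch does not attempt.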
Taxonomy
Topics: Optimal Experimental Design Methods · Probabilistic and Robust Engineering Design · Advanced Multi-Objective Optimization Algorithms
