Towards Practical Robustness Auditing for Linear Regression
Daniel Freund, Samuel B. Hopkins

TL;DR
This paper studies practical algorithms for finding small, influential subsets of a dataset whose removal reverses the sign of a linear regression coefficient. The methods improve on the state of the art but still face computational bottlenecks, especially in higher dimensions.
Contribution
The paper evaluates well-established algorithms for robustness auditing in linear regression and presents a spectral method inspired by robust statistics, highlighting current limitations and future directions.
Findings
Mixed integer optimization outperforms existing methods in low dimensions.
Spectral algorithms show promise for higher-dimensional problems.
Computational bottlenecks remain significant for dimensions three and above.
Abstract
We investigate practical algorithms to find or disprove the existence of small subsets of a dataset which, when removed, reverse the sign of a coefficient in an ordinary least squares regression involving that dataset. We empirically study the performance of well-established algorithmic techniques for this task -- mixed integer quadratically constrained optimization for general linear regression problems and exact greedy methods for special cases. We show that these methods largely outperform the state of the art and provide a useful robustness check for regression problems in a few dimensions. However, significant computational bottlenecks remain, especially for the important task of disproving the existence of such small sets of influential samples for regression problems of dimension 3 or greater. We make some headway on this challenge via a spectral algorithm using ideas drawn…
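To make the task in the abstract concrete, here is a minimal sketch for the simplest setting: one regressor and no intercept. Both the setting and the function name `min_removals_to_flip_sign` are assumptions chosen for illustration, not the authors' implementation. In this setting the OLS coefficient is sum(x_i * y_i) / sum(x_i^2), so its sign is the sign of the numerator, and removing the samples with the largest same-sign products x_i * y_i first yields a smallest sign-flipping subset.

```python
import numpy as np

def min_removals_to_flip_sign(x, y):
    """Smallest set of sample indices whose removal reverses the sign of the
    OLS slope in the no-intercept model y ~ beta * x (illustrative sketch).

    Returns a list of indices, or None if no subset strictly flips the sign.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    products = x * y                 # per-sample contribution to the numerator
    total = products.sum()
    if total == 0:
        raise ValueError("coefficient is zero, so its sign is undefined")

    # Orient contributions so that the current numerator is positive.
    sign = np.sign(total)
    contrib = sign * products
    running = sign * total

    removed = []
    for i in np.argsort(-contrib):   # largest contribution first
        if contrib[i] <= 0:
            break                    # removing this sample cannot help
        running -= contrib[i]
        removed.append(int(i))
        if running < 0:              # numerator, hence the slope, has flipped
            return removed
    return None
```

For example, with x = [1, 1, 1, 1] and y = [5, 1, -1, -2] the function returns [0]: dropping the first sample changes the slope from 3/4 to -2/3. When every product x_i * y_i has the same sign, no subset can flip the coefficient and the function returns None, which serves as a certificate of robustness in this special case. In higher dimensions no such simple ordering is available, which is where the optimization-based and spectral methods described in the abstract come in.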
Taxonomy
Topics: Sparse and Compressive Sensing Techniques · Statistical Methods and Inference · Advanced Statistical Methods and Models
Methods: Linear Regression
