Detection of Multiple Influential Observations on Model Selection
Dongliang Zhang, Masoud Asgharian, Martin A. Lindquist

TL;DR
This paper develops a new statistical framework for detecting influential outliers in high-dimensional models, including logistic regression, with applications to fMRI data, improving model robustness and reproducibility.
Contribution
It introduces a theoretically grounded approach for identifying influential observations affecting model selection in high-dimensional settings, extending existing diagnostics.
Findings
New asymptotic distribution derived for the diagnostic measure
Effective detection of influential outliers in linear and logistic models
Application to fMRI data reveals previously undetected influential observations
Abstract
Outlying observations are frequently encountered across a wide spectrum of scientific domains, posing notable challenges to the generalizability of statistical models and the reproducibility of downstream analysis. They are identified through influence diagnostics, which aim to capture observations that unduly bias model estimation. To date, methods for identifying observations that influence the selection of a stochastically chosen submodel have been underdeveloped, especially in the high-dimensional setting where the number of predictors exceeds the sample size. Recently, we proposed an improved diagnostic measure to handle this setting. However, its distributional properties and approximations have not yet been explored. To address this shortcoming, we revisit the notion of exchangeability to determine the exact asymptotic distribution of our assessment measure. This…
Taxonomy
Topics: Fault Detection and Control Systems
