On deletion diagnostic statistic in regression
Myung Geun Kim

TL;DR
This paper investigates the properties of deletion diagnostics in regression, revealing limitations of Cook's distance, and proposes a new scalar diagnostic measure based on distributional properties of the change in the least squares estimator.
Contribution
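It derives a normalization of the change in the LSE based on the Moore-Penrose inverse of its covariance matrix, explains why the scaling matrix that defines Cook's distance is an inappropriate choice, and introduces a new scalar diagnostic measure for influence in regression.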
Findings
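The change in the LSE due to a case deletion, normalized by the Moore-Penrose inverse of its covariance matrix, equals the square of the internally studentized residual.
The numerator of Cook's distance does not in general follow a chi-squared distribution, except in a single case.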
A new influence diagnostic measure is proposed based on distributional properties.
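The first finding can be checked with standard least squares identities. The sketch below assumes the usual linear model $y = X\beta + \varepsilon$ with $\operatorname{Var}(\varepsilon) = \sigma^2 I$ and a full-column-rank $n \times p$ design matrix $X$; the symbols $x_i$ (the $i$-th row of $X$ as a column vector), $e_i$ (residual), $h_{ii}$ (leverage), $v_i$, $s^2$, $r_i$ and $D_i$ are notation chosen here for illustration and are not taken from the paper.

```latex
\begin{align*}
\hat\beta - \hat\beta_{(i)}
  &= \frac{e_i}{1 - h_{ii}}\, v_i,
  \qquad v_i = (X^\top X)^{-1} x_i, \\
\operatorname{Cov}\bigl(\hat\beta - \hat\beta_{(i)}\bigr)
  &= \frac{\sigma^2}{1 - h_{ii}}\, v_i v_i^\top
  \qquad \text{(rank one, since } \operatorname{Var}(e_i) = \sigma^2 (1 - h_{ii})\text{)}, \\
\bigl(c\, v_i v_i^\top\bigr)^{+}
  &= \frac{v_i v_i^\top}{c\, \lVert v_i \rVert^4}
  \qquad \text{for any } c > 0, \\
\bigl(\hat\beta - \hat\beta_{(i)}\bigr)^\top
\operatorname{Cov}\bigl(\hat\beta - \hat\beta_{(i)}\bigr)^{+}
\bigl(\hat\beta - \hat\beta_{(i)}\bigr)
  &= \frac{e_i^2}{\sigma^2 (1 - h_{ii})}.
\end{align*}
```

Replacing $\sigma^2$ with the usual estimate $s^2 = \sum_j e_j^2 / (n - p)$ gives $r_i^2 = e_i^2 / \{ s^2 (1 - h_{ii}) \}$, the squared internally studentized residual. Cook's distance instead scales the same change by $X^\top X$, which yields the familiar form $D_i = r_i^2\, h_{ii} / \{ p\, (1 - h_{ii}) \}$.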
Abstract
The change in the least squares estimator (LSE) of a vector of regression coefficients due to a case deletion is often used for investigating the influence of an observation on the LSE. A normalization of the change in the LSE using the Moore-Penrose inverse of the covariance matrix of the change in the LSE is derived. This normalization turns out to be the square of the internally studentized residual. It is shown that the numerator term of Cook's distance does not in general have a chi-squared distribution except for a single case. A detailed explanation is given of why the scaling matrix that defines Cook's distance is an inappropriate choice. By reflecting a distributional property of the change in the LSE due to a case deletion, a new scalar diagnostic measure is suggested. Three numerical examples are given for illustration.
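The normalization described in the abstract can be checked numerically. The following is a minimal sketch, not the paper's code: it uses simulated data, and the variable names and the `deletion_stats` helper are hypothetical. It refits the model with each case deleted, normalizes the change in the LSE with the Moore-Penrose inverse of its estimated (rank-one) covariance matrix, and compares the result with the squared internally studentized residual. Cook's distance is computed alongside for contrast. The paper's new diagnostic measure is not reproduced here, because the abstract does not state its formula.

```python
import numpy as np

# Simulated data (an assumption for illustration; not from the paper).
rng = np.random.default_rng(0)
n, p = 30, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=n)

# Full-data least squares fit.
XtX_inv = np.linalg.inv(X.T @ X)
beta = XtX_inv @ X.T @ y
e = y - X @ beta                                # residuals
h = np.einsum("ij,jk,ik->i", X, XtX_inv, X)     # leverages h_ii
s2 = e @ e / (n - p)                            # estimate of sigma^2
r2 = e**2 / (s2 * (1 - h))                      # squared internally studentized residuals


def deletion_stats(i):
    """Return (Moore-Penrose normalized change, Cook's distance) for case i."""
    keep = np.arange(n) != i
    beta_i = np.linalg.lstsq(X[keep], y[keep], rcond=None)[0]
    d = beta - beta_i                           # change in the LSE
    v = XtX_inv @ X[i]
    # Estimated covariance of the change: rank one, hence the pseudoinverse.
    cov_hat = s2 / (1 - h[i]) * np.outer(v, v)
    # Explicit cutoff so that rounding noise is not inverted.
    normalized = d @ np.linalg.pinv(cov_hat, rcond=1e-10) @ d
    cooks = d @ (X.T @ X) @ d / (p * s2)
    return normalized, cooks


stats = np.array([deletion_stats(i) for i in range(n)])
normalized, cooks = stats[:, 0], stats[:, 1]

print(np.allclose(normalized, r2))                      # normalization vs. r_i^2
print(np.allclose(cooks, r2 * h / (p * (1 - h))))       # closed form of Cook's distance
```

The pseudoinverse is needed because the covariance matrix of the change has rank one and so has no ordinary inverse when there is more than one coefficient. Cook's distance uses the full-rank matrix formed from the design instead, which is the scaling choice the paper questions.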
Taxonomy
Topics: Scientific Research Methodologies and Applications · Grey System Theory Applications
