Influence of single observations on the choice of the penalty parameter in ridge regression
Kristoffer H. Hellton, Camilla Lingjærde, Riccardo De Bin

TL;DR
This paper examines how individual data points influence the selection of the penalty parameter in ridge regression, introducing a visual tool to identify influential points, especially useful in high-dimensional settings.
Contribution
It introduces a novel visual exploratory method to assess the impact of single observations on penalty parameter choice in ridge regression, applicable to high-dimensional data.
Findings
Influential points can significantly alter the optimal penalty parameter.
The proposed tool effectively identifies influential observations in both simulated and real data.
The method is demonstrated on both low- and high-dimensional examples.
Abstract
Penalized regression methods such as ridge regression heavily rely on the choice of a tuning or penalty parameter, which is often computed via cross-validation. Discrepancies in the value of the penalty parameter may lead to substantial differences in regression coefficient estimates and predictions. In this paper, we investigate the effect of single observations on the optimal choice of the tuning parameter, showing how the presence of influential points can change it dramatically. We distinguish between points as "expanders" and "shrinkers", based on their effect on the model complexity. Our approach supplies a visual exploratory tool to identify influential points, naturally implementable for high-dimensional data where traditional approaches usually fail. Applications to simulated and real data examples, both low- and high-dimensional, are presented. The visual tool is…
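To make the idea in the abstract concrete, the following is a minimal sketch, not the authors' procedure. It measures how much the cross-validated ridge penalty moves when one observation is dropped. It assumes centered data with no intercept, a fixed grid of candidate penalties, and leave-one-out cross-validation computed through the hat-matrix identity. The function names (`loocv_error`, `best_lambda`, `lambda_shifts`), the simulated data, and the planted unusual point are all hypothetical.

```python
import numpy as np

def loocv_error(X, y, lam):
    """Leave-one-out CV error of ridge regression (no intercept).

    Uses the identity y_i - yhat_i^(-i) = (y_i - yhat_i) / (1 - h_ii),
    where h_ii is the i-th diagonal entry of the ridge hat matrix.
    """
    p = X.shape[1]
    H = X @ np.linalg.solve(X.T @ X + lam * np.eye(p), X.T)
    resid = (y - H @ y) / (1.0 - np.diag(H))
    return float(np.mean(resid ** 2))

def best_lambda(X, y, grid):
    """Penalty on the grid that minimizes the leave-one-out CV error."""
    errors = [loocv_error(X, y, lam) for lam in grid]
    return float(grid[int(np.argmin(errors))])

def lambda_shifts(X, y, grid):
    """Optimal penalty without observation i, minus the optimal penalty on all data."""
    full = best_lambda(X, y, grid)
    n = X.shape[0]
    shifts = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        shifts[i] = best_lambda(X[keep], y[keep], grid) - full
    return full, shifts

# Simulated data (assumption: illustrative only, not from the paper).
rng = np.random.default_rng(0)
n, p = 40, 5
X = rng.normal(size=(n, p))
beta = np.array([1.0, -0.5, 0.0, 0.0, 0.25])
y = X @ beta + rng.normal(size=n)
X[0] *= 4.0   # plant one high-leverage point ...
y[0] *= -1.0  # ... whose response has its sign flipped

grid = np.logspace(-3, 3, 61)  # strictly positive candidate penalties
full, shifts = lambda_shifts(X, y, grid)

# Observations ranked by how far their removal moves the selected penalty.
ranking = np.argsort(-np.abs(shifts))
print("penalty on all data:", full)
print("largest shifts at observations:", ranking[:5], shifts[ranking[:5]])
```

A positive shift means the selected penalty grows once the point is removed, so the point's presence pulls the fit toward less shrinkage. A negative shift means the opposite. How these two cases map onto the paper's "expanders" and "shrinkers" is not spelled out in this summary, so the sketch reports only the sign and size of the shift.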
Taxonomy
Topics: Statistical Methods and Inference · Data Analysis with R · Forest ecology and management
