Comparing Classes of Estimators: When does Gradient Descent Beat Ridge Regression in Linear Models?
Dominic Richards, Edgar Dobriban, Patrick Rebeschini

TL;DR
This paper compares the performance of gradient descent and ridge regression in linear models, revealing how their relative effectiveness depends on data covariance eigenvalue decay and tuning parameter choices.
Contribution
It provides a detailed analysis of when gradient descent outperforms ridge regression based on eigenvalue decay rates and computes the optimal estimator class for orthogonal designs.
Findings
Constant-stepsize gradient descent outperforms ridge regression when the eigenvalues of the empirical data covariance matrix decay slowly (a power law with exponent less than one).
Ridge regression outperforms gradient descent when those eigenvalues decay quickly.
For orthogonal designs, the optimal class of estimators is equivalent to gradient descent with a decaying learning rate.
Abstract
Methods for learning from data depend on various types of tuning parameters, such as penalization strength or step size. Since performance can depend strongly on these parameters, it is important to compare classes of estimators (by considering prescribed finite sets of tuning parameters), not just particularly tuned methods. In this work, we investigate classes of methods via the relative performance of the best method in the class. We consider the central problem of linear regression (with a random isotropic ground truth) and investigate the estimation performance of two fundamental methods, gradient descent and ridge regression. We unveil the following phenomena. (1) For general designs, constant stepsize gradient descent outperforms ridge regression when the eigenvalues of the empirical data covariance matrix decay slowly, as a power law with exponent less than unity. If instead the…
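To make the comparison of estimator classes concrete, here is a minimal simulation sketch. It is not the authors' code. It assumes a Gaussian design whose population covariance has power-law eigenvalues i^(-decay), an isotropic ground truth drawn as N(0, I/d), and arbitrary grids of penalties and iteration counts. The function names (make_data, ridge_path, gd_path, best_error) and all parameter values are hypothetical and chosen only for illustration. Each class is scored by the estimation error of its best member.

```python
import numpy as np


def make_data(n, d, decay, noise_std, rng):
    """Gaussian design with power-law covariance eigenvalues (an assumption)."""
    eigs = np.arange(1, d + 1, dtype=float) ** (-decay)
    X = rng.standard_normal((n, d)) * np.sqrt(eigs)  # column i has variance eigs[i]
    theta = rng.standard_normal(d) / np.sqrt(d)      # random isotropic ground truth
    y = X @ theta + noise_std * rng.standard_normal(n)
    return X, y, theta


def ridge_path(X, y, lams):
    """Ridge estimators for a finite set of penalties lams."""
    n, d = X.shape
    S = X.T @ X / n  # empirical covariance
    b = X.T @ y / n
    return [np.linalg.solve(S + lam * np.eye(d), b) for lam in lams]


def gd_path(X, y, step, n_iters):
    """Iterates of constant-stepsize gradient descent on least squares, from zero."""
    n, d = X.shape
    S = X.T @ X / n
    b = X.T @ y / n
    theta = np.zeros(d)
    path = []
    for _ in range(n_iters):
        theta = theta - step * (S @ theta - b)  # gradient of (1/2n)||X theta - y||^2
        path.append(theta.copy())
    return path


def best_error(estimates, theta):
    """Squared estimation error of the best estimator in a class."""
    return min(float(np.sum((est - theta) ** 2)) for est in estimates)


rng = np.random.default_rng(0)
X, y, theta = make_data(n=200, d=50, decay=0.5, noise_std=0.5, rng=rng)

# Step size 1 / (largest empirical eigenvalue) keeps the iteration stable.
step = 1.0 / np.linalg.eigvalsh(X.T @ X / X.shape[0]).max()

ridge_best = best_error(ridge_path(X, y, np.logspace(-3, 1, 30)), theta)
gd_best = best_error(gd_path(X, y, step, 300), theta)
print(f"best ridge error: {ridge_best:.4f}, best GD error: {gd_best:.4f}")
```

Varying decay, and averaging over many random draws, is one way to explore how the two classes compare as eigenvalue decay changes. A single run at one setting should not be read as confirming or refuting the paper's results.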
Taxonomy
Topics: Sparse and Compressive Sensing Techniques · Stochastic Gradient Optimization Techniques · Machine Learning and Algorithms
