Comparing Classes of Estimators: When does Gradient Descent Beat Ridge Regression in Linear Models?

Dominic Richards; Edgar Dobriban; Patrick Rebeschini

arXiv:2108.11872 · math.ST · June 14, 2022

TL;DR

This paper compares the performance of gradient descent and ridge regression in linear models, revealing how their relative effectiveness depends on data covariance eigenvalue decay and tuning parameter choices.

Contribution

It provides a detailed analysis of when gradient descent outperforms ridge regression based on eigenvalue decay rates and computes the optimal estimator class for orthogonal designs.

Findings

01

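Constant-stepsize gradient descent outperforms ridge regression when the eigenvalues of the empirical data covariance matrix decay slowly, as a power law with exponent less than unity.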

02

Ridge regression outperforms gradient descent with fast eigenvalue decay.

03

The optimal estimator class for orthogonal designs is equivalent to gradient descent with a decaying learning rate.

Abstract

Methods for learning from data depend on various types of tuning parameters, such as penalization strength or step size. Since performance can depend strongly on these parameters, it is important to compare classes of estimators (by considering prescribed finite sets of tuning parameters), not just particularly tuned methods. In this work, we investigate classes of methods via the relative performance of the best method in the class. We consider the central problem of linear regression, with a random isotropic ground truth, and investigate the estimation performance of two fundamental methods, gradient descent and ridge regression. We unveil the following phenomena. (1) For general designs, constant stepsize gradient descent outperforms ridge regression when the eigenvalues of the empirical data covariance matrix decay slowly, as a power law with exponent less than unity. If instead the…
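The comparison described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' code. It assumes a synthetic design whose empirical covariance has exactly power-law eigenvalues i^(-alpha), a Gaussian ground truth, and an oracle choice of the best tuning parameter on a single draw, which is only a rough stand-in for the paper's notion of the best method in a class. The names `make_design`, `ridge`, `gradient_descent_path`, and `best_errors`, along with the grid and step size, are hypothetical choices made for illustration.

```python
import numpy as np


def make_design(n, d, alpha, rng):
    """Build X (n x d, n >= d) so that X^T X / n = diag(i^(-alpha)), i = 1..d."""
    # Q has orthonormal columns (reduced QR of a Gaussian matrix).
    Q, _ = np.linalg.qr(rng.standard_normal((n, d)))
    eigs = np.arange(1, d + 1, dtype=float) ** (-alpha)
    # Scaling column i by sqrt(n * eigs[i]) gives the target spectrum.
    return Q * np.sqrt(n * eigs), eigs


def ridge(X, y, lam):
    """Ridge estimator for the loss ||y - X b||^2 / (2n) + lam * ||b||^2 / 2."""
    n, d = X.shape
    return np.linalg.solve(X.T @ X / n + lam * np.eye(d), X.T @ y / n)


def gradient_descent_path(X, y, step, n_iters):
    """Constant-stepsize gradient descent on ||y - X b||^2 / (2n), started at zero."""
    n, d = X.shape
    beta = np.zeros(d)
    path = []
    for _ in range(n_iters):
        beta = beta - step * (X.T @ (X @ beta - y) / n)
        path.append(beta.copy())
    return path


def best_errors(n=200, d=50, alpha=0.5, noise=1.0, seed=0):
    """Return (best GD error, best ridge error) over finite tuning grids."""
    rng = np.random.default_rng(seed)
    X, _ = make_design(n, d, alpha, rng)
    # Assumption: isotropic Gaussian ground truth with E||beta||^2 = 1.
    beta_star = rng.standard_normal(d) / np.sqrt(d)
    y = X @ beta_star + noise * rng.standard_normal(n)

    # Ridge class: a finite grid of penalties (hypothetical grid).
    lams = np.logspace(-4, 2, 30)
    ridge_err = min(np.sum((ridge(X, y, lam) - beta_star) ** 2) for lam in lams)

    # GD class: iterates 1..30 with step 1.0, stable because the top eigenvalue is 1.
    path = gradient_descent_path(X, y, step=1.0, n_iters=30)
    gd_err = min(np.sum((b - beta_star) ** 2) for b in path)
    return gd_err, ridge_err


if __name__ == "__main__":
    for alpha in (0.5, 2.0):
        gd_err, ridge_err = best_errors(alpha=alpha)
        print(f"alpha={alpha}: best GD error={gd_err:.4f}, best ridge error={ridge_err:.4f}")
```

A single draw is noisy, so averaging `best_errors` over many seeds would be needed before reading anything into the gap between the two classes.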

Code & Models

Repositories

dominicrichards/comparinggradientdescentridge
Official

Taxonomy

Topics: Sparse and Compressive Sensing Techniques · Stochastic Gradient Optimization Techniques · Machine Learning and Algorithms