NGD converges to less degenerate solutions than SGD

Moosa Saghir; N. R. Raghavendra; Zihe Liu; Evan Ryan Gunter

arXiv:2409.04913 · cs.LG · September 16, 2024

TL;DR

This paper compares the effective dimension of models trained with natural gradient descent (NGD) and stochastic gradient descent (SGD). It finds that NGD-trained models have higher effective dimension, which indicates less degenerate solutions.

Contribution

It compares several measures of effective dimension, including the learning coefficient from singular learning theory, between NGD-trained and SGD-trained models, and highlights the resulting differences in solution degeneracy.
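As a reminder of how the two optimizers differ, the sketch below contrasts one SGD update, $w \leftarrow w - \eta \nabla L(w)$, with one NGD update, $w \leftarrow w - \eta F^{-1} \nabla L(w)$, where $F$ is the Fisher information matrix. It is a hypothetical NumPy illustration on a toy least-squares problem, not the code in the authors' repository: the helper names (`sgd_step`, `ngd_step`, `loss_and_grad`), the damping term, the unit-variance Gaussian likelihood behind $F = X^\top X / n$, and the use of full-batch gradients (for determinism) are all assumptions.

```python
import numpy as np

def sgd_step(w, grad, lr):
    """(Stochastic) gradient descent: w <- w - lr * g. Used full-batch below."""
    return w - lr * grad

def ngd_step(w, grad, fisher, lr, damping=1e-3):
    """Natural gradient descent: w <- w - lr * (F + damping * I)^{-1} g.

    The damping term (an assumption, common in practice) keeps the linear solve
    well-posed when F is singular or nearly singular.
    """
    d = w.shape[0]
    natural_grad = np.linalg.solve(fisher + damping * np.eye(d), grad)
    return w - lr * natural_grad

# Toy linear model y = X @ w. Assuming a unit-variance Gaussian likelihood,
# the Fisher information is F = X^T X / n. Targets here are noiseless.
rng = np.random.default_rng(0)
n, d = 200, 3
X = rng.normal(size=(n, d)) * np.array([1.0, 0.1, 0.01])  # badly scaled features
w_true = np.array([1.0, -2.0, 3.0])
y = X @ w_true

def loss_and_grad(w):
    """Mean squared error (halved) and its gradient with respect to w."""
    r = X @ w - y
    return 0.5 * np.mean(r**2), X.T @ r / n

fisher = X.T @ X / n

w_sgd = np.zeros(d)
w_ngd = np.zeros(d)
for _ in range(100):
    _, g = loss_and_grad(w_sgd)
    w_sgd = sgd_step(w_sgd, g, lr=0.5)
    _, g = loss_and_grad(w_ngd)
    w_ngd = ngd_step(w_ngd, g, fisher, lr=0.5)

print("SGD loss:", loss_and_grad(w_sgd)[0])
print("NGD loss:", loss_and_grad(w_ngd)[0])
```

Because the features are badly scaled, plain gradient steps barely move along the low-curvature directions, while the Fisher preconditioning rescales every direction, so on this toy problem the NGD loss should end far below the SGD loss. This only illustrates the update rules; it says nothing by itself about the degeneracy of the solutions each optimizer reaches.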

Findings

1. NGD-trained models have higher effective dimension than SGD-trained models.
2. Higher effective dimension suggests that NGD finds less degenerate solutions.
3. The results are consistent across different measures of effective dimension.
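To make "degenerate" concrete, the sketch below compares two toy losses with a minimum at the origin: a regular one, $L(w) = w_1^2 + w_2^2$, and a degenerate one, $L(w) = w_1^2 + w_2^4$, which is flat to second order along $w_2$. It computes two curvature-based proxies for effective dimension, the Hessian trace and the number of Hessian eigenvalues above a tolerance, using a finite-difference Hessian. The toy losses, the tolerance, and the helper names (`hessian_fd`, `curvature_proxies`) are hypothetical; this excerpt does not say exactly which measures the paper uses beyond the learning coefficient.

```python
import numpy as np

def hessian_fd(f, w, h=1e-3):
    """Central finite-difference estimate of the Hessian of scalar f at w."""
    d = w.size
    H = np.zeros((d, d))
    steps = np.eye(d) * h
    for i in range(d):
        for j in range(d):
            H[i, j] = (f(w + steps[i] + steps[j]) - f(w + steps[i] - steps[j])
                       - f(w - steps[i] + steps[j]) + f(w - steps[i] - steps[j])) / (4 * h * h)
    return H

def curvature_proxies(f, w, tol=1e-3):
    """Return (Hessian trace, number of Hessian eigenvalues above tol) at w."""
    H = hessian_fd(f, w)
    eigvals = np.linalg.eigvalsh((H + H.T) / 2)  # symmetrize before eigvalsh
    return float(np.trace(H)), int(np.sum(eigvals > tol))

regular = lambda w: w[0]**2 + w[1]**2     # curved in every direction at 0
degenerate = lambda w: w[0]**2 + w[1]**4  # flat to second order along w[1]

w_star = np.zeros(2)
for name, f in [("regular", regular), ("degenerate", degenerate)]:
    trace, rank = curvature_proxies(f, w_star)
    print(f"{name}: trace={trace:.3f}, Hessian rank={rank}, parameters={w_star.size}")
```

Both losses have two parameters, but the regular one gives trace 4 and Hessian rank 2, while the degenerate one gives trace close to 2 and rank 1. A purely second-order proxy like this cannot tell a quartic direction from a perfectly flat one; that blind spot is what a measure incorporating higher-order terms is meant to address.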

Abstract

The number of free parameters, or dimension, of a model is a straightforward way to measure its complexity: a model with more parameters can encode more information. However, this is not an accurate measure of complexity: models capable of memorizing their training data often generalize well despite their high dimension. Effective dimension aims to more directly capture the complexity of a model by counting only the number of parameters required to represent the functionality of the model. Singular learning theory (SLT) proposes the learning coefficient $\lambda$ as a more accurate measure of effective dimension. By describing the rate of increase of the volume of the region of parameter space around a local minimum with respect to loss, $\lambda$ incorporates information from higher-order terms. We compare $\lambda$ of models trained using natural gradient descent (NGD) and…
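The volume description of $\lambda$ can be checked numerically in two dimensions. In SLT, the volume of parameters whose loss is within $\varepsilon$ of the minimum scales roughly as $V(\varepsilon) \propto \varepsilon^{\lambda}$ (up to a logarithmic factor that is absent for the toy losses here), so $\lambda$ is the slope of $\log V$ against $\log \varepsilon$. For a regular model with $d$ parameters, $\lambda = d/2$. The sketch below is a brute-force Monte Carlo estimate on two hypothetical losses, $w_1^2 + w_2^2$ (where $\lambda = 1$) and $w_1^2 + w_2^4$ (where $\lambda = 1/2 + 1/4 = 3/4$). The helper name `estimate_lambda`, the uniform sampling box, and the $\varepsilon$ grid are assumptions; exhaustive volume counting only works in very low dimension and is not presented as the paper's estimator.

```python
import numpy as np

def estimate_lambda(loss, eps_grid, n_samples=1_000_000, half_width=1.0, seed=0):
    """Estimate the learning coefficient from volume scaling V(eps) ~ c * eps**lam.

    Samples parameters uniformly from the box [-half_width, half_width]^2 around
    the minimum at the origin, uses the fraction of samples with loss < eps as a
    (scaled) volume, and fits the slope of log V(eps) against log eps.
    """
    rng = np.random.default_rng(seed)
    w = rng.uniform(-half_width, half_width, size=(n_samples, 2))
    losses = loss(w)
    volumes = np.array([np.mean(losses < eps) for eps in eps_grid])
    slope, _ = np.polyfit(np.log(eps_grid), np.log(volumes), deg=1)
    return slope

regular = lambda w: w[:, 0]**2 + w[:, 1]**2     # expected lambda = d/2 = 1
degenerate = lambda w: w[:, 0]**2 + w[:, 1]**4  # expected lambda = 1/2 + 1/4 = 3/4

# Small enough that each sublevel set fits inside the sampling box.
eps_grid = np.geomspace(1e-3, 1e-1, num=10)
print("regular   :", estimate_lambda(regular, eps_grid))
print("degenerate:", estimate_lambda(degenerate, eps_grid))
```

The degenerate loss has the same parameter count as the regular one and a nonzero quartic term, yet its estimated $\lambda$ is smaller, which is the sense in which a lower learning coefficient signals a more degenerate minimum. Practical estimators for neural networks rely on sampling around the trained parameters rather than counting volumes in a box.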

Code & Models

Repositories

cxtraa/ngd_with_slt (PyTorch; official)

Taxonomy

Topics: Machine learning (cs.LG)

Methods: Natural Gradient Descent