NGD converges to less degenerate solutions than SGD
Moosa Saghir, N. R. Raghavendra, Zihe Liu, Evan Ryan Gunter

TL;DR
This paper compares the effective dimension of models trained with natural gradient descent (NGD) and stochastic gradient descent (SGD), finding NGD models have higher effective dimension, indicating less degenerate solutions.
Contribution
The paper compares measures of effective dimension, including the learning coefficient, between NGD-trained and SGD-trained models, highlighting differences in solution degeneracy.
Findings
NGD-trained models have higher effective dimension than SGD-trained models.
Higher effective dimension suggests NGD finds less degenerate solutions.
Results are consistent across different measures of effective dimension.
Abstract
The number of free parameters, or dimension, of a model is a straightforward way to measure its complexity: a model with more parameters can encode more information. However, this is not an accurate measure of complexity: models capable of memorizing their training data often generalize well despite their high dimension. Effective dimension aims to more directly capture the complexity of a model by counting only the number of parameters required to represent the functionality of the model. Singular learning theory (SLT) proposes the learning coefficient as a more accurate measure of effective dimension. By describing the rate of increase of the volume of the region of parameter space around a local minimum with respect to loss, the learning coefficient incorporates information from higher-order terms. We compare the learning coefficient of models trained using natural gradient descent (NGD) and…
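The volume-scaling idea in the abstract can be illustrated on a toy problem. The sketch below is not the paper's estimator. It assumes two hand-picked two-parameter losses (hypothetical examples, not from the paper) and estimates the exponent with which the volume of the region {w : L(w) < eps} grows as eps increases, using Monte Carlo sampling and a log-log fit. For a non-degenerate quadratic minimum the exponent is d/2; a degenerate direction (a quartic instead of a quadratic term) lowers it, which is the sense in which a smaller learning coefficient signals a more degenerate solution. All function names here are illustrative.

```python
import numpy as np


def estimate_volume_exponent(loss_fn, dim, eps_values, n_samples=1_000_000, seed=0):
    """Estimate the exponent lam in Vol{w : L(w) < eps} ~ c * eps**lam.

    Samples points uniformly from the box [-1, 1]^dim, measures the fraction
    falling below each loss threshold, and fits a line in log-log space.
    This toy setup ignores the logarithmic factor that can appear for more
    singular losses.
    """
    rng = np.random.default_rng(seed)
    samples = rng.uniform(-1.0, 1.0, size=(n_samples, dim))
    losses = loss_fn(samples)
    fractions = np.array([np.mean(losses < eps) for eps in eps_values])
    slope, _intercept = np.polyfit(np.log(eps_values), np.log(fractions), 1)
    return slope


def regular_loss(w):
    # Non-degenerate minimum at the origin: quadratic in both directions.
    return w[:, 0] ** 2 + w[:, 1] ** 2


def degenerate_loss(w):
    # Degenerate minimum: the second direction is flat to second order.
    return w[:, 0] ** 2 + w[:, 1] ** 4


eps_values = np.logspace(-3, -1, 8)
lam_regular = estimate_volume_exponent(regular_loss, 2, eps_values)
lam_degenerate = estimate_volume_exponent(degenerate_loss, 2, eps_values)

# Theory for these toy losses: 1/2 + 1/2 = 1 and 1/2 + 1/4 = 3/4.
print(f"regular minimum:    lam ~ {lam_regular:.2f}")
print(f"degenerate minimum: lam ~ {lam_degenerate:.2f}")
```

Both minima have the same number of parameters, yet the fitted exponents differ, so the exponent separates them where a raw parameter count cannot. The quartic term is invisible to the Hessian, which is why a measure based on volume scaling picks up information from higher-order terms.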
Taxonomy
Topics: Diet and metabolism studies
Methods: Natural Gradient Descent
