Inductive Bias of Gradient Descent for Weight Normalized Smooth Homogeneous Neural Nets

Depen Morwani; Harish G. Ramaswamy

arXiv:2010.12909·cs.LG·February 2, 2023

TL;DR

This paper analyzes how gradient descent behaves on weight-normalized smooth homogeneous neural networks. It contrasts standard weight normalization (SWN) with exponential weight normalization (EWN) and shows that EWN tends toward sparse solutions, which can be beneficial for pruning.

Contribution

The paper provides a theoretical analysis of the inductive bias of gradient descent under weight normalization, with emphasis on EWN. It establishes convergence rates for the loss and shows that EWN favors asymptotic relative sparsity in the weights.

Findings

01. The gradient flow path with EWN is equivalent to gradient flow on standard networks with an adaptive learning rate.

02. EWN promotes asymptotic relative sparsity in the weights.

03. Experimental results support sparse solutions with EWN, even under SGD.
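
The first finding can be illustrated with a short derivation for a single weight vector. This is a sketch under stated assumptions, not the paper's statement verbatim: the parameterization $w = e^{\alpha} v / \|v\|$ for EWN, the symbols $\alpha$, $v$, $P$, and the unit-rate gradient flow are notation assumed here, since this listing does not define them.

```latex
% Assumed notation (not given in this listing): for one weight vector w,
%   EWN:  w = e^{\alpha} \, v / \|v\|,  with trainable scalar \alpha and vector v.
% \mathcal{L} is the training loss, \hat{v} = v / \|v\|, and P = I - \hat{v}\hat{v}^\top.
\begin{align*}
\frac{d\alpha}{dt} &= -\frac{\partial \mathcal{L}}{\partial \alpha}
                    = -\langle w, \nabla_w \mathcal{L} \rangle, \\
\frac{dv}{dt} &= -\nabla_v \mathcal{L}
               = -\frac{e^{\alpha}}{\|v\|}\, P\, \nabla_w \mathcal{L}, \\
\frac{dw}{dt} &= w\,\frac{d\alpha}{dt} + \frac{e^{\alpha}}{\|v\|}\, P\, \frac{dv}{dt}
               = -\|w\|^2\, \hat{v}\hat{v}^\top \nabla_w \mathcal{L}
                 - \frac{\|w\|^2}{\|v\|^2}\, P\, \nabla_w \mathcal{L}.
\end{align*}
% dv/dt is orthogonal to v, so \|v\| stays constant along the flow. If \|v(0)\| = 1:
\[
\frac{dw}{dt} = -\|w\|^2\, \nabla_w \mathcal{L}.
\]
```

Under these assumptions the induced flow on $w$ is plain gradient flow with an effective rate proportional to $\|w\|^2$. One informal reading of the second finding is that weight vectors with larger norm then move faster than those with smaller norm, although the listing itself does not spell out this argument.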

Abstract

We analyze the inductive bias of gradient descent for weight normalized smooth homogeneous neural nets, when trained on exponential or cross-entropy loss. We analyze both standard weight normalization (SWN) and exponential weight normalization (EWN), and show that the gradient flow path with EWN is equivalent to gradient flow on standard networks with an adaptive learning rate. We extend these results to gradient descent, and establish asymptotic relations between weights and gradients for both SWN and EWN. We also show that EWN causes weights to be updated in a way that prefers asymptotic relative sparsity. For EWN, we provide a finite-time convergence rate of the loss with gradient flow and a tight asymptotic convergence rate with gradient descent. We demonstrate our results for SWN and EWN on synthetic datasets. Experimental results on simple datasets support our claim on sparse EWN…
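
The abstract's claim that EWN behaves like a standard network with an adaptive learning rate can be checked numerically on a toy problem. The sketch below is not taken from the paper's repository. It assumes the EWN parameterization `w = exp(alpha) * v / ||v||`, a single linear unit with exponential loss `exp(-<w, x>)`, and hypothetical helper names (`ewn_weight`, `ewn_step`, `adaptive_step`, `gap`). It compares one small gradient step on `(alpha, v)` with one step taken directly on `w` using learning rate `lr * ||w||^2`.

```python
import math


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def norm(a):
    return math.sqrt(dot(a, a))


def ewn_weight(alpha, v):
    """Assumed EWN parameterization: w = exp(alpha) * v / ||v||."""
    s = math.exp(alpha) / norm(v)
    return [s * vi for vi in v]


def grad_loss(w, x):
    """Gradient w.r.t. w of the toy exponential loss L(w) = exp(-<w, x>)."""
    c = -math.exp(-dot(w, x))
    return [c * xi for xi in x]


def ewn_step(alpha, v, x, lr):
    """One gradient-descent step on (alpha, v); returns the resulting w."""
    w = ewn_weight(alpha, v)
    g = grad_loss(w, x)
    nv = norm(v)
    v_hat = [vi / nv for vi in v]
    # Chain rule: dw/dalpha = w, so dL/dalpha = <g, w>.
    g_alpha = dot(g, w)
    # Chain rule: dL/dv = (exp(alpha) / ||v||) * (g - <g, v_hat> v_hat).
    scale = math.exp(alpha) / nv
    gv = dot(g, v_hat)
    g_v = [scale * (gi - gv * vh) for gi, vh in zip(g, v_hat)]
    new_alpha = alpha - lr * g_alpha
    new_v = [vi - lr * gi for vi, gi in zip(v, g_v)]
    return ewn_weight(new_alpha, new_v)


def adaptive_step(w, x, lr):
    """One step directly on w with effective learning rate lr * ||w||^2."""
    g = grad_loss(w, x)
    eff = lr * dot(w, w)
    return [wi - eff * gi for wi, gi in zip(w, g)]


def gap(lr):
    """Distance between the two updated weights for a fixed toy setup."""
    alpha, v, x = 0.5, [0.6, 0.8], [1.0, -2.0]  # ||v|| = 1 by construction
    w0 = ewn_weight(alpha, v)
    a = ewn_step(alpha, v, x, lr)
    b = adaptive_step(w0, x, lr)
    return norm([p - q for p, q in zip(a, b)])


if __name__ == "__main__":
    for lr in (1e-2, 1e-3, 1e-4):
        print(f"lr={lr:g}  gap={gap(lr):.3e}")
```

The two updates agree to first order in the step size, so the gap should shrink roughly quadratically as `lr` decreases. That matches the gradient-flow equivalence only in the small-step limit, and only when `||v|| = 1`. The paper's separate gradient-descent analysis is not reproduced here.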

Code & Models

Repositories

DepenM/Exp-WN (PyTorch, official)

Taxonomy

Topics: Machine Learning and ELM · Neural Networks and Applications · Stochastic Gradient Optimization Techniques

Methods: Weight Normalization · Stochastic Gradient Descent