Gradient Descent Maximizes the Margin of Homogeneous Neural Networks

Kaifeng Lyu; Jian Li

arXiv:1906.05890 · cs.LG · January 1, 2021 · 57 cites

TL;DR

This paper shows that gradient descent on the logistic or cross-entropy loss implicitly maximizes the normalized margin of homogeneous neural networks. It combines a theoretical analysis with experiments on MNIST and CIFAR-10 and discusses implications for model robustness.

Contribution

It generalizes previous margin maximization results to broader classes of neural networks and offers quantitative convergence analysis under weaker assumptions.

Findings

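01 Once the training loss falls below a certain threshold, a smoothed version of the normalized margin increases over training time.

02 The normalized margin and its smoothed version converge to the objective value at a KKT point of a constrained optimization problem related to margin maximization.

03 The theory is validated empirically on MNIST and CIFAR-10.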
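The quantities named in these findings can be written out as a short sketch. The notation is an assumption chosen for illustration, since this summary fixes no symbols: binary labels y_n in {-1, +1}, a model Φ that is homogeneous of order L in its parameters θ, and the usual hard-margin form for the constrained problem.

```latex
% Assumed notation (not fixed by the summary above):
%   \Phi(\theta; x) : model output, homogeneous of order L in \theta
%   q_n(\theta)     : margin of the n-th training example
%   \bar\gamma      : normalized margin
\[
\Phi(c\,\theta; x) = c^{L}\,\Phi(\theta; x) \quad \text{for all } c > 0,
\qquad
q_n(\theta) = y_n\,\Phi(\theta; x_n),
\qquad
\bar\gamma(\theta) = \frac{\min_{n} q_n(\theta)}{\lVert\theta\rVert_2^{L}} .
\]
% Hard-margin problem; the findings refer to KKT points of a problem of this kind.
\[
\min_{\theta}\ \tfrac{1}{2}\lVert\theta\rVert_2^{2}
\quad \text{subject to} \quad q_n(\theta) \ge 1 \ \ \text{for all } n .
\]
```

Dividing by the L-th power of the parameter norm is what makes the normalized margin invariant to rescaling θ, so it measures the direction of the parameters rather than their size.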

Abstract

In this paper, we study the implicit regularization of the gradient descent algorithm in homogeneous neural networks, including fully-connected and convolutional neural networks with ReLU or LeakyReLU activations. In particular, we study the gradient descent or gradient flow (i.e., gradient descent with infinitesimal step size) optimizing the logistic loss or cross-entropy loss of any homogeneous model (possibly non-smooth), and show that if the training loss decreases below a certain threshold, then we can define a smoothed version of the normalized margin which increases over time. We also formulate a natural constrained optimization problem related to margin maximization, and prove that both the normalized margin and its smoothed version converge to the objective value at a KKT point of the optimization problem. Our results generalize the previous results for logistic regression with…
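As a concrete illustration of why the margin is normalized, here is a minimal NumPy sketch. The bias-free two-layer ReLU network, the random data, and the names `two_layer_relu` and `normalized_margin` are illustrative assumptions, not taken from the paper or its repository. The sketch checks that scaling all parameters by a constant changes the network output by that constant squared (homogeneity of order 2) while leaving the normalized margin unchanged.

```python
import numpy as np


def two_layer_relu(params, X):
    """Bias-free two-layer ReLU net f(x) = w2 . relu(W1 x).

    Without biases, scaling (W1, w2) by c > 0 scales f by c**2,
    so the model is homogeneous of order 2 in its parameters.
    """
    W1, w2 = params
    return np.maximum(X @ W1.T, 0.0) @ w2


def normalized_margin(params, X, y, order=2):
    """Smallest per-example margin y_n * f(x_n), divided by ||theta||_2**order."""
    q = y * two_layer_relu(params, X)
    theta_norm = np.sqrt(sum(np.sum(p ** 2) for p in params))
    return q.min() / theta_norm ** order


# Hypothetical toy data: 20 points in 5 dimensions with random +/-1 labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))
y = np.sign(rng.normal(size=20))

# Hypothetical random parameters: 8 hidden units.
params = (rng.normal(size=(8, 5)), rng.normal(size=8))
scaled = tuple(3.0 * p for p in params)

gamma = normalized_margin(params, X, y)
gamma_scaled = normalized_margin(scaled, X, y)

# Outputs grow by 3**2 = 9, but the normalized margin is unchanged.
print(np.allclose(two_layer_relu(scaled, X), 9.0 * two_layer_relu(params, X)))
print(np.isclose(gamma, gamma_scaled))
```

With random parameters the normalized margin is typically negative, because the data are not yet separated. The abstract's results concern what happens after training has driven the loss below a threshold.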

Code & Models

Repositories

vfleaking/max-margin (TensorFlow, official)

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications

Methods: Logistic Regression