On Margin Maximization in Linear and ReLU Networks

Gal Vardi; Ohad Shamir; Nathan Srebro

arXiv:2110.02732 · cs.LG · October 4, 2022 · 1 cites

Open Access

TL;DR

This paper studies the implicit bias of linear and ReLU networks trained with gradient flow. It shows that the KKT point of the max margin problem to which gradient flow converges is, in many cases, not even a local optimum, and it identifies settings where a local or global optimum can be guaranteed.

Contribution

The paper gives a detailed analysis of margin maximization in linear and ReLU networks, showing when the KKT points reached by gradient flow are, or are not, local optima of the max margin problem.

Findings

01

In many of the settings studied, the KKT point is not even a local optimum of the max margin problem.

02

In several other settings, the KKT point is guaranteed to be a local or global optimum.

03

The results clarify when convergence of gradient flow to a KKT point of the max margin problem does, and does not, imply convergence to an actual optimum of that problem.

Abstract

The implicit bias of neural networks has been extensively studied in recent years. Lyu and Li [2019] showed that in homogeneous networks trained with the exponential or the logistic loss, gradient flow converges to a KKT point of the max margin problem in the parameter space. However, that leaves open the question of whether this point will generally be an actual optimum of the max margin problem. In this paper, we study this question in detail, for several neural network architectures involving linear and ReLU activations. Perhaps surprisingly, we show that in many cases, the KKT point is not even a local optimum of the max margin problem. On the flip side, we identify multiple settings where a local or global optimum can be guaranteed.
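For readers who want the objects in the abstract written out, the following is a sketch of the max margin problem in parameter space and its KKT conditions. The notation is an assumption made here for illustration and is not taken from this page: \Phi(\theta; x) denotes the network output with parameters \theta, and \{(x_i, y_i)\}_{i=1}^n is a binary classification dataset with y_i \in \{-1, 1\}.

```latex
% Max margin problem in parameter space (notation assumed, see lead-in):
\min_{\theta} \; \frac{1}{2} \|\theta\|^2
\quad \text{s.t.} \quad
y_i \, \Phi(\theta; x_i) \ge 1 \quad \forall i \in \{1, \dots, n\}.

% \theta is a KKT point of this problem if it is feasible and there exist
% multipliers \lambda_1, \dots, \lambda_n \ge 0 such that
\theta = \sum_{i=1}^{n} \lambda_i \, y_i \, \nabla_\theta \Phi(\theta; x_i),
\qquad
\lambda_i \bigl( y_i \, \Phi(\theta; x_i) - 1 \bigr) = 0 \quad \forall i.
```

For ReLU networks \Phi is not differentiable everywhere, so the gradient in the stationarity condition is usually read as an element of the Clarke subdifferential. Because the constraints are non-convex in \theta for most architectures, these conditions are necessary but in general not sufficient for optimality, which is why a KKT point may fail to be even a local optimum.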

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Adversarial Robustness in Machine Learning · Domain Adaptation and Few-Shot Learning

Methods: FLIP