Support Vectors and Gradient Dynamics of Single-Neuron ReLU Networks

Sangmin Lee; Byeongsu Sim; Jong Chul Ye

arXiv:2202.05510 · cs.LG · June 14, 2022

TL;DR

This paper studies the gradient flow dynamics of single-neuron ReLU networks trained with the square loss. It identifies an implicit bias expressed in terms of support vectors, which the authors argue plays a key role in generalization, and it proves global convergence in the two-dimensional case.

Contribution

It identifies an implicit bias of gradient flow in single-neuron ReLU networks, stated in terms of support vectors, and analyzes how the norm of the initialization affects the training dynamics and convergence.

Findings

01 Support vector bias explains generalization in ReLU networks

02 Norm of weights increases during training

03 Global convergence proved for 2D case

Abstract

Understanding the implicit bias of gradient descent and its role in the generalization capability of ReLU networks has been an important research topic in machine learning. Unfortunately, even for a single ReLU neuron trained with the square loss, it was recently shown to be impossible to characterize the implicit regularization in terms of a norm of the model parameters (Vardi & Shamir, 2021). To close the gap toward understanding the intriguing generalization behavior of ReLU networks, here we examine the gradient flow dynamics in the parameter space when training single-neuron ReLU networks. Specifically, we discover an implicit bias in terms of support vectors, which plays a key role in why and how ReLU networks generalize well. Moreover, we analyze gradient flows with respect to the magnitude of the norm of initialization, and show that the norm of the learned weight strictly increases through the…
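The setting in the abstract, a single ReLU neuron trained with the square loss, can be illustrated with a minimal sketch. This is not the authors' code. It replaces gradient flow with plain gradient descent at a small step size. The teacher weight `TEACHER`, the four 2D inputs in `XS`, the step size `lr`, and all function names are assumptions chosen for illustration. The sketch records the weight norm and the loss at every step so their evolution can be inspected.

```python
import math

def pre_activation(w, x):
    # Inner product w . x
    return sum(wi * xi for wi, xi in zip(w, x))

def predict(w, x):
    # Single ReLU neuron: f(x) = max(0, w . x)
    return max(0.0, pre_activation(w, x))

def loss(w, xs, ys):
    # Square loss averaged over the samples, with the usual 1/2 factor
    return 0.5 * sum((predict(w, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def grad(w, xs, ys):
    # Gradient of the loss; only samples with w . x > 0 contribute
    g = [0.0] * len(w)
    for x, y in zip(xs, ys):
        z = pre_activation(w, x)
        if z > 0.0:
            for j, xj in enumerate(x):
                g[j] += (z - y) * xj / len(xs)
    return g

def train(w0, xs, ys, lr=0.1, steps=200):
    # Gradient descent as a discrete stand-in for gradient flow (assumption)
    w = list(w0)
    norms, losses = [], []
    for _ in range(steps):
        norms.append(math.sqrt(sum(wi * wi for wi in w)))
        losses.append(loss(w, xs, ys))
        w = [wi - lr * gi for wi, gi in zip(w, grad(w, xs, ys))]
    return w, norms, losses

# Hypothetical toy data: labels come from a teacher neuron (assumption)
TEACHER = (1.0, 0.5)
XS = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (-1.0, 0.5)]
YS = [predict(TEACHER, x) for x in XS]

if __name__ == "__main__":
    w, norms, losses = train((0.1, 0.1), XS, YS)
    print("final w:", w)
    print("norm: %.4f -> %.4f" % (norms[0], norms[-1]))
    print("loss: %.6f -> %.6f" % (losses[0], losses[-1]))
```

Starting from the small initialization `(0.1, 0.1)`, the loss falls and the weight norm ends larger than it started in this toy run. That matches the flavor of the second finding but is only an illustration on one hand-picked dataset, not evidence for the general statement. Note that `grad((0.0, 0.0), XS, YS)` is exactly zero because no sample is active, so in this sketch an all-zero initialization never moves.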


Taxonomy

Topics: Neural Networks and Applications · Machine Learning and ELM · Model Reduction and Neural Networks