Support Vectors and Gradient Dynamics of Single-Neuron ReLU Networks
Sangmin Lee, Byeongsu Sim, Jong Chul Ye

TL;DR
This paper investigates the training dynamics of single-neuron ReLU networks, revealing an implicit bias in terms of support vectors that helps explain their generalization, and proves global convergence in the two-dimensional case.
Contribution
It identifies a support-vector-based implicit bias in single-neuron ReLU networks, analyzes how the norm of the initialization affects the gradient flow, and studies convergence.
Findings
Support vector bias explains generalization in ReLU networks
Norm of the learned weight strictly increases during training
Global convergence proved for 2D case
Abstract
Understanding the implicit bias of gradient descent and its role in the generalization capability of ReLU networks has been an important research topic in machine learning. Unfortunately, even for a single ReLU neuron trained with the square loss, it was recently shown to be impossible to characterize the implicit regularization in terms of a norm of the model parameters (Vardi & Shamir, 2021). To close the gap toward understanding the intriguing generalization behavior of ReLU networks, we examine the gradient flow dynamics in parameter space when training single-neuron ReLU networks. Specifically, we discover an implicit bias in terms of support vectors, which plays a key role in why and how ReLU networks generalize well. Moreover, we analyze gradient flows with respect to the magnitude of the norm of initialization, and show that the norm of the learned weight strictly increases through the…
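The setting described in the abstract (a single ReLU neuron trained with the square loss) can be sketched numerically. The snippet below is a minimal illustration, not the authors' code: it replaces gradient flow with plain gradient descent at a small step size, and the teacher vector, data distribution, initialization, and all function names are assumptions chosen for the example. It records the loss and the weight norm at every step so the norm trajectory can be inspected.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def square_loss(w, X, y):
    # Mean square loss of the single-neuron model x -> relu(w . x).
    return 0.5 * np.mean((relu(X @ w) - y) ** 2)

def gradient(w, X, y):
    pre = X @ w
    # ReLU derivative; we take it to be 0 at pre == 0 (a convention, assumed here).
    active = (pre > 0).astype(float)
    residual = relu(pre) - y
    return X.T @ (residual * active) / len(y)

def run_gradient_descent(X, y, w0, step=0.05, n_steps=2000):
    # Small-step gradient descent as a crude discretization of gradient flow.
    w = w0.astype(float).copy()
    losses, norms = [], []
    for _ in range(n_steps):
        losses.append(square_loss(w, X, y))
        norms.append(np.linalg.norm(w))
        w = w - step * gradient(w, X, y)
    return w, np.array(losses), np.array(norms)

# Hypothetical 2D toy problem: labels come from a teacher ReLU neuron.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 2))
teacher = np.array([1.0, 0.5])   # assumed ground-truth weight
y = relu(X @ teacher)
w0 = np.array([0.01, 0.0])       # small-norm initialization (assumed)

w, losses, norms = run_gradient_descent(X, y, w0)
print("initial loss:", losses[0], "final loss:", losses[-1])
print("initial norm:", norms[0], "final norm:", norms[-1])
```

Changing the scale of `w0` is one way to explore how the initialization norm affects the trajectory in this toy problem. Any behavior observed here applies only to this example and does not substitute for the paper's analysis.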
Taxonomy
Topics: Neural Networks and Applications · Machine Learning and ELM · Model Reduction and Neural Networks
