Gradient Descent Quantizes ReLU Network Features
Hartmut Maennel, Olivier Bousquet, Sylvain Gelly

TL;DR
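This paper analyzes why gradient descent on over-parametrized ReLU networks, under small initialization and a small learning rate, concentrates the weight vectors at a small number of directions determined by the input data. As a result, only finitely many simple functions can be realized for a given dataset, which may help explain why such networks generalize.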
Contribution
It uncovers a quantization effect in feedforward ReLU networks trained with small initialization and a small learning rate: for a given input dataset, the solutions found reduce to finitely many simple functions.
Findings
Weights tend to concentrate at a small number of directions.
Finitely many simple functions can be realized for given data.
Potential explanation for generalization in over-parametrized networks.
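
The concentration of weight directions described above can be probed with a small experiment. The following is a minimal sketch, not the authors' code: the toy dataset (eight 1-D points with a constant bias feature and targets y = |x|), the two-layer architecture, the hyperparameters, and the helper names `train_small_init_relu` and `direction_histogram` are all assumptions chosen for illustration.

```python
import numpy as np


def train_small_init_relu(n_hidden=50, init_scale=1e-3, lr=0.02, steps=5000, seed=0):
    """Full-batch gradient descent on f(x) = sum_i a_i * relu(w_i . x).

    Hypothetical toy setup: small initialization and a small learning rate,
    the regime the paper analyzes.
    """
    rng = np.random.default_rng(seed)

    # Toy data (an assumption): 1-D inputs plus a constant bias feature.
    x = np.linspace(-1.0, 1.0, 8)
    X = np.stack([x, np.ones_like(x)], axis=1)  # shape (8, 2)
    y = np.abs(x)

    # Small random initialization of both layers.
    W = init_scale * rng.standard_normal((n_hidden, 2))
    a = init_scale * rng.standard_normal(n_hidden)

    def loss_fn(W, a):
        pred = np.maximum(X @ W.T, 0.0) @ a
        return 0.5 * np.mean((pred - y) ** 2)

    initial_loss = loss_fn(W, a)
    n = len(y)
    for _ in range(steps):
        pre = X @ W.T                      # pre-activations, shape (n, hidden)
        act = np.maximum(pre, 0.0)         # ReLU
        resid = act @ a - y                # residuals, shape (n,)
        # Gradients of the mean squared error with respect to a and W.
        grad_a = act.T @ resid / n
        grad_W = ((resid[:, None] * (pre > 0)) * a[None, :]).T @ X / n
        a -= lr * grad_a
        W -= lr * grad_W

    return W, a, initial_loss, loss_fn(W, a)


def direction_histogram(W, a, n_bins=12):
    """Histogram of hidden-unit weight angles, weighted by |a_i| * ||w_i||."""
    angles = np.arctan2(W[:, 1], W[:, 0])
    mass = np.abs(a) * np.linalg.norm(W, axis=1)
    hist, edges = np.histogram(
        angles, bins=n_bins, range=(-np.pi, np.pi), weights=mass
    )
    return hist / hist.sum(), edges


if __name__ == "__main__":
    W, a, loss0, loss1 = train_small_init_relu()
    hist, edges = direction_histogram(W, a)
    print("loss before/after:", loss0, loss1)
    for lo, hi, h in zip(edges[:-1], edges[1:], hist):
        print(f"angle [{lo:+.2f}, {hi:+.2f}): mass fraction {h:.3f}")
```

If the quantization effect shows up in this toy setting, most of the mass should fall into a few angle bins rather than spreading evenly; whether and how sharply it does depends on the assumed data and hyperparameters.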
Abstract
Deep neural networks are often trained in the over-parametrized regime (i.e. with far more parameters than training examples), and understanding why the training converges to solutions that generalize remains an open problem. Several studies have highlighted the fact that the training procedure, i.e. mini-batch Stochastic Gradient Descent (SGD), leads to solutions that have specific properties in the loss landscape. However, even with plain Gradient Descent (GD) the solutions found in the over-parametrized regime are pretty good and this phenomenon is poorly understood. We propose an analysis of this behavior for feedforward networks with a ReLU activation function under the assumption of small initialization and learning rate and uncover a quantization effect: The weight vectors tend to concentrate at a small number of directions determined by the input data. As a consequence, we show…
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Adversarial Robustness in Machine Learning · Advanced Neural Network Applications
