TL;DR
This paper establishes generalization bounds for neural networks by leveraging their observed low-rank structure, showing that networks with low-rank weights can generalize well with sample complexity depending on the rank.
Contribution
It provides the first theoretical generalization bounds that incorporate the low-rank structure of neural network weights, connecting rank regularization to sample complexity.
Findings
Bounds depend on the Schatten p quasi-norms of the weight matrices
Sample complexity scales with network width, depth, and rank
Low-rank regularization improves generalization guarantees
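For reference, the standard definition of the Schatten p quasi-norm mentioned in the findings is given below. This is the textbook definition, not a formula taken from the paper; the paper's exact normalization and choice of p may differ.

```latex
% Standard definition; \sigma_i(W) denotes the i-th singular value of W.
\[
  \|W\|_{S_p} \;=\; \Bigl( \sum_{i} \sigma_i(W)^{p} \Bigr)^{1/p},
  \qquad 0 < p < \infty .
\]
% For 0 < p < 1 this is only a quasi-norm (the triangle inequality fails).
% As p -> 0, the quantity \|W\|_{S_p}^{p} tends to the number of nonzero
% singular values, i.e. \operatorname{rank}(W), which is why small p
% acts as a proxy for rank.
```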
Abstract
It has been recently observed in much of the literature that neural networks exhibit a bottleneck rank property: for larger depths, the activations and weights of neural networks trained with gradient-based methods tend to be of approximately low rank. In fact, the rank of the activations of each layer converges to a fixed value referred to as the "bottleneck rank", which is the minimum rank required to represent the training data. This perspective is in line with the observation that regularizing linear networks (without activations) with weight decay is equivalent to minimizing the Schatten quasi-norm of the neural network. In this paper we investigate the implications of this phenomenon for generalization. More specifically, we prove generalization bounds for neural networks which exploit the approximate low-rank structure of the weight matrices if present. The final results…
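The link between weight decay and the Schatten quasi-norm in linear networks can be checked numerically. The sketch below is not code from the paper. It relies on the known result from the literature on deep linear networks that, for a depth-L factorization W = W_L ⋯ W_1, the smallest possible sum of squared Frobenius norms equals L times the Schatten 2/L quasi-norm of W raised to the power 2/L. The function names (`schatten_quasi_norm`, `balanced_factors`, `weight_decay_cost`) are hypothetical, and the code only shows that a balanced factorization attains this value; it does not prove minimality.

```python
import numpy as np


def schatten_quasi_norm(W, p):
    """Schatten p (quasi-)norm: the l_p (quasi-)norm of the singular values."""
    s = np.linalg.svd(W, compute_uv=False)
    return float(np.sum(s ** p) ** (1.0 / p))


def balanced_factors(W, depth):
    """Split W into `depth` factors that share singular values s**(1/depth).

    Factors are returned in application order (first layer first), so
    W == factors[-1] @ ... @ factors[0].
    """
    if depth == 1:
        return [W.copy()]
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    root = np.diag(s ** (1.0 / depth))
    first = root @ Vt                      # input-side layer
    last = U @ root                        # output-side layer
    middle = [root.copy() for _ in range(depth - 2)]
    return [first] + middle + [last]


def weight_decay_cost(factors):
    """Sum of squared Frobenius norms, i.e. the usual weight-decay penalty."""
    return float(sum(np.sum(F ** 2) for F in factors))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Illustrative rank-3 matrix of shape 6 x 5 (an assumption for the demo).
    W = rng.standard_normal((6, 3)) @ rng.standard_normal((3, 5))
    depth = 4
    factors = balanced_factors(W, depth)

    p = 2.0 / depth
    print("weight decay cost      :", weight_decay_cost(factors))
    print("L * ||W||_{S_p}^p      :", depth * schatten_quasi_norm(W, p) ** p)
```

The two printed quantities should agree up to floating-point error, since every balanced factor has squared Frobenius norm equal to the sum of the singular values of W raised to the power 2/L.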