Norm-based generalisation bounds for multi-class convolutional neural networks
Antoine Ledent, Waleed Mustafa, Yunwen Lei, Marius Kloft

TL;DR
This paper derives new generalisation error bounds for deep convolutional neural networks that depend only logarithmically on the number of classes and incorporate weight sharing, improving theoretical understanding of CNNs.
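To show what "only logarithmically" means in practice, the toy snippet below compares the growth of log(C) and sqrt(C) for a number of classes C. The abstract notes that previous bounds grow at least as the square root of C. This compares growth rates only. The helper name `class_factor_growth` is hypothetical, and the actual bounds contain many other terms and constants.

```python
import math


def class_factor_growth(num_classes):
    """Return (sqrt(C), log(C)) for a given number of classes C.

    Toy comparison of growth rates only; it does not evaluate any
    generalisation bound from the paper.
    """
    return math.sqrt(num_classes), math.log(num_classes)


for c in (10, 100, 1000, 10000):
    sqrt_c, log_c = class_factor_growth(c)
    print(f"C={c:>6}: sqrt(C)={sqrt_c:8.2f}  log(C)={log_c:5.2f}")
```

Going from 10 to 10,000 classes multiplies sqrt(C) by about 31.6, but multiplies log(C) only by 4.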
Contribution
The authors derive generalisation bounds for CNNs that are independent of the number of classes up to logarithmic factors. They do this by adapting Rademacher complexity analysis to account for weight sharing, with further refinements that exploit pooling and sparse connections.
Findings
Bounds depend on weight norms, not parameter count
Bounds are asymptotically tight near initialization
Incorporates weight sharing and pooling effects
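The first two findings contrast norms with parameter counts. The sketch below illustrates this with a norm measured relative to the initial weights W0. As the weights W approach W0, the distance ||W - W0|| shrinks to zero, but the parameter count stays fixed. That contrast is why a norm-based term can become tight near initialisation while a parameter-counting term cannot. The Frobenius norm and the helper names (`frobenius_distance`, `parameter_count`) are illustrative assumptions. The paper's bounds may use different norms and reference matrices.

```python
import math
import random


def frobenius_distance(w, w0):
    """Frobenius norm of (w - w0) for matrices stored as lists of rows."""
    return math.sqrt(sum((a - b) ** 2
                         for row, row0 in zip(w, w0)
                         for a, b in zip(row, row0)))


def parameter_count(w):
    """Number of scalar entries in a matrix stored as lists of rows."""
    return sum(len(row) for row in w)


random.seed(0)
rows, cols = 64, 32
w0 = [[random.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]
direction = [[random.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]

for t in (1.0, 0.1, 0.01):
    # Hypothetical "trained" weights: initialisation plus a perturbation scaled by t.
    w = [[a + t * d for a, d in zip(r0, rd)] for r0, rd in zip(w0, direction)]
    print(f"t={t:<5} params={parameter_count(w)}  "
          f"||W - W0||_F={frobenius_distance(w, w0):.4f}")
```

The distance shrinks linearly with t, while the parameter count stays at 2048.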
Abstract
We show generalisation error bounds for deep learning with two main improvements over the state of the art. (1) Our bounds have no explicit dependence on the number of classes except for logarithmic factors. This holds even when formulating the bounds in terms of the -norm of the weight matrices, where previous bounds exhibit at least a square-root dependence on the number of classes. (2) We adapt the classic Rademacher analysis of DNNs to incorporate weight sharing -- a task of fundamental theoretical importance which was previously attempted only under very restrictive assumptions. In our results, each convolutional filter contributes only once to the bound, regardless of how many times it is applied. Further improvements exploiting pooling and sparse connections are provided. The presented bounds scale as the norms of the parameter matrices, rather than the number of parameters.…
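The weight-sharing point can be made concrete by writing a convolution as the dense matrix it implicitly applies. The sketch below uses a 1D convolution with stride 1 and no padding. The filter is copied into every row, so the Frobenius norm of the unrolled matrix equals sqrt(number of output positions) times the filter's norm. A bound stated on the unrolled matrix would therefore grow with the number of times the filter is applied. A bound in which each filter "contributes only once" does not have this growth. The helpers (`unroll_conv1d`, `apply_dense`, `frobenius`) are hypothetical, and the Frobenius norm is only a stand-in. This is a minimal illustration, not the paper's bound.

```python
import math


def unroll_conv1d(filt, input_len):
    """Dense matrix of a stride-1, 'valid' 1D convolution.

    Uses cross-correlation (no filter flip), as most deep learning libraries do.
    """
    k = len(filt)
    out_len = input_len - k + 1
    matrix = []
    for i in range(out_len):
        row = [0.0] * input_len
        row[i:i + k] = filt  # the same shared filter is copied into every row
        matrix.append(row)
    return matrix


def apply_dense(matrix, x):
    """Matrix-vector product for a matrix stored as lists of rows."""
    return [sum(m * v for m, v in zip(row, x)) for row in matrix]


def frobenius(matrix):
    """Frobenius norm of a matrix stored as lists of rows."""
    return math.sqrt(sum(v * v for row in matrix for v in row))


filt = [0.5, -1.0, 0.25]
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
dense = unroll_conv1d(filt, len(x))

filter_norm = math.sqrt(sum(w * w for w in filt))
print("output:", apply_dense(dense, x))
print("filter norm:", filter_norm)
print("unrolled norm:", frobenius(dense))
print("ratio vs sqrt(#applications):", frobenius(dense) / filter_norm, math.sqrt(len(dense)))
```

Here the filter is applied at 4 positions, so the unrolled matrix's norm is twice the filter's norm. With larger inputs the gap grows as the square root of the number of positions.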
Taxonomy
Topics: Sparse and Compressive Sensing Techniques · Machine Learning and Algorithms · Stochastic Gradient Optimization Techniques
