Chaining Meets Chain Rule: Multilevel Entropic Regularization and Training of Neural Nets
Amir R. Asadi, Emmanuel Abbe

TL;DR
This paper introduces a multilevel entropic regularization approach for training neural networks, derives new generalization and excess risk bounds, and proposes a training method based on the chain rule of relative entropy with performance guarantees, demonstrated on MNIST.
Contribution
The paper develops a multilevel entropic regularization framework and a training procedure for neural nets that exploits the chain rule of relative entropy, with theoretical guarantees and an efficient sampling algorithm.
Findings
Derived generalization and excess risk bounds using multilevel relative entropy.
Proposed a multi-scale Gibbs distribution for neural network training.
Implemented a multilevel Metropolis algorithm tested on MNIST.
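To make the sampling idea concrete, here is a minimal sketch of a plain, single-level Metropolis sampler targeting a classical Gibbs posterior proportional to prior(w) * exp(-lam * risk(w)). It is not the paper's multilevel algorithm: the one-parameter model, the squared-error risk, the Gaussian prior, and every name and constant below (empirical_risk, lam, prior_std, step) are assumptions chosen only for illustration.

```python
import math
import random


def empirical_risk(w, data):
    """Mean squared error of the constant predictor w on the data (toy risk)."""
    return sum((w - x) ** 2 for x in data) / len(data)


def log_gibbs_density(w, data, lam, prior_std):
    """Unnormalized log density: Gaussian prior times exp(-lam * empirical risk)."""
    log_prior = -0.5 * (w / prior_std) ** 2
    return log_prior - lam * empirical_risk(w, data)


def metropolis(data, lam=50.0, prior_std=10.0, step=0.3, n_steps=5000, seed=0):
    """Random-walk Metropolis with a symmetric Gaussian proposal."""
    rng = random.Random(seed)
    w = 0.0
    log_p = log_gibbs_density(w, data, lam, prior_std)
    samples = []
    for _ in range(n_steps):
        proposal = w + rng.gauss(0.0, step)
        log_p_new = log_gibbs_density(proposal, data, lam, prior_std)
        # Accept with probability min(1, density ratio).
        if rng.random() < math.exp(min(0.0, log_p_new - log_p)):
            w, log_p = proposal, log_p_new
        samples.append(w)
    return samples


# Hypothetical toy data; the Gibbs posterior should concentrate near its mean.
data = [1.8, 2.1, 2.0, 2.2, 1.9]
samples = metropolis(data)
burned = samples[1000:]  # discard burn-in
posterior_mean = sum(burned) / len(burned)
print(round(posterior_mean, 2))
```

A larger lam concentrates the samples more tightly around the empirical risk minimizer, while a smaller lam keeps them closer to the prior.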
Abstract
We derive generalization and excess risk bounds for neural nets using a family of complexity measures based on a multilevel relative entropy. The bounds are obtained by introducing the notion of generated hierarchical coverings of neural nets and by using the technique of chaining mutual information introduced in Asadi et al. NeurIPS'18. The resulting bounds are algorithm-dependent and exploit the multilevel structure of neural nets. This, in turn, leads to an empirical risk minimization problem with a multilevel entropic regularization. The minimization problem is resolved by introducing a multi-scale generalization of the celebrated Gibbs posterior distribution, proving that the derived distribution achieves the unique minimum. This leads to a new training procedure for neural nets with performance guarantees, which exploits the chain rule of relative entropy rather than the chain…
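For reference, the two standard single-level facts the abstract builds on can be written as below. The notation (prior Q, empirical risk L_S, inverse temperature \lambda, and the pair W_1, W_2) is assumed here for illustration and is not taken from the paper; the paper's multilevel versions are not reproduced.

```latex
% Classical Gibbs posterior: minimizer of entropically regularized empirical risk
\[
P^\star = \arg\min_{P} \Big\{ \mathbb{E}_{W \sim P}[L_S(W)] + \tfrac{1}{\lambda} D(P \,\|\, Q) \Big\},
\qquad
\frac{dP^\star}{dQ}(w) = \frac{e^{-\lambda L_S(w)}}{\mathbb{E}_{W \sim Q}\big[e^{-\lambda L_S(W)}\big]}.
\]
% Chain rule of relative entropy for a pair of random variables (W_1, W_2)
\[
D(P_{W_1 W_2} \,\|\, Q_{W_1 W_2})
= D(P_{W_1} \,\|\, Q_{W_1}) + D(P_{W_2 \mid W_1} \,\|\, Q_{W_2 \mid W_1} \mid P_{W_1}).
\]
```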
Taxonomy
Topics: Neural Networks and Applications · Machine Learning and Algorithms · Domain Adaptation and Few-Shot Learning
