De-randomized PAC-Bayes Margin Bounds: Applications to Non-convex and Non-smooth Predictors
Arindam Banerjee, Tiancong Chen, Yingxue Zhou

TL;DR
This paper introduces a novel de-randomized PAC-Bayes margin bound framework for deterministic non-smooth deep neural networks, such as ReLU nets, providing tighter generalization bounds by leveraging input-dependent smoothness and flatness.
Contribution
It develops a new de-randomization approach for non-convex, non-smooth predictors, enabling effective generalization bounds without relying on Lipschitz constants.
Findings
The bounds are tighter than those obtained by traditional methods for ReLU nets.
Empirical results show that the bounds adapt to training set size and label randomness.
The framework applies to deterministic predictors, bridging a gap in PAC-Bayes theory.
Abstract
In spite of several notable efforts, explaining the generalization of deterministic non-smooth deep nets, e.g., ReLU-nets, has remained challenging. Existing approaches for deterministic non-smooth deep nets typically need to bound the Lipschitz constant of such deep nets, but such bounds are quite large and may even increase with the training set size, yielding vacuous generalization bounds. In this paper, we present a new family of de-randomized PAC-Bayes margin bounds for deterministic non-convex and non-smooth predictors, e.g., ReLU-nets. Unlike PAC-Bayes, which applies to Bayesian predictors, the de-randomized bounds apply to deterministic predictors like ReLU-nets. A specific instantiation of the bound depends on a trade-off between the (weighted) distance of the trained weights from the initialization and the effective curvature ('flatness') of the trained predictor. To get to these…
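For intuition about the distance-versus-flatness trade-off mentioned in the abstract, here is a minimal sketch of the standard *randomized* PAC-Bayes margin computation that de-randomized bounds start from. It is not the paper's de-randomized bound. Assumptions: the posterior is an isotropic Gaussian N(w, σ²I) centred at the trained weights and the prior is N(w0, σ²I) centred at the initialization, so the KL term is the squared distance from initialization scaled by 1/(2σ²). A linear predictor stands in for a ReLU net, and a Maurer/McAllester-style bound is swapped in for illustration. All function names (`kl_isotropic_gaussians`, `margin_loss`, `perturbed_margin_loss`, `mcallester_bound`) are hypothetical.

```python
import math
import random

def kl_isotropic_gaussians(w, w0, sigma):
    """KL(N(w, sigma^2 I) || N(w0, sigma^2 I)) = ||w - w0||^2 / (2 sigma^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(w, w0))
    return sq_dist / (2.0 * sigma ** 2)

def margin_loss(w, data, gamma):
    """Fraction of examples whose margin y * <w, x> falls below gamma.
    A linear predictor is a hypothetical stand-in for a ReLU net."""
    errs = sum(1 for x, y in data
               if y * sum(wi * xi for wi, xi in zip(w, x)) < gamma)
    return errs / len(data)

def perturbed_margin_loss(w, data, gamma, sigma, n_samples=200, seed=0):
    """Monte Carlo estimate of the margin loss averaged over w + N(0, sigma^2 I).
    Flat (perturbation-robust) predictors keep this close to the unperturbed loss."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        w_tilde = [wi + rng.gauss(0.0, sigma) for wi in w]
        total += margin_loss(w_tilde, data, gamma)
    return total / n_samples

def mcallester_bound(emp_loss, kl, m, delta=0.05):
    """Maurer/McAllester-style PAC-Bayes bound on the randomized predictor:
    emp_loss + sqrt((KL + ln(2 sqrt(m) / delta)) / (2 m))."""
    return emp_loss + math.sqrt((kl + math.log(2.0 * math.sqrt(m) / delta)) / (2.0 * m))

if __name__ == "__main__":
    rng = random.Random(1)
    w0 = [0.0, 0.0]    # hypothetical initialization
    w = [1.5, -0.5]    # hypothetical trained weights
    data = []
    for _ in range(500):
        x = [rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)]
        y = 1 if w[0] * x[0] + w[1] * x[1] >= 0 else -1
        data.append((x, y))
    # Larger sigma shrinks the KL (distance) term but tends to raise the
    # perturbed loss; a flat predictor tolerates larger sigma.
    for sigma in (0.05, 0.2, 0.5, 1.0):
        emp = perturbed_margin_loss(w, data, gamma=0.1, sigma=sigma)
        kl = kl_isotropic_gaussians(w, w0, sigma)
        bound = mcallester_bound(emp, kl, len(data))
        print(f"sigma={sigma:.2f} loss={emp:.3f} KL={kl:.2f} bound={bound:.3f}")
```

The sweep over σ makes the trade-off visible: the distance term dominates for small σ, while the sensitivity of the loss to perturbation (a flatness proxy) dominates for large σ. The bound printed here holds only for the Gaussian-randomized predictor, which is the limitation the de-randomized bounds in the paper aim to remove.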
Taxonomy
Topics: Machine Learning and Algorithms · Machine Learning and Data Classification · Domain Adaptation and Few-Shot Learning
