Provable Generalization Bounds for Deep Neural Networks with Momentum-Adaptive Gradient Dropout

Adeel Safder

arXiv:2510.18410 · cs.LG · November 4, 2025

TL;DR

This paper introduces MAGDrop, an adaptive dropout method for deep neural networks that aims to improve generalization by adjusting dropout rates on activations according to current gradients and accumulated momentum. The method is supported by a PAC-Bayes generalization bound and by experiments on MNIST and CIFAR-10.
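This page does not state MAGDrop's exact rate-update rule, so the following is only a minimal Python sketch of the general idea, not the author's method. It assumes (hypothetically) that each unit's score mixes its current gradient magnitude with an exponential moving average ("momentum") of past magnitudes, and that units scoring above the layer mean receive a higher dropout rate. The class name `MomentumAdaptiveDropout` and the hyperparameters `base_rate`, `beta`, `mix`, `gain`, `p_min`, and `p_max` are illustrative assumptions.

```python
import random
from dataclasses import dataclass, field
from typing import List


@dataclass
class MomentumAdaptiveDropout:
    """Hypothetical per-unit adaptive dropout (illustrative; not the paper's exact rule).

    Each unit's dropout rate is scaled by a score mixing its current gradient
    magnitude with an exponential moving average (momentum) of past magnitudes.
    All hyperparameters are assumptions chosen for illustration.
    """
    n_units: int
    base_rate: float = 0.1   # rate used when every unit has the same score
    beta: float = 0.9        # momentum coefficient for the moving average
    mix: float = 0.5         # weight on the current gradient vs. momentum
    gain: float = 0.5        # sensitivity of the rate to the relative score
    p_min: float = 0.0
    p_max: float = 0.5       # kept < 1 so the 1/(1-p) rescale is always defined
    momentum: List[float] = field(default_factory=list)
    last_grad: List[float] = field(default_factory=list)

    def __post_init__(self) -> None:
        self.momentum = self.momentum or [0.0] * self.n_units
        self.last_grad = self.last_grad or [0.0] * self.n_units

    def update(self, grads: List[float]) -> None:
        """Record current per-unit gradient magnitudes and update their momentum."""
        self.last_grad = [abs(g) for g in grads]
        self.momentum = [self.beta * m + (1.0 - self.beta) * g
                         for m, g in zip(self.momentum, self.last_grad)]

    def rates(self) -> List[float]:
        """Per-unit dropout rates, clipped to [p_min, p_max]."""
        scores = [self.mix * g + (1.0 - self.mix) * m
                  for g, m in zip(self.last_grad, self.momentum)]
        mean = sum(scores) / self.n_units
        if mean == 0.0:
            return [self.base_rate] * self.n_units
        # Assumed direction: units scoring above the layer mean are dropped more often.
        return [min(self.p_max, max(self.p_min,
                    self.base_rate * (1.0 + self.gain * (s / mean - 1.0))))
                for s in scores]

    def forward(self, activations: List[float], rng: random.Random,
                training: bool = True) -> List[float]:
        """Inverted dropout: zero a unit with probability p, else rescale by 1/(1-p)."""
        if not training:
            return list(activations)
        return [0.0 if rng.random() < p else a / (1.0 - p)
                for a, p in zip(activations, self.rates())]


if __name__ == "__main__":
    layer = MomentumAdaptiveDropout(n_units=4)
    layer.update([0.1, 0.1, 0.4, 0.0])  # hypothetical per-unit gradients
    print(layer.rates())  # ≈ [0.083, 0.083, 0.183, 0.05]
    print(layer.forward([1.0, 2.0, 3.0, 4.0], random.Random(0)))
```

Inverted dropout (rescaling kept units by 1/(1-p)) is used here so that evaluation mode can simply pass activations through unchanged; whether MAGDrop rescales the same way is an assumption.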

Contribution

We propose MAGDrop, a momentum-adaptive dropout technique, and derive a PAC-Bayes generalization bound for it, giving the method a theoretical justification alongside its empirical results.

Findings

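1. MAGDrop's PAC-Bayes bound is up to 29.2% tighter than bounds obtained with standard approaches.

2. MAGDrop reaches 99.52% accuracy on MNIST and 92.03% on CIFAR-10, with generalization gaps of 0.48% and 6.52%, respectively.

3. The bounds are computed numerically, and the accompanying code is fully reproducible.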

Abstract

Deep neural networks (DNNs) achieve remarkable performance but often suffer from overfitting due to their high capacity. We introduce Momentum-Adaptive Gradient Dropout (MAGDrop), a novel regularization method that dynamically adjusts dropout rates on activations based on current gradients and accumulated momentum, enhancing stability in non-convex optimization landscapes. To theoretically justify MAGDrop's effectiveness, we derive a non-asymptotic, computable PAC-Bayes generalization bound that accounts for its adaptive nature, achieving up to 29.2% tighter bounds compared to standard approaches by leveraging momentum-driven perturbation control. Empirically, the activation-based MAGDrop achieves competitive performance on MNIST (99.52%) and CIFAR-10 (92.03%), with generalization gaps of 0.48% and 6.52%, respectively. We provide fully reproducible code and numerical computation of…
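The MAGDrop-specific bound is not reproduced on this page. To illustrate what a "computable" PAC-Bayes bound looks like in practice, the sketch below evaluates a standard McAllester-style bound (in Maurer's form) with diagonal Gaussian prior and posterior. This is a swapped-in textbook bound, not the paper's. For a loss in [0, 1] and n training samples, with probability at least 1 - δ, risk(Q) ≤ empirical risk(Q) + sqrt((KL(Q‖P) + ln(2√n/δ)) / (2n)). The helper `relative_tightening` assumes "X% tighter" means a relative reduction against a reference bound; that reading is also an assumption.

```python
import math
from typing import Sequence


def kl_diag_gaussians(mu_q: Sequence[float], sigma_q: Sequence[float],
                      mu_p: Sequence[float], sigma_p: Sequence[float]) -> float:
    """KL(Q || P) for Q = N(mu_q, diag(sigma_q^2)) and P = N(mu_p, diag(sigma_p^2))."""
    total = 0.0
    for mq, sq, mp, sp in zip(mu_q, sigma_q, mu_p, sigma_p):
        total += math.log(sp / sq) + (sq ** 2 + (mq - mp) ** 2) / (2.0 * sp ** 2) - 0.5
    return total


def mcallester_bound(emp_risk: float, kl: float, n: int, delta: float) -> float:
    """Standard McAllester-style PAC-Bayes bound (Maurer's form), loss in [0, 1].

    Holds with probability >= 1 - delta over the draw of n training samples.
    """
    return emp_risk + math.sqrt((kl + math.log(2.0 * math.sqrt(n) / delta)) / (2.0 * n))


def relative_tightening(reference: float, new: float) -> float:
    """Fractional reduction of `new` vs. `reference` (0.292 reads as 29.2% tighter)."""
    return (reference - new) / reference


if __name__ == "__main__":
    # Hypothetical numbers for illustration only; not results from the paper.
    kl = kl_diag_gaussians([0.5, -0.2], [0.1, 0.1], [0.0, 0.0], [1.0, 1.0])
    b_ref = mcallester_bound(emp_risk=0.05, kl=kl, n=10_000, delta=0.05)
    b_new = mcallester_bound(emp_risk=0.05, kl=0.5 * kl, n=10_000, delta=0.05)
    print(b_ref, b_new, relative_tightening(b_ref, b_new))
```

The bound shrinks as n grows and grows with KL(Q‖P), which is why methods that keep the posterior close to the prior (for example, by controlling perturbation scale) can report tighter bounds.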

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Advanced Neural Network Applications · Adversarial Robustness in Machine Learning