Provable Generalization Bounds for Deep Neural Networks with Momentum-Adaptive Gradient Dropout
Adeel Safder

TL;DR
This paper introduces MAGDrop, a novel adaptive dropout method for deep neural networks that improves generalization by dynamically adjusting dropout rates based on gradients and momentum, supported by theoretical bounds and empirical results.
Contribution
We propose MAGDrop, a new momentum-adaptive dropout technique with a derived PAC-Bayes generalization bound, providing both theoretical justification and practical effectiveness.
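For orientation, the classical PAC-Bayes bound (in the McAllester/Maurer form) is reproduced below. It is not the MAGDrop bound itself, which is not stated on this page; it only shows the general shape such bounds take. The symbols $P$ (prior), $Q$ (posterior), $n$ (sample size), $\delta$ (confidence parameter), $L$ (expected loss) and $\hat{L}_S$ (empirical loss on sample $S$) are standard notation assumed here, with the loss taking values in $[0, 1]$.

```latex
% Classical PAC-Bayes bound, shown only as background; not the paper's bound.
% With probability at least 1 - \delta over an i.i.d. sample S of size n,
% simultaneously for all posteriors Q:
\[
  L(Q) \;\le\; \hat{L}_S(Q)
  + \sqrt{\frac{\mathrm{KL}(Q \,\|\, P) + \ln\frac{2\sqrt{n}}{\delta}}{2n}}
\]
```

A bound of this shape tightens when the $\mathrm{KL}(Q \,\|\, P)$ term shrinks, so a tighter bound typically comes from controlling how far the posterior moves from the prior.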
Findings
MAGDrop achieves generalization bounds up to 29.2% tighter than standard approaches.
Empirical results show test accuracy of 99.52% on MNIST and 92.03% on CIFAR-10, with generalization gaps of 0.48% and 6.52%, respectively.
Theoretical bounds are validated with reproducible code.
Abstract
Deep neural networks (DNNs) achieve remarkable performance but often suffer from overfitting due to their high capacity. We introduce Momentum-Adaptive Gradient Dropout (MAGDrop), a novel regularization method that dynamically adjusts dropout rates on activations based on current gradients and accumulated momentum, enhancing stability in non-convex optimization landscapes. To theoretically justify MAGDrop's effectiveness, we derive a non-asymptotic, computable PAC-Bayes generalization bound that accounts for its adaptive nature, achieving up to 29.2% tighter bounds compared to standard approaches by leveraging momentum-driven perturbation control. Empirically, the activation-based MAGDrop achieves competitive performance on MNIST (99.52%) and CIFAR-10 (92.03%), with generalization gaps of 0.48% and 6.52%, respectively. We provide fully reproducible code and numerical computation of…
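The idea of adjusting per-unit dropout rates from current gradients and accumulated momentum can be illustrated with a minimal NumPy sketch. The abstract does not give the actual update rule, so everything below is an assumption made for illustration: the function name `magdrop_like_step`, the exponential moving average of gradient magnitudes, the `tanh` squashing, and the parameters `base_rate`, `beta`, `alpha`, `min_rate` and `max_rate` are all hypothetical and should not be read as the paper's method.

```python
import numpy as np


def magdrop_like_step(activations, grads, momentum, rng,
                      base_rate=0.3, beta=0.9, alpha=0.5,
                      min_rate=0.05, max_rate=0.8):
    """Apply one gradient- and momentum-dependent dropout step (illustrative only).

    activations: array of shape (batch, units)
    grads:       per-unit gradient signal, shape (units,)
    momentum:    running average of gradient magnitudes, shape (units,)
    """
    # Hypothetical momentum term: exponential moving average of |gradient|.
    momentum = beta * momentum + (1.0 - beta) * np.abs(grads)

    # Hypothetical per-unit score mixing the current gradient and the momentum.
    score = np.abs(grads) + momentum

    # Standardise the score so the rate adjustment is scale-free.
    centred = (score - score.mean()) / (score.std() + 1e-8)

    # Units with above-average scores get a higher dropout rate (an assumption),
    # clipped so that no unit is always kept or always dropped.
    rates = np.clip(base_rate * (1.0 + alpha * np.tanh(centred)),
                    min_rate, max_rate)

    # Inverted dropout: drop with probability `rates`, rescale the survivors.
    keep = rng.random(activations.shape) >= rates
    dropped = activations * keep / (1.0 - rates)
    return dropped, rates, momentum
```

In a training loop, `momentum` would be carried from one step to the next and `grads` would come from the backward pass of the previous step; at evaluation time the mask would be skipped, as with standard dropout.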
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Advanced Neural Network Applications · Adversarial Robustness in Machine Learning
