A Study of the Mathematics of Deep Learning
Anirbit Mukherjee

TL;DR
This thesis advances the mathematical understanding of deep learning by establishing new theoretical results, algorithms, and bounds, thereby providing a rigorous foundation for neural network behavior and training methods.
Contribution
It introduces novel circuit complexity theorems, efficient training algorithms, convergence proofs for popular optimizers, and improved risk bounds for stochastic neural networks.
Findings
New circuit complexity theorems for neural functions
Linear-time training algorithm for ReLU gates
Convergence proofs for RMSProp and ADAM
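For readers unfamiliar with the optimizers named above, here is a minimal sketch of the RMSProp update in its commonly stated textbook form. It is an illustration under stated assumptions, not the variant analysed in the thesis. The function name `rmsprop_step` and the hyperparameter names and defaults (`lr`, `beta`, `eps`) are hypothetical choices made for this example. The sketch keeps a running average `v` of squared gradients and divides each gradient coordinate by its square root.

```python
import numpy as np


def rmsprop_step(w, grad, v, lr=1e-2, beta=0.9, eps=1e-8):
    """One RMSProp step (common textbook form; illustrative assumption).

    w    -- current parameters
    grad -- gradient of the loss at w
    v    -- running (exponential) average of squared gradients
    Returns the updated (w, v).
    """
    # Exponential moving average of the squared gradient, per coordinate.
    v = beta * v + (1.0 - beta) * grad**2
    # Scale each coordinate's step by the root of its averaged squared gradient.
    w = w - lr * grad / (np.sqrt(v) + eps)
    return w, v


# Usage: minimise f(w) = 0.5 * ||w||^2, whose gradient is w itself.
w = np.array([3.0, -2.0])
v = np.zeros_like(w)
for _ in range(500):
    w, v = rmsprop_step(w, grad=w, v=v)
print(np.linalg.norm(w))
```

Because the step is normalised per coordinate, its size stays close to `lr` regardless of the raw gradient scale. With a constant `lr`, the iterates therefore approach the minimiser and then hover near it rather than converging exactly.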
Abstract
"Deep Learning"/"Deep Neural Nets" is a technological marvel that is now increasingly deployed at the cutting edge of artificial intelligence tasks. The dramatic success of deep learning in the last few years has hinged on an enormous number of heuristics, and rigorously explaining them has turned out to be a serious mathematical challenge. In this thesis, submitted to the Department of Applied Mathematics and Statistics, Johns Hopkins University, we take several steps towards building strong theoretical foundations for these new paradigms of deep learning. In chapter 2 we show new circuit complexity theorems for deep neural functions and prove classification theorems about these function spaces, which in turn lead to exact algorithms for empirical risk minimization for depth-2 ReLU nets. We also motivate a measure of complexity of neural functions to constructively…
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Neural Networks and Applications
Methods: RMSProp
