Why ReLU? A Bit-Model Dichotomy for Deep Network Training
Ilan Doron-Arad, Elchanan Mossel

TL;DR
This paper analyzes the computational complexity of training deep neural networks under realistic finite-precision models, revealing a sharp dichotomy where ReLU activations are computationally manageable, unlike polynomial activations.
Contribution
It establishes a complexity dichotomy showing polynomial activations are hard to train, while ReLU activations are computationally feasible under finite-precision constraints.
Findings
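Training deep networks with polynomial activations is #P-hard in the bit model.
Deciding a single bit of the gradient in polynomial-activation networks is #P-hard.
Training ReLU networks in the same model is NP-complete, which is believed to be strictly easier than the #P-hard polynomial case.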
Abstract
Theoretical analyses of Empirical Risk Minimization (ERM) are standardly framed within the Real-RAM model of computation. In this setting, training even simple neural networks is known to be ∃ℝ-complete -- a complexity class believed to be harder than NP, that characterizes the difficulty of solving systems of polynomial inequalities over the real numbers. However, this algebraic framework diverges from the reality of digital computation with finite-precision hardware. In this work, we analyze the theoretical complexity of ERM under a realistic bit-level model, where network parameters and inputs are constrained to be rational numbers with polynomially bounded bit-lengths. Under this model, we reveal a sharp dichotomy in tractability governed by the network's activation function. We prove that for deep networks with any polynomial…
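As a rough intuition for why the activation function matters once numbers are stored as bits, the sketch below pushes a rational input through a width-1 chain of layers using exact arithmetic. It is not the paper's construction or proof. The helper names (`bit_length`, `forward_chain`), the weight 3/2, and the choice of squaring as the polynomial activation are all assumptions made for illustration. It shows only that a degree-2 activation can roughly double the bit-length of a value at every layer, while ReLU lets it grow by a few bits per layer.

```python
from fractions import Fraction


def bit_length(q: Fraction) -> int:
    """Bits needed to write q as a numerator/denominator pair."""
    return q.numerator.bit_length() + q.denominator.bit_length()


def relu(x: Fraction) -> Fraction:
    return x if x > 0 else Fraction(0)


def square(x: Fraction) -> Fraction:
    # Simplest degree-2 polynomial activation (illustrative choice only).
    return x * x


def forward_chain(x: Fraction, weight: Fraction, activation, depth: int):
    """Width-1 network: x <- activation(weight * x), repeated `depth` times.

    Returns the bit-length of the exact rational value after each layer.
    """
    sizes = []
    for _ in range(depth):
        x = activation(weight * x)
        sizes.append(bit_length(x))
    return sizes


# Hypothetical input and weight, both short rationals.
x0 = Fraction(3, 2)
w = Fraction(3, 2)

relu_sizes = forward_chain(x0, w, relu, depth=10)
square_sizes = forward_chain(x0, w, square, depth=10)

print("ReLU bit-lengths:  ", relu_sizes)
print("square bit-lengths:", square_sizes)
```

With ReLU the stored value stays short relative to the depth, whereas under squaring the exact value needs a number of bits that grows exponentially with depth. This is only a toy picture of the representation cost and says nothing by itself about the hardness results stated in the abstract.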
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Cryptography and Data Security · Complexity and Algorithms in Graphs
