Why ReLU? A Bit-Model Dichotomy for Deep Network Training

Ilan Doron-Arad; Elchanan Mossel

arXiv:2602.19017·cs.LG·February 24, 2026

TL;DR

This paper analyzes the computational complexity of training deep neural networks under a realistic finite-precision (bit-level) model. It finds a sharp dichotomy governed by the activation function: training with polynomial activations is hard, whereas training with ReLU activations is NP-complete.

Contribution

It establishes a complexity dichotomy for ERM under finite-precision constraints: networks with polynomial activations are hard to train, while for ReLU networks the training problem is NP-complete.

Findings

01

Training polynomial activation networks is P-hard.

02

Deciding a gradient bit in polynomial networks is P-hard.

03

Training ReLU networks in the bit model is NP-complete, so the problem remains within NP.

Abstract

Theoretical analyses of Empirical Risk Minimization (ERM) are standardly framed within the Real-RAM model of computation. In this setting, training even simple neural networks is known to be $\exists\mathbb{R}$-complete -- a complexity class believed to be harder than NP, which characterizes the difficulty of solving systems of polynomial inequalities over the real numbers. However, this algebraic framework diverges from the reality of digital computation with finite-precision hardware. In this work, we analyze the theoretical complexity of ERM under a realistic bit-level model ($\mathrm{ERM}_{\mathrm{bit}}$), where network parameters and inputs are constrained to be rational numbers with polynomially bounded bit-lengths. Under this model, we reveal a sharp dichotomy in tractability governed by the network's activation function. We prove that for deep networks with any polynomial…
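To make the bit-level setting concrete, the sketch below tracks how many bits an exact rational value needs as it passes through a deep scalar chain, once with ReLU and once with a squaring activation. It is only an intuition for why bit-lengths matter in a model with rational parameters; the excerpt above does not state the paper's proof technique, and the scalar chain, the weight 3/2, the depth, and the helper names (`bit_length`, `forward_chain`) are all illustrative assumptions.

```python
from fractions import Fraction


def bit_length(q: Fraction) -> int:
    """Bits needed to write q as numerator/denominator (sign ignored)."""
    return abs(q.numerator).bit_length() + q.denominator.bit_length()


def relu(x: Fraction) -> Fraction:
    """Piecewise-linear activation: max(x, 0)."""
    return x if x > 0 else Fraction(0)


def square(x: Fraction) -> Fraction:
    """A simple polynomial activation of degree 2."""
    return x * x


def forward_chain(x: Fraction, weight: Fraction, depth: int, activation):
    """Hypothetical scalar 'network': depth layers of x -> activation(weight * x).

    Returns the final value and the bit-length recorded after each layer.
    """
    sizes = []
    for _ in range(depth):
        x = activation(weight * x)
        sizes.append(bit_length(x))
    return x, sizes


# Illustrative input and weight (assumptions, not values from the paper).
x0 = Fraction(3, 2)
w = Fraction(3, 2)
depth = 10

_, relu_sizes = forward_chain(x0, w, depth, relu)
_, square_sizes = forward_chain(x0, w, depth, square)

print("ReLU bit-lengths per layer:   ", relu_sizes)
print("Square bit-lengths per layer: ", square_sizes)
```

In this toy chain the ReLU bit-length grows by a few bits per layer, while the squaring activation roughly doubles it at every layer, so exact values quickly outgrow any polynomial bit budget in the depth.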

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Cryptography and Data Security · Complexity and Algorithms in Graphs