A Depth Hierarchy for Computing the Maximum in ReLU Networks via Extremal Graph Theory
Itay Safran

TL;DR
This paper establishes a depth-dependent width lower bound for ReLU networks computing the maximum function, revealing an inherent complexity tied to the geometric structure of its non-differentiable regions.
Contribution
It introduces a novel combinatorial proof technique, based on extremal graph theory, to derive the first unconditional super-linear width lower bounds for exactly computing the maximum with deep ReLU networks.
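The extremal result the proof relies on is Turán's theorem: a graph on $n$ vertices with no clique on $r+1$ vertices has at most as many edges as the Turán graph $T(n, r)$, the complete $r$-partite graph whose parts differ in size by at most one. The Python sketch below (function names are hypothetical, chosen for illustration) computes that edge count and brute-force checks the bound on every graph with at most five vertices. It only illustrates the theorem; it does not reproduce the specific graph the paper builds from the first hidden layer.

```python
from itertools import combinations


def turan_edges(n: int, r: int) -> int:
    """Edge count of the Turan graph T(n, r): complete r-partite graph
    on n vertices with part sizes differing by at most one."""
    q, s = divmod(n, r)
    sizes = [q + 1] * s + [q] * (r - s)
    # All pairs minus the pairs that fall inside a single part.
    within = sum(k * (k - 1) // 2 for k in sizes)
    return n * (n - 1) // 2 - within


def has_clique(edges: set[frozenset[int]], n: int, size: int) -> bool:
    """True if some `size` vertices are pairwise adjacent."""
    return any(
        all(frozenset(p) in edges for p in combinations(c, 2))
        for c in combinations(range(n), size)
    )


def max_edges_without_clique(n: int, size: int) -> int:
    """Brute-force maximum edge count of a K_size-free graph on n vertices.

    Enumerates all 2^(n choose 2) labelled graphs, so keep n tiny.
    """
    pairs = [frozenset(p) for p in combinations(range(n), 2)]
    best = 0
    for mask in range(1 << len(pairs)):
        edges = {pairs[i] for i in range(len(pairs)) if mask >> i & 1}
        # Only test clique-freeness when the graph could beat the record.
        if len(edges) > best and not has_clique(edges, n, size):
            best = len(edges)
    return best
```

For example, `turan_edges(5, 2)` counts the edges of $K_{3,2}$, and the brute force confirms that no triangle-free graph on five vertices has more.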
Findings
Depth hierarchy for maximum computation in ReLU networks.
Super-linear width lower bounds at depths ≥3.
A graph-theoretic approach links the non-differentiable ridges of the maximum to cliques in a graph induced by the first hidden layer.
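As an elementary illustration of what these ridges are (an observation about the maximum itself, not the paper's construction): $\max(x_1,\dots,x_d)$ has gradient $e_i$ wherever $x_i$ is the unique largest coordinate, and it fails to be differentiable exactly where two or more coordinates tie for the top. The ridge between the regions of $i$ and $j$ lies on the hyperplane $x_i = x_j$, so there are $\binom{d}{2}$ ridges, one per pair. At a point where $m$ coordinates tie, the ridges meeting there are indexed by every pair of those $m$ indices, i.e. the edges of a clique $K_m$. The hypothetical helpers below make this concrete.

```python
from itertools import combinations


def grad_max(x: list[float]) -> list[float]:
    """Gradient of max at a point with a unique argmax: the indicator e_i.

    At a tie the gradient does not exist; this picks the first maximal index.
    """
    i = max(range(len(x)), key=lambda k: x[k])
    return [1.0 if k == i else 0.0 for k in range(len(x))]


def ridges_through(x: list[float], tol: float = 1e-12) -> list[tuple[int, int]]:
    """Pairs (i, j) whose ridge {x_i = x_j, both maximal} contains x.

    max is differentiable at x exactly when this list is empty; when m
    coordinates tie, the result lists all m*(m-1)/2 pairs (a clique K_m).
    """
    top = max(x)
    tied = [k for k, v in enumerate(x) if top - v <= tol]
    return list(combinations(tied, 2))


def all_ridges(d: int) -> list[tuple[int, int]]:
    """One ridge per unordered pair of coordinates: d*(d-1)/2 in total."""
    return list(combinations(range(d), 2))
```

Nudging a tied point slightly toward $x_i$ or toward $x_j$ shows the gradient jumping from $e_i$ to $e_j$ across the ridge.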
Abstract
We consider the problem of exact computation of the maximum function over $d$ real inputs using ReLU neural networks. We prove a depth hierarchy, wherein width $\Omega\big(d^{1+\frac{1}{2^{k-2}-1}}\big)$ is necessary to represent the maximum for any depth $3\le k\le \log_2(\log_2(d))$. This is the first unconditional super-linear lower bound for this fundamental operator at depths $k\ge3$, and it holds even if the depth scales with $d$. Our proof technique is based on a combinatorial argument and associates the non-differentiable ridges of the maximum with cliques in a graph induced by the first hidden layer of the computing network, utilizing Turán's theorem from extremal graph theory to show that a sufficiently narrow network cannot capture the non-linearities of the maximum. This suggests that despite its simple nature, the maximum function possesses an inherent complexity that…
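For context on the depth hierarchy, the maximum can be computed exactly with linear width once the depth is logarithmic, by running a pairwise tournament built on the identity $\max(a,b) = \mathrm{ReLU}(a-b) + \mathrm{ReLU}(b) - \mathrm{ReLU}(-b)$. This is a well-known folklore upper bound, not a construction taken from the paper. The sketch below simulates it in plain Python and reports the number of hidden layers ($\lceil \log_2 d \rceil$) and the widest hidden layer (roughly $3d/2$ neurons). How hidden layers map onto the paper's depth parameter is an assumption left open here, since the layer-counting convention is not spelled out on this page.

```python
def relu(z: float) -> float:
    """Rectified linear unit."""
    return max(z, 0.0)


def pairwise_max(a: float, b: float) -> float:
    """max(a, b) using one hidden layer of three ReLU neurons.

    relu(b) - relu(-b) == b, so the sum is relu(a - b) + b = max(a, b)
    (exact in real arithmetic; floats may differ in the last bits).
    """
    return relu(a - b) + relu(b) - relu(-b)


def passthrough(x: float) -> float:
    """Carry an unpaired input through a hidden layer with two ReLUs."""
    return relu(x) - relu(-x)


def relu_max(xs: list[float]) -> tuple[float, int, int]:
    """Tournament of pairwise ReLU gadgets.

    Returns (maximum, hidden_layers, widest_hidden_layer). Each round of
    the tournament is one hidden layer.
    """
    if not xs:
        raise ValueError("need at least one input")
    values = list(xs)
    layers, widest = 0, 0
    while len(values) > 1:
        pairs, odd = divmod(len(values), 2)
        # 3 neurons per pairwise gadget, 2 to pass an odd input through.
        widest = max(widest, 3 * pairs + 2 * odd)
        nxt = [pairwise_max(values[2 * i], values[2 * i + 1]) for i in range(pairs)]
        if odd:
            nxt.append(passthrough(values[-1]))
        values = nxt
        layers += 1
    return values[0], layers, widest
```

For instance, `relu_max([3.0, -1.0, 7.0, 2.0, 5.0])` returns the maximum `7.0` after three hidden layers whose widest has 8 neurons. The lower bound in the abstract says, roughly, that shrinking the depth well below this logarithmic regime forces the width to grow super-linearly in $d$.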
