Deep Learning as a Convex Paradigm of Computation: Minimizing Circuit Size with ResNets

Arthur Jacot

arXiv:2511.20888·stat.ML·March 26, 2026

Open Access

TL;DR

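This paper argues that deep neural networks, ResNets in particular, implement a computational Occam's razor: minimizing a weighted $ℓ_{1}$ norm of the ResNet parameters is nearly equivalent to finding a small binary circuit that fits the data. The argument rests on the observation that the set of functions approximable by small circuits becomes convex in the 'Harder than Monte Carlo' (HTMC) regime.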

Contribution

It introduces a convex framework relating ResNet parameter norms to circuit size, offering a novel theoretical understanding of deep learning as minimal circuit computation.

Findings

01. ResNets relate to circuit size minimization via a convex norm.
02. A new HTMC norm on functions is introduced and connected to ResNet norms.
03. Minimizing ResNet norms approximates minimal circuit size within a power of two.

Abstract

This paper argues that DNNs implement a computational Occam's razor -- finding the `simplest' algorithm that fits the data -- and that this could explain their incredible and wide-ranging success over more traditional statistical methods. We start with the discovery that the set of real-valued functions $f$ that can be $ϵ$-approximated with a binary circuit of size at most $c ϵ^{- γ}$ becomes convex in the `Harder than Monte Carlo' (HTMC) regime, when $γ > 2$, allowing for the definition of an HTMC norm on functions. In parallel, one can define a complexity measure on the parameters of a ResNet (a weighted $ℓ_{1}$ norm of the parameters), which induces a `ResNet norm' on functions. The HTMC and ResNet norms can then be related by an almost matching sandwich bound. Thus minimizing this ResNet norm is equivalent to finding a circuit that fits the data with an almost…
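
The set described in the abstract can be written out as a rough sketch. The notation here is illustrative and not taken from the paper: the symbol $\mathcal{F}_{\gamma}(c)$, the circuit-size notation $|C|$, the quantifier over all $\epsilon > 0$, and the unspecified norm $\| \cdot \|$ measuring approximation error are all assumptions.

```latex
% Hypothetical notation: F_gamma(c), |C|, and the norm ||.|| are illustrative
% assumptions, not the paper's definitions.
\[
  \mathcal{F}_{\gamma}(c)
  \;=\;
  \bigl\{\, f \;:\; \forall\, \epsilon > 0,\ \exists \text{ binary circuit } C
  \text{ with } |C| \le c\,\epsilon^{-\gamma}
  \text{ and } \| f - C \| \le \epsilon \,\bigr\}
\]
% Read literally, the abstract's convexity claim for gamma > 2 would say:
\[
  f, g \in \mathcal{F}_{\gamma}(c),\quad t \in [0,1]
  \;\Longrightarrow\;
  t f + (1 - t)\, g \in \mathcal{F}_{\gamma}(c).
\]
```

Under this reading, a larger $\gamma$ allows the circuit size to grow faster as $\epsilon \to 0$, so the set admits harder functions. The paper's exact statement (including constants and the choice of error norm) may differ from this sketch.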

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Adversarial Robustness in Machine Learning · Advanced Neural Network Applications