
TL;DR
This paper establishes a theoretical link between neural network weight norms in fixed-precision regimes and Kolmogorov complexity, explaining why weight decay favors simpler, more compressible functions.
Contribution
It proves that the minimal weight norm of a fixed-precision looped neural network that outputs a binary string equals that string's Kolmogorov complexity, up to a logarithmic factor.
Findings
Weight decay induces a prior that matches Solomonoff's universal prior up to a polynomial factor.
In fixed precision, every weight norm equals the number of non-zero parameters up to constant factors.
The bounds are tight up to constants, with permutation encodings realizing the logarithmic factor.
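The norm-collapse finding can be checked numerically on a toy example. The sketch below is an illustration, not the paper's construction: it assumes "fixed precision" means weights rounded to a uniform grid with spacing `step` and clipped to `[-max_abs, max_abs]`, and it measures the penalty as the p-th power of the p-norm. The helper names `quantize` and `norm_bounds` are hypothetical. Under those assumptions every non-zero weight has magnitude between `step` and `max_abs`, so the penalty is sandwiched between two constant multiples of the non-zero count.

```python
def quantize(w, step, max_abs):
    """Round w to the nearest multiple of `step`, then clip to [-max_abs, max_abs].

    Assumption: this uniform grid stands in for a fixed-precision number format.
    """
    q = round(w / step) * step
    return max(-max_abs, min(max_abs, q))


def norm_bounds(weights, step, max_abs, p):
    """Return (nnz, penalty, lower, upper) for the penalty sum(|w|^p).

    lower = step^p * nnz and upper = max_abs^p * nnz, so
    lower <= penalty <= upper whenever all weights lie on the grid.
    """
    nnz = sum(1 for w in weights if w != 0)
    penalty = sum(abs(w) ** p for w in weights)
    return nnz, penalty, (step ** p) * nnz, (max_abs ** p) * nnz


raw = [0.1, -0.3, 1.7, 0.0, 5.0, -0.05]
weights = [quantize(w, step=0.25, max_abs=2.0) for w in raw]

for p in (1, 2):
    nnz, penalty, lower, upper = norm_bounds(weights, 0.25, 2.0, p)
    print(p, nnz, lower, penalty, upper)
```

The constants `step**p` and `max_abs**p` depend only on the number format and on `p`, not on the network, which is the sense in which the choice of norm stops mattering in this toy setting.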
Abstract
Why does weight decay work? We prove that, in any fixed-precision regime, the smallest weight norm of a looped neural network outputting a binary string equals the Kolmogorov complexity of that string, up to a logarithmic factor. This implies that weight decay induces a prior matching Solomonoff's universal prior, the optimal prior over computable functions, up to a polynomial factor. The result is norm-agnostic: in fixed precision, every weight norm collapses to the non-zero parameter count up to constants, so the same sandwich bound holds for any norm used as a regulariser. The proof has two short reductions: any program for a universal Turing machine can be encoded into neural weights at unit cost per program bit, and any fixed-precision network can be described by enumerating its non-zero parameters with logarithmic addressing overhead. Both bounds are tight up to constants, with…
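The second reduction in the abstract, describing a network by enumerating its non-zero parameters with logarithmic addressing overhead, can be illustrated with a simple bit count. This is a minimal sketch under stated assumptions, not the paper's encoding: each non-zero parameter is assumed to cost one address of `ceil(log2(n))` bits (for `n` parameter slots) plus a fixed number of value bits, and `description_bits` is a hypothetical helper name.

```python
import math


def description_bits(weights, bits_per_value):
    """Bits needed to list every non-zero weight as an (address, value) pair.

    Assumptions: addresses index into len(weights) slots, so each costs
    ceil(log2(n)) bits; each value costs `bits_per_value` bits because the
    precision is fixed.
    """
    n = len(weights)
    address_bits = max(1, math.ceil(math.log2(n))) if n > 1 else 1
    nnz = sum(1 for w in weights if w != 0)
    return nnz * (address_bits + bits_per_value)


# 1024 slots, 4 non-zero weights, 4 value bits each:
# 4 * (10 address bits + 4 value bits) = 56 bits.
sparse = [0.0] * 1020 + [0.5, -0.25, 1.0, 0.75]
print(description_bits(sparse, bits_per_value=4))  # 56
```

In this count the description length grows linearly in the non-zero parameter count, with the address width contributing the logarithmic term.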
