Neural Weight Norm = Kolmogorov Complexity

Tiberiu Musat

arXiv:2605.10878·cs.LG·May 12, 2026

TL;DR

This paper establishes a theoretical link between neural network weight norms in fixed-precision regimes and Kolmogorov complexity, explaining why weight decay favors simpler, more compressible functions.

Contribution

It proves that the minimal weight norm of a fixed-precision looped neural network that outputs a binary string equals that string's Kolmogorov complexity, up to a logarithmic factor.

Findings

01 Weight decay induces a prior that matches Solomonoff's universal prior up to a polynomial factor.

02 In fixed precision, every weight norm equals the non-zero parameter count up to constant factors.

03 The bounds are tight up to constants, with permutation encodings realizing the logarithmic factor.
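The norm-collapse finding can be checked numerically with a small sketch. It assumes a hypothetical signed fixed-point grid (the `bits` and `scale` parameters and all function names are illustrative, not taken from the paper): because every non-zero weight has magnitude between the smallest and largest grid values, the p-th power of any L_p norm is bounded below and above by a constant times the number of non-zero weights.

```python
def fixed_precision_grid(bits, scale):
    """Hypothetical signed fixed-point grid (an assumption for illustration):
    integers in [-(2^(bits-1) - 1), 2^(bits-1) - 1], each multiplied by scale."""
    top = 2 ** (bits - 1) - 1
    return [k * scale for k in range(-top, top + 1)]


def nonzero_count(weights):
    """Number of non-zero parameters."""
    return sum(1 for w in weights if w != 0)


def lp_norm_p(weights, p):
    """Sum of |w|^p, i.e. the p-th power of the L_p norm."""
    return sum(abs(w) ** p for w in weights)


def norm_sandwich(weights, grid, p):
    """Return (lower, value, upper) where lower and upper are
    constant multiples of the non-zero parameter count."""
    delta = min(abs(v) for v in grid if v != 0)  # smallest non-zero magnitude
    big = max(abs(v) for v in grid)              # largest magnitude
    n = nonzero_count(weights)
    return (delta ** p) * n, lp_norm_p(weights, p), (big ** p) * n


grid = fixed_precision_grid(bits=4, scale=0.25)
weights = [0.0, 0.25, -1.75, 0.0, 0.5, 0.0, -0.25, 1.0]  # all values lie on the grid

for p in (1, 2):
    lower, value, upper = norm_sandwich(weights, grid, p)
    print(f"p={p}: {lower} <= {value} <= {upper}")
```

The constants in the bound depend only on the grid, not on the size of the network, which is the sense in which the choice of norm stops mattering once precision is fixed.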

Abstract

Why does weight decay work? We prove that, in any fixed-precision regime, the smallest weight norm of a looped neural network outputting a binary string equals the Kolmogorov complexity of that string, up to a logarithmic factor. This implies that weight decay induces a prior matching Solomonoff's universal prior, the optimal prior over computable functions, up to a polynomial factor. The result is norm-agnostic: in fixed precision, every weight norm collapses to the non-zero parameter count up to constants, so the same sandwich bound holds for any norm used as a regulariser. The proof has two short reductions: any program for a universal Turing machine can be encoded into neural weights at unit cost per program bit, and any fixed-precision network can be described by enumerating its non-zero parameters with logarithmic addressing overhead. Both bounds are tight up to constants, with…
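The second reduction, describing a network by listing its non-zero parameters together with their addresses, can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's construction: weights are taken to be fixed-width integers, the function names are hypothetical, and the cost of describing the architecture itself is ignored.

```python
def description_bits(num_params, nonzero, bits_per_weight):
    """Upper bound on the bits needed to list the non-zero weights.
    Each entry costs one address (about log2(num_params) bits)
    plus one fixed-precision value (bits_per_weight bits)."""
    address_bits = max(1, (num_params - 1).bit_length())
    return nonzero * (address_bits + bits_per_weight)


def encode_sparse(weights):
    """List (index, value) pairs for the non-zero entries only."""
    return [(i, w) for i, w in enumerate(weights) if w != 0]


def decode_sparse(pairs, num_params):
    """Rebuild the dense weight vector from the (index, value) pairs."""
    dense = [0] * num_params
    for i, w in pairs:
        dense[i] = w
    return dense


weights = [0, 3, 0, 0, -2, 0, 0, 1]
pairs = encode_sparse(weights)
assert decode_sparse(pairs, len(weights)) == weights

# 8 parameters need 3 address bits; with 4-bit values, 3 non-zero entries
# cost 3 * (3 + 4) bits under this encoding.
print(description_bits(len(weights), len(pairs), bits_per_weight=4))
```

Under this encoding the description length grows as the non-zero count times a logarithmic addressing term, which is where the logarithmic gap between weight norm and Kolmogorov complexity comes from in the abstract's account.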
