Thermodynamically Optimal Regularization under Information-Geometric Constraints
Laurent Caraffa

TL;DR
This paper proposes a unifying geometric and thermodynamic framework for understanding regularization in machine learning. The framework links optimality, information geometry, and energy efficiency, and it introduces new principles for designing regularization schemes.
Contribution
It provides a theoretical foundation that connects thermodynamic optimality and information geometry to regularization. It derives the unique admissible geometry on belief space and identifies structural limitations of classical regularization methods.
Findings
The Fisher–Rao metric is the unique admissible geometry on belief space, under the paper's stated assumptions.
Thermodynamically optimal regularization minimizes the Fisher–Rao distance.
Classical regularization schemes are structurally incapable of ensuring thermodynamic optimality.
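For reference, the standard textbook definitions behind the first two findings are sketched below. The notation (a parametric family p_θ, a curve γ between parameters θ₀ and θ₁) is our own choice and may differ from the paper's.

```latex
% Fisher information metric on a parametric family p_\theta(x)
g_{ij}(\theta) = \mathbb{E}_{x \sim p_\theta}\!\left[
  \partial_i \log p_\theta(x)\, \partial_j \log p_\theta(x) \right]

% Fisher--Rao distance: length of the shortest curve \gamma with
% \gamma(0) = \theta_0 and \gamma(1) = \theta_1, measured in the metric g
d_{\mathrm{FR}}(\theta_0, \theta_1) = \inf_{\gamma}
  \int_0^1 \sqrt{\dot\gamma(t)^\top \, g(\gamma(t)) \, \dot\gamma(t)} \; dt
```

Because g transforms as a tensor under a change of parameters, the length of a curve, and therefore d_FR, does not depend on how the family is parametrized.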
Abstract
Modern machine learning relies on a collection of empirically successful but theoretically heterogeneous regularization techniques, such as weight decay, dropout, and exponential moving averages. At the same time, the rapidly increasing energetic cost of training large models raises the question of whether learning algorithms approach any fundamental efficiency bound. In this work, we propose a unifying theoretical framework connecting thermodynamic optimality, information geometry, and regularization. Under three explicit assumptions, namely (A1) that optimality requires an intrinsic, parametrization-invariant measure of information, (A2) that belief states are modeled by maximum-entropy distributions under known constraints, and (A3) that optimal processes are quasi-static, we prove a conditional optimality theorem. Specifically, the Fisher–Rao metric is the unique admissible geometry…
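Assumption (A1) asks for a parametrization-invariant measure of information. As a toy illustration (our own choice of example, not taken from the paper), the sketch below computes the Fisher–Rao distance between two Bernoulli distributions twice: once from the known closed form in the mean parameter p, and once by numerically integrating the Fisher line element in the logit parametrization η = log(p / (1 − p)). Agreement of the two values is what parametrization invariance means in this one-dimensional case. The helper names `fisher_rao_bernoulli` and `path_length_logit` are hypothetical.

```python
import math

# Toy illustration of parametrization invariance (hypothetical helpers,
# not from the paper). Both functions require 0 < p, q < 1.

def fisher_rao_bernoulli(p: float, q: float) -> float:
    """Closed-form Fisher-Rao distance between Bernoulli(p) and Bernoulli(q).

    In the mean parameter p the Fisher information is 1 / (p (1 - p)).
    Substituting p = sin^2(phi) turns the line element into 2 |d phi|,
    which gives d = 2 * arccos(sqrt(p q) + sqrt((1 - p)(1 - q))).
    """
    bc = math.sqrt(p * q) + math.sqrt((1 - p) * (1 - q))  # Bhattacharyya coefficient
    return 2.0 * math.acos(min(1.0, bc))  # clamp guards against rounding above 1


def path_length_logit(p: float, q: float, n: int = 100_000) -> float:
    """Integrate sqrt(g(eta)) |d eta| numerically in the logit parametrization.

    With eta = log(p / (1 - p)), the Fisher information becomes
    sigma(eta) * (1 - sigma(eta)), where sigma is the logistic function.
    """
    a = math.log(p / (1 - p))
    b = math.log(q / (1 - q))
    h = (b - a) / n
    total = 0.0
    for k in range(n):
        eta = a + (k + 0.5) * h  # midpoint rule on the segment [a, b]
        s = 1.0 / (1.0 + math.exp(-eta))
        total += math.sqrt(s * (1.0 - s)) * abs(h)
    return total


if __name__ == "__main__":
    print(fisher_rao_bernoulli(0.2, 0.7))
    print(path_length_logit(0.2, 0.7))
```

Both calls should print roughly 1.055, differing only by quadrature error. In one dimension the geodesic is simply the segment between the endpoints, so no optimization over curves is needed; in higher-dimensional families the infimum over paths would have to be computed.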
Taxonomy
Topics: Statistical Mechanics and Entropy · Gaussian Processes and Bayesian Inference · Stochastic Gradient Optimization Techniques
