Thermodynamically Optimal Regularization under Information-Geometric Constraints

Laurent Caraffa

arXiv:2601.17330 · cs.LG · January 27, 2026

Open Access

TL;DR

This paper establishes a unifying geometric and thermodynamic framework for understanding regularization in machine learning, linking optimality, information geometry, and energy efficiency, and introduces new principles for designing regularization schemes.

Contribution

The paper provides a theoretical foundation connecting thermodynamic optimality and information geometry to regularization. Under its stated assumptions, it derives the Fisher--Rao metric as the unique admissible geometry and critiques classical regularization methods.

Findings

01

Under assumptions (A1)-(A3), the Fisher--Rao metric is the unique admissible geometry on belief space.

02

Thermodynamically optimal regularization minimizes Fisher--Rao distance.

03

Classical regularization schemes are structurally incapable of ensuring thermodynamic optimality.

Abstract

Modern machine learning relies on a collection of empirically successful but theoretically heterogeneous regularization techniques, such as weight decay, dropout, and exponential moving averages. At the same time, the rapidly increasing energetic cost of training large models raises the question of whether learning algorithms approach any fundamental efficiency bound. In this work, we propose a unifying theoretical framework connecting thermodynamic optimality, information geometry, and regularization. Under three explicit assumptions -- (A1) that optimality requires an intrinsic, parametrization-invariant measure of information, (A2) that belief states are modeled by maximum-entropy distributions under known constraints, and (A3) that optimal processes are quasi-static -- we prove a conditional optimality theorem. Specifically, the Fisher--Rao metric is the unique admissible geometry…
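To make the Fisher--Rao geometry mentioned in the abstract concrete, here is a minimal sketch for the simplest case, categorical distributions, where the geodesic distance has the well-known closed form d(p, q) = 2 arccos(Σ_i sqrt(p_i q_i)). This is a standard textbook result, not the paper's own construction; the function names `fisher_rao_categorical` and `kl_divergence` are hypothetical and chosen only for illustration.

```python
import math


def fisher_rao_categorical(p, q):
    """Fisher--Rao geodesic distance between two categorical distributions.

    Illustrative helper (not from the paper): uses the closed form
    d(p, q) = 2 * arccos(sum_i sqrt(p_i * q_i)).
    """
    if len(p) != len(q):
        raise ValueError("p and q must have the same number of outcomes")
    # Bhattacharyya coefficient; clamp to [0, 1] to guard against rounding.
    bc = sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))
    bc = min(1.0, max(0.0, bc))
    return 2.0 * math.acos(bc)


def kl_divergence(p, q):
    """KL(p || q) in nats; terms with p_i == 0 contribute zero."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


# Two nearby belief states over a binary outcome.
p = [0.50, 0.50]
q = [0.51, 0.49]

d = fisher_rao_categorical(p, q)
# For nearby distributions, the squared Fisher--Rao distance is
# approximately twice the KL divergence.
print(d ** 2, 2.0 * kl_divergence(p, q))
```

For distributions that are close together, the squared distance agrees with twice the KL divergence to leading order, which is one way to see the Fisher--Rao metric as the local, parametrization-invariant form of an information measure. Distributions with disjoint support sit at the maximal distance π.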

Taxonomy

Topics: Statistical Mechanics and Entropy · Gaussian Processes and Bayesian Inference · Stochastic Gradient Optimization Techniques