Thermodynamics of Reinforcement Learning Curricula
Jacob Adamczyk, Juan Sebastian Rojas, Rahul V. Kulkarni

TL;DR
This paper introduces a thermodynamic framework for reinforcement learning curricula. It treats reward parameters as coordinates on a task manifold and derives optimal curricula as geodesics that minimize excess thermodynamic work.
Contribution
It formalizes curriculum learning in RL using non-equilibrium thermodynamics and proposes the MEW (Minimum Excess Work) algorithm for scheduling temperature annealing in maximum-entropy RL.
Findings
Optimal curricula, defined as those minimizing excess thermodynamic work, are geodesics in task space.
The MEW algorithm provides a principled temperature annealing schedule for maximum-entropy RL.
The framework links non-equilibrium thermodynamics with RL curriculum design.
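The geodesic finding can be motivated by the standard linear-response treatment of minimum-dissipation protocols, which is assumed here; the paper's own metric and derivation may differ. For a protocol λ(t) over t ∈ [0, τ] with a positive-definite friction tensor ζ(λ), the excess work and the thermodynamic length are approximately:

```latex
\langle W_{\mathrm{ex}} \rangle \approx \int_0^{\tau} \dot{\lambda}^{\top} \zeta(\lambda)\, \dot{\lambda}\, dt,
\qquad
\mathcal{L} = \int_0^{\tau} \sqrt{\dot{\lambda}^{\top} \zeta(\lambda)\, \dot{\lambda}}\, dt,
\qquad
\mathcal{L}^2 \le \tau \int_0^{\tau} \dot{\lambda}^{\top} \zeta(\lambda)\, \dot{\lambda}\, dt
\;\Longrightarrow\;
\langle W_{\mathrm{ex}} \rangle \gtrsim \frac{\mathcal{L}^2}{\tau}.
```

The last inequality is Cauchy-Schwarz. The bound is attained when the integrand is constant, i.e. the protocol moves at constant speed in the metric ζ. It is smallest along the path of minimal thermodynamic length, which is a geodesic of ζ.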
Abstract
Connections between statistical mechanics and machine learning have repeatedly proven fruitful, providing insight into optimization, generalization, and representation learning. In this work, we follow this tradition by leveraging results from non-equilibrium thermodynamics to formalize curriculum learning in reinforcement learning (RL). In particular, we propose a geometric framework for RL by interpreting reward parameters as coordinates on a task manifold. We show that, by minimizing the excess thermodynamic work, optimal curricula correspond to geodesics in this task space. As an application of this framework, we provide an algorithm, "MEW" (Minimum Excess Work), to derive a principled schedule for temperature annealing in maximum-entropy RL.
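As a rough illustration of what a minimum-excess-work schedule can look like, the sketch below assumes the one-dimensional case. There is a single control parameter, for example the inverse temperature β, and a known, strictly positive metric g(λ) sampled on a grid. In the linear-response picture, the minimum-dissipation schedule of fixed duration moves at constant speed in that metric. It therefore places its steps at equal increments of thermodynamic length ∫ √g dλ. The function name `constant_speed_schedule` and its inputs are hypothetical. This is not the MEW algorithm itself, and this page does not describe how MEW estimates its metric.

```python
import numpy as np


def constant_speed_schedule(grid, metric, n_steps):
    """Return n_steps + 1 parameter values spaced equally in thermodynamic length.

    grid:    1-D strictly increasing array of parameter values (e.g. inverse temperature).
    metric:  array of metric values g(lambda) > 0 on `grid` (hypothetical input;
             how to estimate it is outside the scope of this sketch).
    n_steps: number of schedule intervals.
    """
    grid = np.asarray(grid, dtype=float)
    metric = np.asarray(metric, dtype=float)
    if grid.ndim != 1 or grid.shape != metric.shape:
        raise ValueError("grid and metric must be 1-D arrays of equal length")
    if np.any(metric <= 0):
        raise ValueError("metric must be strictly positive")

    # Local thermodynamic speed per unit parameter change: sqrt(g(lambda)).
    speed = np.sqrt(metric)

    # Cumulative thermodynamic length along the grid (trapezoidal rule).
    segments = 0.5 * (speed[1:] + speed[:-1]) * np.diff(grid)
    length = np.concatenate([[0.0], np.cumsum(segments)])

    # Equal increments of length correspond to constant speed in the metric.
    targets = np.linspace(0.0, length[-1], n_steps + 1)

    # Invert the strictly increasing length function by linear interpolation.
    return np.interp(targets, length, grid)


if __name__ == "__main__":
    lam = np.linspace(1.0, 100.0, 10001)
    print(constant_speed_schedule(lam, 1.0 / lam**2, 2))
```

For example, with g(λ) = 1/λ² on [1, 100] and two steps, the schedule comes out close to 1, 10, 100. This is a geometric spacing, because the thermodynamic length is then ln λ. A constant metric instead yields a linear schedule.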
Taxonomy
Topics: Advanced Thermodynamics and Statistical Mechanics · Reinforcement Learning in Robotics · Statistical Mechanics and Entropy
