KL-learning: Online solution of Kullback-Leibler control problems
Joris Bierkens, Bert Kappen

TL;DR
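This paper presents a stochastic approximation algorithm for ergodic Kullback-Leibler (KL) control problems, i.e. Markov decision processes on a finite state space whose control cost is proportional to the KL divergence of the controlled transition probabilities with respect to the uncontrolled ones. In a numerical experiment, its convergence speed is comparable to that of the power method and Z-learning.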
Contribution
It introduces a stochastic approximation method for ergodic KL control problems that admits a theoretical analysis via the ODE method and that, in a numerical experiment, converges at a speed comparable to the power method and Z-learning.
Findings
In a numerical experiment, the algorithm is comparable to the power method and Z-learning in convergence speed
The algorithm allows for a sound theoretical analysis using the ODE method
The algorithm may serve as the basis of a reinforcement learning style algorithm for Markov decision problems
Abstract
We introduce a stochastic approximation method for the solution of an ergodic Kullback-Leibler control problem. A Kullback-Leibler control problem is a Markov decision process on a finite state space in which the control cost is proportional to a Kullback-Leibler divergence of the controlled transition probabilities with respect to the uncontrolled transition probabilities. The algorithm discussed in this work allows for a sound theoretical analysis using the ODE method. In a numerical experiment the algorithm is shown to be comparable to the power method and the related Z-learning algorithm in terms of convergence speed. It may be used as the basis of a reinforcement learning style algorithm for Markov decision problems.
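To make the problem setting concrete, the following is a minimal sketch of the power method that the abstract names as a comparison baseline; it is not the KL-learning algorithm itself. It assumes the standard linearly solvable formulation with the proportionality constant of the KL cost set to 1, a state cost vector `c`, and an irreducible, aperiodic uncontrolled transition matrix `P`. Under those assumptions the ergodic problem reduces to finding the principal eigenpair of `H = diag(exp(-c)) P`. The function name and all variable names are hypothetical.

```python
import math

def kl_control_power_method(P, c, tol=1e-12, max_iter=100_000):
    """Power iteration for H z = lam z with H[x][y] = exp(-c[x]) * P[x][y].

    Assumptions (not taken from the paper): KL cost weight equal to 1,
    and P irreducible and aperiodic so that the iteration converges.
    Returns (z, rho, P_opt): the eigenvector normalized to sum to 1,
    the average cost rho = -log(lam), and the controlled transition matrix.
    """
    n = len(c)
    H = [[math.exp(-c[x]) * P[x][y] for y in range(n)] for x in range(n)]
    z = [1.0 / n] * n          # positive starting vector that sums to 1
    lam = 1.0
    for _ in range(max_iter):
        w = [sum(H[x][y] * z[y] for y in range(n)) for x in range(n)]
        lam = sum(w)           # eigenvalue estimate, valid because sum(z) == 1
        w = [v / lam for v in w]
        delta = max(abs(a - b) for a, b in zip(w, z))
        z = w
        if delta < tol:
            break
    rho = -math.log(lam)
    # Controlled transitions: reweight P by z along the target state, renormalize rows.
    P_opt = []
    for x in range(n):
        row = [P[x][y] * z[y] for y in range(n)]
        total = sum(row)
        P_opt.append([v / total for v in row])
    return z, rho, P_opt

# Illustrative 3-state example (made-up numbers).
P = [[0.5, 0.3, 0.2],
     [0.1, 0.6, 0.3],
     [0.4, 0.4, 0.2]]
c = [0.0, 1.0, 2.0]
z, rho, P_opt = kl_control_power_method(P, c)
```

The power method needs the full matrix `P`, whereas a stochastic approximation scheme of the kind the abstract describes works from sampled transitions. With zero state cost the sketch returns `rho` close to 0 and leaves `P` unchanged, which is a quick sanity check.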
Taxonomy
Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Smart Grid Energy Management
