KL-learning: Online solution of Kullback-Leibler control problems

Joris Bierkens; Bert Kappen

arXiv:1112.1996 · math.OC · February 17, 2012 · 2 citations

Open Access

TL;DR

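This paper presents a stochastic approximation algorithm for solving ergodic Kullback-Leibler (KL) control problems: Markov decision processes on a finite state space whose control cost is proportional to the KL divergence of the controlled transition probabilities with respect to the uncontrolled ones. In a numerical experiment, its convergence speed is comparable to that of the power method and Z-learning.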

Contribution

It introduces a stochastic approximation method for ergodic KL control problems that admits a theoretical analysis via the ODE method and, in a numerical experiment, converges about as fast as the power method and the related Z-learning algorithm.

Findings

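01. The algorithm is comparable to the power method and Z-learning in convergence speed in a numerical experiment.

02. The algorithm allows for a sound theoretical analysis using the ODE method.

03. The algorithm may serve as the basis of a reinforcement-learning-style algorithm for Markov decision problems.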

Abstract

We introduce a stochastic approximation method for the solution of an ergodic Kullback-Leibler control problem. A Kullback-Leibler control problem is a Markov decision process on a finite state space in which the control cost is proportional to a Kullback-Leibler divergence of the controlled transition probabilities with respect to the uncontrolled transition probabilities. The algorithm discussed in this work allows for a sound theoretical analysis using the ODE method. In a numerical experiment the algorithm is shown to be comparable to the power method and the related Z-learning algorithm in terms of convergence speed. It may be used as the basis of a reinforcement learning style algorithm for Markov decision problems.
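The abstract names the power method as a baseline. As a rough illustration of that baseline (not of the KL-learning algorithm itself, whose update rule is not reproduced on this page), the sketch below assumes the standard linearly solvable formulation: with uncontrolled transition matrix `P`, state cost `cost`, and a KL cost with proportionality constant 1, the ergodic problem reduces to finding the principal eigenpair of `H = diag(exp(-cost)) P`. The function name, the example matrix, and the cost vector are hypothetical.

```python
import numpy as np


def power_method_kl_control(P, cost, tol=1e-12, max_iter=100_000):
    """Power iteration on H = diag(exp(-cost)) @ P (assumed formulation).

    Returns the positive eigenvector z (normalized to sum to 1), the
    principal eigenvalue estimate lam, and the controlled transition
    matrix U with U[i, j] proportional to P[i, j] * z[j].
    """
    P = np.asarray(P, dtype=float)
    cost = np.asarray(cost, dtype=float)
    H = np.exp(-cost)[:, None] * P  # scale row i of P by exp(-cost[i])
    n = P.shape[0]
    z = np.full(n, 1.0 / n)  # positive start vector with unit sum
    lam = 1.0
    for _ in range(max_iter):
        Hz = H @ z
        lam = Hz.sum()  # eigenvalue estimate, since z sums to 1
        z_next = Hz / lam
        converged = np.max(np.abs(z_next - z)) < tol
        z = z_next
        if converged:
            break
    U = P * z[None, :]  # reweight uncontrolled transitions by z
    U /= U.sum(axis=1, keepdims=True)  # renormalize each row
    return z, lam, U


# Hypothetical 3-state example with strictly positive transitions.
P = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.6, 0.2],
              [0.3, 0.3, 0.4]])
cost = np.array([0.0, 1.0, 2.0])

z, lam, U = power_method_kl_control(P, cost)
average_cost = -np.log(lam)  # optimal average cost under the assumed formulation
```

Under these assumptions the controlled chain `U` shifts probability toward states with larger `z`, and a zero cost vector leaves the uncontrolled dynamics unchanged. The power method needs the full matrix `P`, which is what distinguishes it from online, sample-based schemes such as Z-learning and the stochastic approximation method of the paper.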

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Smart Grid Energy Management