Variational Inference for Model-Free and Model-Based Reinforcement Learning
Felix Leibfried

TL;DR
This paper explores how variational inference (VI) can unify and enhance reinforcement learning (RL) methods, especially in model-based settings, by framing policy optimization and environment modeling as inference problems.
Contribution
It demonstrates the connection between VI and RL objectives, introducing a regularized VI framework that improves agent performance and clarifies inference in environment modeling.
Findings
VI recovers RL optimization objectives under soft policy constraints
Regularized VI improves agent performance in RL tasks
VI provides a natural framework for environment model learning in RL
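The finding that VI recovers RL objectives "under soft policy constraints" can be made concrete in the simplest non-sequential case. The sketch below is a generic illustration of the standard KL-regularized ("soft") objective, not the paper's derivation. It assumes a one-step bandit with per-action rewards `r(a)`, a reference policy `prior`, and a temperature `alpha`. The function names `soft_objective`, `soft_optimal_policy`, and `soft_value` are hypothetical. The objective E_pi[r] - alpha * KL(pi || prior) is maximized by the Boltzmann-style policy pi*(a) proportional to prior(a) * exp(r(a) / alpha), and its optimal value is alpha * log sum_a prior(a) * exp(r(a) / alpha).

```python
import math
import random

def soft_objective(policy, rewards, prior, alpha):
    """Expected reward minus alpha * KL(policy || prior) for a one-step bandit."""
    expected_reward = sum(p * r for p, r in zip(policy, rewards))
    kl = sum(p * math.log(p / q) for p, q in zip(policy, prior) if p > 0)
    return expected_reward - alpha * kl

def soft_optimal_policy(rewards, prior, alpha):
    """Maximizer of soft_objective: pi*(a) proportional to prior(a) * exp(r(a) / alpha)."""
    r_max = max(rewards)  # shift rewards so exp() cannot overflow
    weights = [q * math.exp((r - r_max) / alpha) for r, q in zip(rewards, prior)]
    total = sum(weights)
    return [w / total for w in weights]

def soft_value(rewards, prior, alpha):
    """Optimal value alpha * log sum_a prior(a) * exp(r(a) / alpha), computed stably."""
    r_max = max(rewards)
    total = sum(q * math.exp((r - r_max) / alpha) for r, q in zip(rewards, prior))
    return r_max + alpha * math.log(total)

if __name__ == "__main__":
    rewards = [1.0, 0.5, -0.2]   # hypothetical per-action rewards
    prior = [1 / 3] * 3          # hypothetical uniform reference policy
    alpha = 0.5                  # hypothetical temperature
    pi_star = soft_optimal_policy(rewards, prior, alpha)
    print(pi_star, soft_objective(pi_star, rewards, prior, alpha), soft_value(rewards, prior, alpha))
    # Randomly drawn policies should never beat the closed-form optimum.
    rng = random.Random(0)
    for _ in range(1000):
        raw = [rng.random() + 1e-9 for _ in rewards]
        pi = [w / sum(raw) for w in raw]
        assert soft_objective(pi, rewards, prior, alpha) <= soft_value(rewards, prior, alpha) + 1e-12
```

The temperature controls the trade-off: as `alpha` shrinks, pi* concentrates on the highest-reward action (ordinary reward maximization), and as it grows, pi* stays close to the reference policy.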
Abstract
Variational inference (VI) is a specific type of approximate Bayesian inference that approximates an intractable posterior distribution with a tractable one. VI casts the inference problem as an optimization problem; more specifically, the goal is to maximize a lower bound of the logarithm of the marginal likelihood with respect to the parameters of the approximate posterior. Reinforcement learning (RL), on the other hand, deals with autonomous agents and how to make them act optimally so as to maximize some notion of expected future cumulative reward. In the non-sequential setting where agents' actions do not have an impact on future states of the environment, RL is covered by contextual bandits and Bayesian optimization. In a proper sequential scenario, however, where agents' actions affect future states, instantaneous rewards need to be carefully traded off against potential…
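The abstract describes VI as maximizing a lower bound on the log marginal likelihood (the evidence lower bound, ELBO) with respect to the parameters of the approximate posterior. As an illustrative sketch that is not taken from the paper, the following uses a toy conjugate Gaussian model where both the ELBO and the exact log evidence have closed forms. Assumed model: prior z ~ N(0, 1), likelihood x | z ~ N(z, sigma2) with known `sigma2`, and approximate posterior q(z) = N(m, s2). The function names `elbo`, `log_evidence`, and `fit_q` are hypothetical.

```python
import math

def elbo(x, m, s2, sigma2):
    """ELBO for z ~ N(0, 1), x | z ~ N(z, sigma2), with q(z) = N(m, s2)."""
    # E_q[log p(x | z)], using E_q[(x - z)^2] = (x - m)^2 + s2
    expected_loglik = -0.5 * math.log(2 * math.pi * sigma2) - ((x - m) ** 2 + s2) / (2 * sigma2)
    # E_q[log p(z)] for the standard normal prior, using E_q[z^2] = m^2 + s2
    expected_logprior = -0.5 * math.log(2 * math.pi) - (m ** 2 + s2) / 2
    # Entropy of the Gaussian q
    entropy = 0.5 * math.log(2 * math.pi * math.e * s2)
    return expected_loglik + expected_logprior + entropy

def log_evidence(x, sigma2):
    """Exact log marginal likelihood: marginally, x ~ N(0, 1 + sigma2)."""
    v = 1.0 + sigma2
    return -0.5 * math.log(2 * math.pi * v) - x ** 2 / (2 * v)

def fit_q(x, sigma2, lr=0.1, steps=2000):
    """Gradient ascent on the ELBO over (m, log s), using analytic gradients."""
    m, log_s = 0.0, 0.0
    for _ in range(steps):
        s2 = math.exp(2 * log_s)
        grad_m = (x - m) / sigma2 - m                   # d ELBO / d m
        grad_log_s = 1.0 - s2 * (1.0 + 1.0 / sigma2)    # d ELBO / d log s
        m += lr * grad_m
        log_s += lr * grad_log_s
    return m, math.exp(2 * log_s)

if __name__ == "__main__":
    x, sigma2 = 1.3, 0.5  # hypothetical observation and noise variance
    m, s2 = fit_q(x, sigma2)
    print("q(z) = N(%.6f, %.6f)" % (m, s2))
    print("ELBO:", elbo(x, m, s2, sigma2), "log p(x):", log_evidence(x, sigma2))
```

Because log p(x) = ELBO + KL(q || p(z | x)), the bound is tight exactly when q equals the true posterior N(x / (1 + sigma2), sigma2 / (1 + sigma2)), which is where the gradient ascent should converge; any other (m, s2) yields a strictly smaller ELBO.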
Taxonomy
Topics: Machine Learning and Algorithms · Advanced Bandit Algorithms Research · Gaussian Processes and Bayesian Inference
