Between Rate-Distortion Theory & Value Equivalence in Model-Based Reinforcement Learning
Dilip Arumugam, Benjamin Van Roy

TL;DR
This paper explores how rate-distortion theory can be used to formalize approximate value equivalence in model-based reinforcement learning, enabling near-optimal decision-making with simplified environment models under capacity constraints.
Contribution
It introduces an algorithm that synthesizes approximate environment models using rate-distortion principles, addressing the challenge of intractable environment complexity and limited agent capacity.
Findings
Proposes a formal framework for approximate value equivalence.
Demonstrates how rate-distortion theory guides environment model compression.
Shows potential for near-optimal policies with simplified models.
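For orientation, the compression idea in the findings above can be related to the classical rate-distortion function. What follows is the standard textbook form, not necessarily the paper's exact formulation. The symbols are notation assumed here for illustration: $M$ is the true environment model treated as the source, $\tilde{M}$ is the compressed surrogate model, $d$ is a distortion measure, and $D$ is the tolerated distortion.

```latex
\mathcal{R}(D) \;=\; \inf_{p(\tilde{M} \mid M)\,:\;\mathbb{E}\left[d(M, \tilde{M})\right] \le D} \; \mathbb{I}(M; \tilde{M})
```

Under this reading, the mutual information $\mathbb{I}(M; \tilde{M})$ counts the bits about the environment that a capacity-limited agent must retain. A distortion $d$ that penalizes disagreement between the Bellman updates of $M$ and $\tilde{M}$ would make low distortion correspond to approximate value equivalence. How the paper instantiates $d$ is not stated in this summary.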
Abstract
The quintessential model-based reinforcement-learning agent iteratively refines its estimates or prior beliefs about the true underlying model of the environment. Recent empirical successes in model-based reinforcement learning with function approximation, however, eschew the true model in favor of a surrogate that, while ignoring various facets of the environment, still facilitates effective planning over behaviors. Recently formalized as the value equivalence principle, this algorithmic technique is perhaps unavoidable as real-world reinforcement learning demands consideration of a simple, computationally-bounded agent interacting with an overwhelmingly complex environment. In this work, we entertain an extreme scenario wherein some combination of immense environment complexity and limited agent capacity entirely precludes identifying an exactly value-equivalent model. In light of…
Taxonomy
Topics: Reinforcement Learning in Robotics
