Planning with Expectation Models for Control
Katya Kudashkina, Yi Wan, Abhishek Naik, Richard S. Sutton

TL;DR
This paper extends model-based reinforcement learning with expectation models from prediction to control. It proves that planning with an expectation model must update a state-value function rather than an action-value function, and it proposes new algorithms built on different strategies for policy improvement.
Contribution
It introduces the first general algorithms for MBRL with expectation models in the control setting, shows that planning must update state values, and analyzes different strategies for policy improvement.
Findings
Planning with expectation models must update a state-value function, not an action-value function.
Three strategies for policy improvement are proposed and analyzed.
Algorithms are validated through computational experiments.
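One way to build intuition for the first finding is a small numeric check. This is an illustration under stated assumptions, not the paper's proof. An expectation model returns only the mean next feature vector. For a linear state-value function, the value of that mean equals the mean of the values, so nothing is lost. A control backup over linear action values, assumed here to take a greedy max over actions, is nonlinear, so evaluating it at the mean features gives a different answer. The feature vectors, weights, and action names below are hypothetical.

```python
def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))

def expected_features(feats, ps):
    """What an expectation model would output: the mean next feature vector."""
    return [sum(p * f[i] for f, p in zip(feats, ps)) for i in range(len(feats[0]))]

# Hypothetical: two possible next states, reached with equal probability.
next_features = [[1.0, 0.0, 2.0], [0.0, 3.0, 1.0]]
probs = [0.5, 0.5]
phi_bar = expected_features(next_features, probs)  # mean features [0.5, 1.5, 1.5]

# Linear state-value function v(phi) = w . phi (hypothetical weights).
w = [2.0, -1.0, 0.5]
v_of_mean = dot(w, phi_bar)                                       # v(E[phi'])
mean_of_v = sum(p * dot(w, f) for f, p in zip(next_features, probs))  # E[v(phi')]

# Linear action values q(phi, a) = theta_a . phi, backed up with a max over actions.
theta = {"left": [1.0, 0.0, 0.0], "right": [0.0, 1.0, 0.0]}

def max_q(phi):
    """Greedy backup target: max over actions of the linear action values."""
    return max(dot(t, phi) for t in theta.values())

q_of_mean = max_q(phi_bar)                                         # max_a q(E[phi'], a)
mean_of_q = sum(p * max_q(f) for f, p in zip(next_features, probs))  # E[max_a q(phi', a)]

print(v_of_mean, mean_of_v)  # equal: linearity lets the expectation pass through v
print(q_of_mean, mean_of_q)  # differ: the max does not commute with the expectation
```

The state-value pair agrees (both 0.25), while the action-value pair does not (1.5 versus 2.0). Because the max is convex, the value computed from the mean features can only underestimate the true expected backup, never overestimate it.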
Abstract
In model-based reinforcement learning (MBRL), Wan et al. (2019) identified conditions under which the environment model could produce the expectation of the next feature vector, rather than the full distribution or a sample from it, with no loss in planning performance. Such expectation models are of interest when the environment is stochastic and non-stationary and the model is approximate, for example when it is learned using function approximation. In these cases a full distribution model may be impractical, and a sample model may be either more expensive computationally or of high variance. Wan et al. considered only planning for prediction, that is, evaluating a fixed policy. In this paper, we treat the control case: planning to improve the policy and find a good approximate one. We prove that planning with an expectation model must update a state-value function, not an action-value function as previously…
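The prediction setting described above can be sketched as a single planning step in the style of linear Dyna, assuming a linear expectation model and a linear state-value function. In this sketch, the matrix `F` maps current features to expected next features, and the vector `b` gives the expected reward. Both, along with `planning_update` and its step-size and discount defaults, are hypothetical illustrations rather than the paper's control algorithms, which add policy improvement on top of updates like this one.

```python
def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))

def matvec(M, x):
    """Matrix-vector product with M given as a list of rows."""
    return [dot(row, x) for row in M]

def planning_update(w, phi, F, b, alpha=0.1, gamma=0.9):
    """One planning step for a linear state-value function v(phi) = w . phi.

    F and b form a hypothetical linear expectation model: matvec(F, phi) is the
    expected next feature vector, and dot(b, phi) is the expected reward.
    """
    phi_next = matvec(F, phi)  # expectation of next features, not a sample
    r_bar = dot(b, phi)        # expected one-step reward
    td_error = r_bar + gamma * dot(w, phi_next) - dot(w, phi)
    return [wi + alpha * td_error * xi for wi, xi in zip(w, phi)]

# Usage: a hypothetical two-state chain with one-hot features under a fixed policy.
# State 0 moves to state 1 with reward 1; state 1 loops to itself with reward 0.
F = [[0.0, 0.0], [1.0, 1.0]]  # column j holds the expected next features from state j
b = [1.0, 0.0]
w = [0.0, 0.0]
for _ in range(200):
    for phi in ([1.0, 0.0], [0.0, 1.0]):
        w = planning_update(w, phi, F, b, alpha=0.5, gamma=0.9)
print(w)  # approaches the true state values [1.0, 0.0]
```

Because the value function is linear, backing up through the expected next features gives the same update target as averaging over sampled next states, which is the property the abstract attributes to Wan et al. (2019).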
Taxonomy
Topics: Reinforcement Learning in Robotics · Explainable Artificial Intelligence (XAI) · Data Stream Mining Techniques
