Revisiting Model-based Value Expansion
Daniel Palenicek, Michael Lutter, Jan Peters

TL;DR
This paper empirically investigates why model-based value expansion methods are outperformed by Dyna-style algorithms with simpler 1-step value targets, revealing the impact of model errors and providing insights for future improvements.
Contribution
It offers a comprehensive empirical analysis of the failure modes of value expansion, highlighting the role of model error and testing the theoretical limits of current methods.
Findings
Model errors significantly degrade value expansion performance.
True dynamics analysis reveals limitations of learned models.
Future research directions are suggested based on empirical maximum performance.
Abstract
Model-based value expansion methods promise to improve the quality of value function targets and, thereby, the effectiveness of value function learning. However, to date, these methods are outperformed by Dyna-style algorithms with conceptually simpler 1-step value function targets. This shows that, in practice, the theoretical justification of value expansion does not seem to hold. We provide a thorough empirical study to shed light on the cause of failure of value expansion methods in practice, which is believed to be compounding model error. By leveraging GPU-based physics simulators, we are able to efficiently use the true dynamics for analysis inside the model-based reinforcement learning loop. Performing extensive comparisons between true and learned dynamics sheds light into this black box. This paper provides a better understanding of the actual problems in value…
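To make the contrast between 1-step and multi-step value targets concrete, here is a minimal Python sketch of an H-step value expansion target. It is an illustration under stated assumptions, not the authors' implementation: the function name `value_expansion_target`, the toy linear system, the zero-action policy, and the discount factor are all hypothetical choices made for this example. With `horizon=1` the function reduces to a 1-step target of the form r + γV(s').

```python
import numpy as np

GAMMA = 0.99  # discount factor (assumed value for this illustration)


def value_expansion_target(state, horizon, dynamics, reward, policy, value,
                           gamma=GAMMA):
    """Return sum_{t<H} gamma^t * r(s_t, a_t) + gamma^H * V(s_H).

    The rollout uses `dynamics`, which may be the true simulator or a
    learned (possibly inaccurate) model.
    """
    s = np.asarray(state, dtype=float)
    target, discount = 0.0, 1.0
    for _ in range(horizon):
        a = policy(s)
        target += discount * reward(s, a)
        s = dynamics(s, a)
        discount *= gamma
    # Bootstrap with the value function at the end of the rollout.
    return target + discount * value(s)


# Hypothetical toy system: stable linear dynamics, quadratic cost.
def true_dynamics(s, a):
    return 0.9 * s + a


def biased_dynamics(s, a):
    # Stand-in for a learned model with a small systematic error.
    return 0.95 * s + a


def reward(s, a):
    return -float(np.sum(s ** 2))


def policy(s):
    return np.zeros_like(s)  # fixed zero-action policy (assumption)


def true_value(s):
    # Closed-form value of the zero-action policy under true_dynamics:
    # sum_t gamma^t * (-(0.9^t s)^2) = -s^2 / (1 - gamma * 0.81)
    return -float(np.sum(s ** 2)) / (1.0 - GAMMA * 0.81)


s0 = np.array([1.0])
horizons = (1, 5)
errors_true = [
    abs(value_expansion_target(s0, h, true_dynamics, reward, policy, true_value)
        - true_value(s0))
    for h in horizons
]
errors_biased = [
    abs(value_expansion_target(s0, h, biased_dynamics, reward, policy, true_value)
        - true_value(s0))
    for h in horizons
]
```

In this toy setting the target computed with the true dynamics matches the exact value for every horizon, while the error of the target computed with the biased model grows as the horizon increases. That is the compounding-model-error effect the abstract names as the presumed cause of failure. The sketch says nothing about the paper's actual findings.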
Taxonomy
Topics: Evolutionary Algorithms and Applications · Reinforcement Learning in Robotics · VLSI and FPGA Design Techniques
