Value Gradient weighted Model-Based Reinforcement Learning
Claas Voelcker, Victor Liao, Animesh Garg, Amir-massoud Farahmand

TL;DR
This paper introduces VaGraM, a novel value-gradient weighted model learning method that enhances model-based reinforcement learning by better accounting for value functions, leading to improved robustness and performance in complex environments.
Contribution
The paper proposes VaGraM, a new value-aware model learning approach that addresses limitations of existing methods, especially with small-capacity models and high-dimensional state spaces.
Findings
VaGraM achieves higher returns on MuJoCo benchmarks.
It is more robust than maximum likelihood approaches.
The analysis highlights the importance of accounting for exploration and function approximation.
Abstract
Model-based reinforcement learning (MBRL) is a sample-efficient technique to obtain control policies, yet unavoidable modeling errors often lead to performance deterioration. The model in MBRL is often fitted solely to reconstruct dynamics, state observations in particular, while the impact of model error on the policy is not captured by the training objective. This leads to a mismatch between the intended goal of MBRL, enabling good policy and value learning, and the target of the loss function employed in practice, future state prediction. Naive intuition would suggest that value-aware model learning would fix this problem and, indeed, several solutions to this objective mismatch problem have been proposed based on theoretical analysis. However, they tend to be inferior in practice to commonly used maximum likelihood (MLE) based approaches. In this paper we propose the Value-gradient…
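The objective mismatch described above can be illustrated with a small sketch. It assumes that a value-gradient weighting takes the form of a per-dimension squared prediction error scaled by the squared gradient of the value function at the observed next state; this form is an assumption made for illustration, not a statement of the paper's exact loss. The toy value function, the function names, and all numbers are hypothetical.

```python
# Illustrative sketch only. The toy value function, all names, and the exact
# form of the weighting are assumptions, not taken from the paper.

def value_gradient(state):
    """Gradient of a toy value function V(s) = -(10*s[0]**2 + 0.1*s[1]**2).

    V is very sensitive to dimension 0 and almost insensitive to dimension 1.
    """
    return [-20.0 * state[0], -0.2 * state[1]]


def mse_loss(predicted, target):
    """Plain squared-error model loss: every state dimension counts equally."""
    return sum((p - t) ** 2 for p, t in zip(predicted, target))


def value_gradient_weighted_loss(predicted, target):
    """Per-dimension squared error, rescaled by the squared value gradient
    evaluated at the true next state (assumed form of the weighting)."""
    grad = value_gradient(target)
    return sum((g * (p - t)) ** 2 for g, p, t in zip(grad, predicted, target))


true_next = [1.0, 1.0]
error_in_dim0 = [1.1, 1.0]  # off by 0.1 in the value-sensitive dimension
error_in_dim1 = [1.0, 1.1]  # off by 0.1 in the value-insensitive dimension

# The plain squared error cannot tell the two predictions apart.
print(mse_loss(error_in_dim0, true_next), mse_loss(error_in_dim1, true_next))

# The weighted loss penalizes the error that changes the value estimate.
print(value_gradient_weighted_loss(error_in_dim0, true_next),
      value_gradient_weighted_loss(error_in_dim1, true_next))
```

In this toy setting both predictions have the same squared error, yet the weighted loss is several orders of magnitude larger for the error in the dimension the value function depends on. A model with limited capacity trained on such a loss would be pushed to spend that capacity on the dimensions that matter for value estimation.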
Taxonomy
Topics: Reinforcement Learning in Robotics
