On-Policy Model Errors in Reinforcement Learning
Lukas P. Fröhlich, Maksym Lefarov, Melanie N. Zeilinger, Felix Berkenkamp

TL;DR
This paper introduces a hybrid reinforcement learning method that combines real-world data with a learned model, using on-policy corrections to reduce errors and improve stability in policy learning.
Contribution
The novel approach leverages on-policy data as correction terms on top of a learned model, effectively mitigating model errors during policy improvement.
Findings
Significantly improves model-based RL performance on MuJoCo and PyBullet benchmarks.
Reduces error accumulation in long-term predictions without extra tuning.
Theoretically justified method enhances stability and data efficiency.
Abstract
Model-free reinforcement learning algorithms can compute policy gradients given sampled environment transitions, but require large amounts of data. In contrast, model-based methods can use the learned model to generate new data, but model errors and bias can render learning unstable or suboptimal. In this paper, we present a novel method that combines real-world data and a learned model in order to get the best of both worlds. The core idea is to exploit the real-world data for on-policy predictions and use the learned model only to generalize to different actions. Specifically, we use the data as time-dependent on-policy correction terms on top of a learned model, to retain the ability to generate data without accumulating errors over long prediction horizons. We motivate this method theoretically and show that it counteracts an error term for model-based policy improvement.…
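The core idea in the abstract, using recorded real-world data as time-dependent correction terms on top of a learned model, can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not taken from the paper: the scalar linear dynamics, the deliberately biased model, the linear policy, and all function names are hypothetical. The correction at step t is assumed to be the gap between the recorded next state and the model's prediction for the recorded state-action pair.

```python
# Toy sketch of on-policy corrections (all names and dynamics are hypothetical).

def true_step(s, a):
    """Stand-in for the real environment (assumed scalar linear dynamics)."""
    return 0.9 * s + a


def model_step(s, a):
    """Stand-in for a learned model with a deliberate bias in the state term."""
    return 0.8 * s + a


def policy(s, gain=-0.5):
    """Linear feedback policy; `gain` plays the role of the policy parameter."""
    return gain * s


def collect_rollout(s0, horizon, gain=-0.5):
    """Record one on-policy trajectory from the real environment."""
    states, actions = [s0], []
    s = s0
    for _ in range(horizon):
        a = policy(s, gain)
        s = true_step(s, a)
        actions.append(a)
        states.append(s)
    return states, actions


def model_rollout(s0, horizon, gain):
    """Roll out the learned model alone, without any correction."""
    out = [s0]
    s = s0
    for _ in range(horizon):
        s = model_step(s, policy(s, gain))
        out.append(s)
    return out


def opc_rollout(states, actions, gain):
    """Roll out the learned model plus time-dependent corrections from data."""
    s = states[0]
    out = [s]
    for t, a_hat in enumerate(actions):
        # Model error observed on the recorded transition at time t.
        correction = states[t + 1] - model_step(states[t], a_hat)
        # The model only has to generalize to the (possibly different) action.
        s = model_step(s, policy(s, gain)) + correction
        out.append(s)
    return out


real_states, real_actions = collect_rollout(1.0, horizon=5)
corrected = opc_rollout(real_states, real_actions, gain=-0.5)
uncorrected = model_rollout(1.0, horizon=5, gain=-0.5)
```

In this sketch, when the rollout uses the same policy that generated the data, the corrected trajectory matches the recorded one up to floating-point error, while the uncorrected model rollout drifts away. Passing a different `gain` to `opc_rollout` shows the learned model being used only to account for the change in actions.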
Taxonomy
Topics: Reinforcement Learning in Robotics · Fuel Cells and Related Materials · Software Engineering Research
