Model-Augmented Q-learning
Youngmin Oh, Jinwoo Shin, Eunho Yang, Sung Ju Hwang

TL;DR
This paper introduces Model-augmented Q-learning (MQL), a framework that combines model-based and model-free RL to improve value estimation and policy learning without extra training costs.
Contribution
It proposes a novel MFRL method that jointly estimates Q-values, transitions, and rewards with a shared network, improving performance and convergence over existing methods.
Findings
MQL achieves better performance than state-of-the-art off-policy methods.
MQL converges faster and more reliably in experiments.
The approach is simple to implement without additional training overhead.
Abstract
In recent years, Q-learning has become indispensable for model-free reinforcement learning (MFRL). However, it suffers from well-known problems such as under- and overestimation bias of the value, which may adversely affect policy learning. To resolve this issue, we propose an MFRL framework that is augmented with the components of model-based RL. Specifically, we propose to estimate not only the Q-values but also both the transition and the reward with a shared network. We further utilize the estimated reward from the model estimators for Q-learning, which promotes interaction between the estimators. We show that the proposed scheme, called Model-augmented Q-learning (MQL), obtains a policy-invariant solution which is identical to the solution obtained by learning with true reward. Finally, we also provide a trick to prioritize past experiences in the replay buffer by…
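The idea in the abstract (one shared network that outputs Q-values, a reward estimate, and a next-state estimate, with the estimated reward used in the Q-learning target) can be illustrated with a minimal forward-only sketch. Everything named here is an assumption for illustration and not the authors' implementation: the `SharedModel` class, the `mql_style_loss` function, the tanh trunk, the squared-error losses, the equal weighting of the three terms, and the use of a separate target model are all hypothetical choices, since the abstract does not specify them. The replay-buffer prioritization trick is not sketched because its description is truncated in the abstract.

```python
import math
import random


def init_layer(rng, n_out, n_in):
    """Small random weights and zero biases (illustrative initialisation)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def linear(weights, bias, x):
    """Dense layer: one output per row of `weights`."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


class SharedModel:
    """Hypothetical shared trunk with Q, reward, and transition heads."""

    def __init__(self, state_dim, n_actions, hidden=8, seed=0):
        rng = random.Random(seed)
        self.state_dim = state_dim
        self.n_actions = n_actions
        self.trunk = init_layer(rng, hidden, state_dim)
        self.q_head = init_layer(rng, n_actions, hidden)
        self.r_head = init_layer(rng, n_actions, hidden)
        self.t_head = init_layer(rng, n_actions * state_dim, hidden)

    def forward(self, state):
        # Shared features feed all three heads.
        h = [math.tanh(v) for v in linear(*self.trunk, state)]
        q = linear(*self.q_head, h)        # one Q-value per action
        r_hat = linear(*self.r_head, h)    # one reward estimate per action
        flat = linear(*self.t_head, h)     # next-state estimate per action, flattened
        d = self.state_dim
        s_hat = [flat[a * d:(a + 1) * d] for a in range(self.n_actions)]
        return q, r_hat, s_hat


def mql_style_loss(model, target_model, s, a, r, s_next, done, gamma=0.99):
    """Per-transition losses; the TD target uses the estimated reward (assumed form)."""
    q, r_hat, s_hat = model.forward(s)
    q_next, _, _ = target_model.forward(s_next)
    bootstrap = 0.0 if done else gamma * max(q_next)
    td_target = r_hat[a] + bootstrap                 # estimated reward, not observed r
    q_loss = (q[a] - td_target) ** 2
    reward_loss = (r_hat[a] - r) ** 2                # reward head fits the observed reward
    transition_loss = sum((p - t) ** 2 for p, t in zip(s_hat[a], s_next))
    total = q_loss + reward_loss + transition_loss   # equal weights are an assumption
    return {"q": q_loss, "reward": reward_loss,
            "transition": transition_loss, "total": total}


# Usage on a single made-up transition (s, a, r, s').
model = SharedModel(state_dim=3, n_actions=2, seed=0)
target = SharedModel(state_dim=3, n_actions=2, seed=1)
losses = mql_style_loss(model, target, [0.1, 0.2, 0.3], 1, 1.0, [0.0, 0.1, 0.2], False)
print(losses)
```

The sketch only evaluates the losses; a working agent would also need gradient updates, a replay buffer, and a rule for refreshing the target model, none of which are described in the abstract.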
Taxonomy
Topics: Reinforcement Learning in Robotics · Advanced Control Systems Optimization · Data Stream Mining Techniques
