Model-Augmented Q-learning

Youngmin Oh; Jinwoo Shin; Eunho Yang; Sung Ju Hwang

arXiv:2102.03866·cs.LG·February 9, 2021

TL;DR

This paper introduces Model-augmented Q-learning (MQL), a framework that combines model-based and model-free RL to improve value estimation and policy learning without extra training costs.

Contribution

It proposes a novel MFRL method that jointly estimates Q-values, transitions, and rewards, improving performance and convergence over existing methods.

Findings

1. MQL achieves better performance than state-of-the-art off-policy methods.

2. MQL converges faster and more reliably in experiments.

3. The approach is simple to implement without additional training overhead.

Abstract

In recent years, $Q$-learning has become indispensable for model-free reinforcement learning (MFRL). However, it suffers from well-known problems such as under- and overestimation bias of the value, which may adversely affect policy learning. To resolve this issue, we propose an MFRL framework that is augmented with the components of model-based RL. Specifically, we propose to estimate not only the $Q$-values but also both the transition and the reward with a shared network. We further utilize the estimated reward from the model estimators for $Q$-learning, which promotes interaction between the estimators. We show that the proposed scheme, called Model-augmented $Q$-learning (MQL), obtains a policy-invariant solution which is identical to the solution obtained by learning with the true reward. Finally, we also provide a trick to prioritize past experiences in the replay buffer by…
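To make the shared-network idea in the abstract concrete, here is a minimal sketch under stated assumptions, not the authors' implementation. It assumes a discrete action space, a single tanh hidden layer as the shared trunk, squared-error losses for all three heads, and a one-step target that uses the estimated reward in place of the observed one. The layer sizes and the names `init_params`, `forward`, and `losses` are hypothetical; the abstract does not specify the architecture or loss forms.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical problem sizes, chosen only for illustration.
STATE_DIM, N_ACTIONS, HIDDEN = 4, 2, 16


def init_params():
    """Random weights for a shared trunk and three output heads."""
    in_dim = STATE_DIM + N_ACTIONS
    return {
        "trunk": rng.normal(0.0, 0.1, (in_dim, HIDDEN)),
        "q": rng.normal(0.0, 0.1, (HIDDEN, 1)),
        "reward": rng.normal(0.0, 0.1, (HIDDEN, 1)),
        "next_state": rng.normal(0.0, 0.1, (HIDDEN, STATE_DIM)),
    }


def forward(params, state, action):
    """Return (Q-value, estimated reward, estimated next state)."""
    one_hot = np.eye(N_ACTIONS)[action]
    x = np.concatenate([state, one_hot])
    h = np.tanh(x @ params["trunk"])       # features shared by all heads
    q = (h @ params["q"])[0]
    r_hat = (h @ params["reward"])[0]
    s_hat = h @ params["next_state"]
    return q, r_hat, s_hat


def losses(params, s, a, r, s_next, gamma=0.99):
    """Per-transition losses for the three heads (assumed squared errors)."""
    q, r_hat, s_hat = forward(params, s, a)
    # Greedy bootstrap value at the observed next state.
    next_q = max(forward(params, s_next, b)[0] for b in range(N_ACTIONS))
    # Assumption: the target uses the estimated reward r_hat instead of r.
    # In a gradient-based trainer the target would be treated as a constant.
    target = r_hat + gamma * next_q
    td_loss = (q - target) ** 2
    reward_loss = (r_hat - r) ** 2
    transition_loss = float(np.sum((s_hat - s_next) ** 2))
    return td_loss, reward_loss, transition_loss
```

Calling `losses(init_params(), s, a, r, s_next)` on one transition returns the three terms separately; how they are weighted and optimized is left open here because the abstract does not say.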

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Control Systems Optimization · Data Stream Mining Techniques