Expert Q-learning: Deep Reinforcement Learning with Coarse State Values from Offline Expert Examples

Li Meng; Anis Yazidi; Morten Goodwin; Paal Engelstad

arXiv:2106.14642 · cs.LG · June 26, 2024

TL;DR

This paper introduces Expert Q-learning, a deep reinforcement learning algorithm that integrates coarse state value assessments from offline experts to improve stability and performance, especially in non-deterministic environments.

Contribution

The paper presents a novel Expert Q-learning algorithm that incorporates offline expert assessments into deep Q-learning, enhancing robustness and reducing overestimation bias.
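
Overestimation bias is the upward bias that appears when the same noisy value estimates are used both to choose the best action and to score it. Double Q-learning, one half of the paper's baseline, reduces it by selecting the action with one estimate and evaluating it with another. The following NumPy sketch illustrates that effect on synthetic data only. All true action values are set to zero and the noise is Gaussian; both are assumptions made for illustration, and the sketch does not reproduce the paper's Othello experiments.

```python
import numpy as np

def estimator_bias(n_actions=10, n_trials=100_000, noise_std=1.0, seed=0):
    """Compare the single max estimator with the double estimator when every
    action's true value is 0, so any nonzero mean is pure bias."""
    rng = np.random.default_rng(seed)
    # Two independent noisy estimates of the same (all-zero) action values.
    q_a = rng.normal(0.0, noise_std, size=(n_trials, n_actions))
    q_b = rng.normal(0.0, noise_std, size=(n_trials, n_actions))
    # Single estimator: select and evaluate the action with the same estimates.
    single = q_a.max(axis=1)
    # Double estimator: select the action with q_a, evaluate it with q_b.
    best = q_a.argmax(axis=1)
    double = q_b[np.arange(n_trials), best]
    return single.mean(), double.mean()

single_bias, double_bias = estimator_bias()
print(f"single max estimator: {single_bias:.3f}")  # clearly positive
print(f"double estimator:     {double_bias:.3f}")  # close to 0
```

Because `q_b` is independent of the choice made with `q_a`, the double estimator's mean stays near the true value of zero, while taking the maximum of noisy estimates is biased upward.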

Findings

01

Expert Q-learning outperforms baseline Q-learning in Othello.

02

The new method shows increased stability in non-deterministic settings.

03

Expert Q-learning achieves higher scores than traditional algorithms.

Abstract

In this article, we propose a novel algorithm for deep reinforcement learning named Expert Q-learning. Expert Q-learning is inspired by Dueling Q-learning and aims at incorporating semi-supervised learning into reinforcement learning through splitting Q-values into state values and action advantages. We require that an offline expert assesses the value of a state in a coarse manner using three discrete values. An expert network is designed in addition to the Q-network, which updates each time following the regular offline minibatch update whenever the expert example buffer is not empty. Using the board game Othello, we compare our algorithm with the baseline Q-learning algorithm, which is a combination of Double Q-learning and Dueling Q-learning. Our results show that Expert Q-learning is indeed useful and more resistant to the overestimation bias. The baseline Q-learning algorithm…
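
The abstract describes the mechanics only at a high level. The following sketch illustrates three of them under stated assumptions. The first is the Dueling Q-learning aggregation Q(s, a) = V(s) + A(s, a) - mean over a' of A(s, a'). The second is an encoding of the expert's three coarse values as -1, 0 and +1; this encoding is hypothetical, since the abstract only says "three discrete values". The third is a training step that performs the regular Q-network minibatch update and then updates an expert network only when the expert example buffer is non-empty. `COARSE_VALUES`, `LinearExpertNet` (a linear stand-in for the expert network) and `train_step` are illustrative names, not the authors' implementation. How the expert network's output is combined with the Q-network is not shown.

```python
import numpy as np

# Hypothetical encoding of the expert's three coarse state values.
COARSE_VALUES = {"bad": -1.0, "neutral": 0.0, "good": 1.0}

def dueling_q(value, advantages):
    """Combine V(s) of shape (batch,) and A(s, a) of shape (batch, n_actions)
    into Q(s, a) using the mean-subtracted dueling aggregation."""
    return value[:, None] + advantages - advantages.mean(axis=1, keepdims=True)

class LinearExpertNet:
    """Hypothetical stand-in for the expert network: a linear regressor that
    maps state features to a scalar coarse value, trained with MSE."""

    def __init__(self, n_features, lr=0.1):
        self.w = np.zeros(n_features)
        self.b = 0.0
        self.lr = lr

    def predict(self, states):
        return states @ self.w + self.b

    def update(self, states, labels):
        # One gradient-descent step on the mean squared error to the labels.
        err = self.predict(states) - labels
        self.w -= self.lr * states.T @ err / len(labels)
        self.b -= self.lr * err.mean()
        return float((err ** 2).mean())

def train_step(q_update, expert_net, expert_buffer, batch_size, rng):
    """Run the regular Q-network minibatch update, then update the expert
    network only if the expert buffer (a list of (features, label)) is non-empty."""
    q_update()
    if expert_buffer:
        size = min(batch_size, len(expert_buffer))
        idx = rng.choice(len(expert_buffer), size=size, replace=False)
        states = np.stack([expert_buffer[i][0] for i in idx])
        labels = np.array([COARSE_VALUES[expert_buffer[i][1]] for i in idx])
        return expert_net.update(states, labels)
    return None  # no expert examples yet: only the Q-network was updated

# Usage: V = 0.5 and advantages [1, 2, 3] give Q = [-0.5, 0.5, 1.5].
print(dueling_q(np.array([0.5]), np.array([[1.0, 2.0, 3.0]])))
```

Subtracting the mean advantage keeps V and A identifiable. This is the property that makes it natural to attach a separate state-value signal, such as a coarse expert assessment, to the state-value side of the decomposition.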

Taxonomy

Methods: Q-Learning · Double Q-learning