Q-Value Weighted Regression: Reinforcement Learning with Limited Data

Piotr Kozakowski; Łukasz Kaiser; Henryk Michalewski; Afroz Mohiuddin; Katarzyna Kańska

arXiv:2102.06782·cs.LG·February 16, 2021


TL;DR

This paper introduces Q-Value Weighted Regression (QWR), a simple yet effective reinforcement learning algorithm that improves sample efficiency and performance in offline and high-dimensional settings, matching state-of-the-art results.

Contribution

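QWR extends Advantage Weighted Regression (AWR) by addressing its limitations: low sample efficiency and difficulty with high-dimensional observation spaces. The result is a more sample-efficient algorithm that performs well on tasks with both continuous and discrete actions, including in the offline setting.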
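AWR's actor update is a weighted maximum-likelihood (regression) step. Each logged action's log-probability is weighted by its exponentiated advantage, exp(A/β), so better-than-average actions are imitated more strongly. The sketch below illustrates only that weighting, in plain Python. The function names, the temperature `beta` and the weight cap `max_weight` are illustrative assumptions, not the paper's hyperparameters or code.

```python
import math
from typing import List, Sequence


def awr_weights(
    advantages: Sequence[float], beta: float = 1.0, max_weight: float = 20.0
) -> List[float]:
    """Per-sample weights exp(A / beta), capped at max_weight (illustrative values)."""
    log_cap = math.log(max_weight)
    # Clip in log space first so math.exp never overflows on very large advantages.
    return [math.exp(min(a / beta, log_cap)) for a in advantages]


def weighted_regression_loss(
    log_probs: Sequence[float],
    advantages: Sequence[float],
    beta: float = 1.0,
    max_weight: float = 20.0,
) -> float:
    """Weighted negative log-likelihood: -(1/N) * sum_i w_i * log pi(a_i | s_i)."""
    if len(log_probs) != len(advantages) or not log_probs:
        raise ValueError("need equally sized, non-empty log_probs and advantages")
    weights = awr_weights(advantages, beta, max_weight)
    return -sum(w * lp for w, lp in zip(weights, log_probs)) / len(log_probs)


# Usage: with equal (zero) advantages every weight is 1, so this is plain
# behavior cloning on the logged actions.
loss = weighted_regression_loss([-1.0, -2.0], [0.0, 0.0])  # 1.5
```

Raising `beta` flattens the weights toward 1, which moves the update toward behavior cloning. Lowering it concentrates the update on the highest-advantage actions.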

Findings

1. QWR matches state-of-the-art algorithms on continuous control tasks.
2. QWR achieves results comparable to SAC on the MuJoCo benchmark suite.
3. QWR performs well in the offline reinforcement learning setting.

Abstract

Sample efficiency and performance in the offline setting have emerged as significant challenges of deep reinforcement learning. We introduce Q-Value Weighted Regression (QWR), a simple RL algorithm that excels in these aspects. QWR is an extension of Advantage Weighted Regression (AWR), an off-policy actor-critic algorithm that performs very well on continuous control tasks, including in the offline setting, but has low sample efficiency and struggles with high-dimensional observation spaces. We perform an analysis of AWR that explains its shortcomings and use these insights to motivate QWR. We show experimentally that QWR matches the state-of-the-art algorithms on tasks with both continuous and discrete actions. In particular, QWR yields results on par with SAC on the MuJoCo suite and, with the same set of hyperparameters, yields results on par with a highly tuned Rainbow implementation…
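As its name suggests, QWR bases the regression weights on Q-values rather than on AWR's original advantage estimate. The sketch below shows one plausible way to get an advantage from a learned Q-function. It assumes the state value is approximated by averaging Q over a few actions sampled from the current policy, V(s) ≈ (1/n) Σ_j Q(s, a_j). The names `q_fn`, `sample_action` and `num_samples` are hypothetical, and this is not the authors' implementation.

```python
import random
from typing import Callable, Optional


def estimate_advantage(
    q_fn: Callable[[object, object], float],
    sample_action: Callable[[object, random.Random], object],
    state: object,
    action: object,
    num_samples: int = 4,
    rng: Optional[random.Random] = None,
) -> float:
    """A(s, a) ~= Q(s, a) - mean_j Q(s, a_j), with a_j drawn from the policy at s."""
    if num_samples < 1:
        raise ValueError("num_samples must be positive")
    if rng is None:
        rng = random.Random(0)  # fixed seed only to make the example reproducible
    sampled = [sample_action(state, rng) for _ in range(num_samples)]
    # Monte Carlo estimate of V(s) = E_{a ~ pi(.|s)}[Q(s, a)].
    value = sum(q_fn(state, a) for a in sampled) / num_samples
    return q_fn(state, action) - value


# Usage: a toy discrete problem with two actions and a uniform policy.
Q_TABLE = {0: 1.0, 1: 3.0}


def q(state, a):
    return Q_TABLE[a]


def uniform_policy(state, rng):
    return rng.choice([0, 1])


adv = estimate_advantage(q, uniform_policy, state=None, action=1, num_samples=8)
```

Under this assumption, the resulting advantage would feed an exp(A/β)-style regression weight, as in AWR. Sampling several actions per state lets the estimate be computed from off-policy data without rolling out trajectories.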


Code & Models

Repositories

vub-ai-lab/qwr (PyTorch)


Taxonomy

Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Adaptive Dynamic Programming Control

Methods: Dilated Convolution · Global Average Pooling · Average Pooling · Convolution · 1x1 Convolution · Switchable Atrous Convolution