Q-Value Weighted Regression: Reinforcement Learning with Limited Data
Piotr Kozakowski, Łukasz Kaiser, Henryk Michalewski, Afroz Mohiuddin, Katarzyna Kańska

TL;DR
This paper introduces Q-Value Weighted Regression (QWR), a simple yet effective reinforcement learning algorithm that improves sample efficiency and performance in offline and high-dimensional settings, matching state-of-the-art results.
Contribution
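QWR extends Advantage Weighted Regression (AWR) by addressing its low sample efficiency and its difficulty with high-dimensional observation spaces. The result is a more sample-efficient algorithm that performs well on tasks with both continuous and discrete actions, including in the offline setting.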
Findings
QWR matches state-of-the-art algorithms on tasks with both continuous and discrete actions.
QWR achieves results comparable to SAC on the MuJoCo suite.
QWR performs well in offline reinforcement learning settings.
Abstract
Sample efficiency and performance in the offline setting have emerged as significant challenges of deep reinforcement learning. We introduce Q-Value Weighted Regression (QWR), a simple RL algorithm that excels in these aspects. QWR is an extension of Advantage Weighted Regression (AWR), an off-policy actor-critic algorithm that performs very well on continuous control tasks, including in the offline setting, but has low sample efficiency and struggles with high-dimensional observation spaces. We perform an analysis of AWR that explains its shortcomings and use these insights to motivate QWR. We show experimentally that QWR matches the state-of-the-art algorithms on tasks with both continuous and discrete actions. In particular, QWR yields results on par with SAC on the MuJoCo suite and, with the same set of hyperparameters, yields results on par with a highly tuned Rainbow implementation…
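The abstract describes AWR and QWR only at a high level. In AWR, the policy is fit by regressing onto logged actions, with each action weighted by an exponentiated advantage; QWR keeps this weighted-regression form but, as its name suggests, derives the weights from Q-values. The following is a minimal sketch of that idea, not the authors' implementation. The temperature `beta`, the weight clip `max_weight`, the choice of baseline (the mean of Q(s, a_i) over actions a_i sampled from the current policy), and all function names are assumptions made for illustration.

```python
import math
from typing import Callable, List, Sequence


def exp_advantage_weights(advantages: Sequence[float], beta: float = 1.0,
                          max_weight: float = 20.0) -> List[float]:
    """Weights w = min(exp(A / beta), max_weight), as in AWR-style updates.

    beta (temperature) and max_weight (clip) are assumed hyperparameters.
    The exponent is capped at log(max_weight) first so math.exp cannot overflow.
    """
    log_cap = math.log(max_weight)
    return [min(math.exp(min(a / beta, log_cap)), max_weight) for a in advantages]


def q_based_advantage(q_fn: Callable[[float, float], float], state: float,
                      action: float, sampled_actions: Sequence[float]) -> float:
    """Advantage from a Q critic: Q(s, a) minus an estimate of V(s).

    Assumption: V(s) is estimated as the mean of Q(s, a_i) over actions
    a_i sampled from the current policy (hypothetical choice of baseline).
    """
    baseline = sum(q_fn(state, b) for b in sampled_actions) / len(sampled_actions)
    return q_fn(state, action) - baseline


def weighted_regression_loss(log_probs: Sequence[float],
                             weights: Sequence[float]) -> float:
    """Actor loss: negative weighted log-likelihood of the dataset actions,
    averaged over the batch. The weights are treated as constants."""
    return -sum(w * lp for w, lp in zip(weights, log_probs)) / len(log_probs)


if __name__ == "__main__":
    # Toy critic (hypothetical): Q(s, a) peaks when the action equals the state.
    toy_q = lambda s, a: -(a - s) ** 2
    adv = q_based_advantage(toy_q, state=0.5, action=0.5, sampled_actions=[0.0, 1.0])
    weights = exp_advantage_weights([adv, 0.0, 100.0])
    loss = weighted_regression_loss([math.log(0.5)] * 3, weights)
    print(adv, weights, loss)
```

Because the weights grow exponentially with the advantage, actions the critic rates above the policy's average dominate the regression, while the clip keeps a single large advantage from swamping the batch. In a real implementation, the log-probabilities would come from a differentiable policy network and the weights would be computed without gradients.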
Taxonomy
Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Adaptive Dynamic Programming Control
Methods: Dilated Convolution · Global Average Pooling · Average Pooling · Convolution · 1x1 Convolution · Switchable Atrous Convolution
