Variance Control for Distributional Reinforcement Learning
Qi Kuang, Zhoufan Zhu, Liwen Zhang, Fan Zhou

TL;DR
This paper analyzes how approximation errors in the Q-function estimator affect distributional reinforcement learning, introduces a new estimator, Quantiled Expansion Mean (QEM), and demonstrates improved performance on Atari and Mujoco benchmark tasks.
Contribution
It provides a theoretical error analysis, proposes the QEM estimator, and develops the QEMRL algorithm for better distributional RL performance.
Findings
QEMRL outperforms baseline algorithms in sample efficiency.
QEMRL shows improved convergence on Atari and Mujoco tasks.
The error analysis shows theoretically how to reduce both the bias and the variance of the Q-function approximation error terms.
Abstract
Although distributional reinforcement learning (DRL) has been widely examined in the past few years, very few studies investigate the validity of the obtained Q-function estimator in the distributional setting. To fully understand how the approximation errors of the Q-function affect the whole training process, we do some error analysis and theoretically show how to reduce both the bias and the variance of the error terms. With this new understanding, we construct a new estimator Quantiled Expansion Mean (QEM) and introduce a new DRL algorithm (QEMRL) from the statistical perspective. We extensively evaluate our QEMRL algorithm on a variety of Atari and Mujoco benchmark tasks and demonstrate that QEMRL achieves significant improvement over baseline algorithms in terms of sample efficiency and convergence performance.
Taxonomy
Topics: Reinforcement Learning in Robotics · Evolutionary Algorithms and Applications
