State Distribution-aware Sampling for Deep Q-learning
Weichao Li, Fuxian Huang, Xi Li, Gang Pan, Fei Wu

TL;DR
This paper introduces a state distribution-aware sampling method for deep Q-learning that balances transition replay, reduces redundant updates, and improves learning efficiency and convergence speed.
Contribution
It proposes a novel sampling strategy that considers transition distribution and uncertainty, enhancing experience replay effectiveness in deep Q-learning.
Findings
Improved convergence speed in control and Atari tasks.
Reduced unnecessary TD updates, increasing learning efficiency.
Outperforms standard DQN in various benchmarks.
Abstract
A critical and challenging problem in reinforcement learning is how to learn the state-action value function from the experience replay buffer while maintaining sample efficiency and converging quickly to a high-quality solution. In prior work, transitions are sampled uniformly at random from the replay buffer or according to a priority measured by temporal-difference (TD) error. However, these approaches do not fully account for the intrinsic characteristics of the transition distribution in the state space, and can result in redundant and unnecessary TD updates that slow the convergence of learning. To overcome this problem, we propose a novel state distribution-aware sampling method that balances the replay times of transitions with a skewed distribution, taking into account both the occurrence frequencies of transitions and the uncertainty of…
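As a rough illustration of the frequency-balancing idea described in the abstract, the following Python sketch weights each stored transition by the inverse of the number of transitions that share its region of state space, so that densely visited regions are not replayed disproportionately often. This is not the authors' algorithm: the grid discretization (`state_key`, `cell`), the inverse-count weighting, and all function names are assumptions made for illustration, and the uncertainty term mentioned in the abstract is omitted because the text here is truncated.

```python
import random
from collections import Counter


def state_key(state, cell=0.5):
    """Hypothetical discretization: map a continuous state to a grid cell."""
    return tuple(int(x // cell) for x in state)


def balanced_weights(buffer, cell=0.5):
    """Weight each transition by 1 / (number of transitions in its cell).

    Assumption for illustration only: every occupied cell ends up with the
    same total sampling mass, regardless of how many transitions it holds.
    """
    keys = [state_key(transition[0], cell) for transition in buffer]
    counts = Counter(keys)
    return [1.0 / counts[k] for k in keys]


def sample_balanced(buffer, batch_size, rng=random, cell=0.5):
    """Draw a minibatch (with replacement) using the inverse-count weights."""
    weights = balanced_weights(buffer, cell)
    return rng.choices(buffer, weights=weights, k=batch_size)


# Toy replay buffer of (state, action, reward, next_state, done) tuples:
# nine near-identical transitions and one rare transition elsewhere.
common = ((0.1, 0.1), 0, 0.0, (0.2, 0.1), False)
rare = ((3.0, 3.0), 1, 1.0, (3.1, 3.0), True)
buffer = [common] * 9 + [rare]

weights = balanced_weights(buffer)
batch = sample_balanced(buffer, batch_size=4, rng=random.Random(0))
```

In this toy buffer, nine of the ten transitions fall in one cell, so uniform sampling would pick the rare transition about 10% of the time, whereas the inverse-count weights give each occupied cell equal total probability (one half here).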
Taxonomy
Topics: Reinforcement Learning in Robotics · Neural dynamics and brain function · Smart Grid Energy Management
Methods: Experience Replay
