Unbiased Deep Reinforcement Learning: A General Training Framework for Existing and Future Algorithms
Huihui Zhang, Wu Huang

TL;DR
This paper introduces a new, unbiased training framework for deep reinforcement learning that improves sample efficiency and convergence, applicable to both existing and future algorithms across discrete and continuous tasks.
Contribution
The authors propose a general, unbiased training framework using Monte Carlo sampling and batch updates, enhancing efficiency and convergence in deep reinforcement learning.
Findings
Outperforms existing methods in sample efficiency and convergence rate.
Applicable to both discrete and continuous control problems.
Enables generalization of algorithms within the new framework.
Abstract
In recent years, deep neural networks have been successfully applied to reinforcement learning \cite{bengio2009learning,krizhevsky2012imagenet,hinton2006reducing}. Deep reinforcement learning \cite{mnih2015human} is reported to have an advantage over traditional agents in learning effective policies directly from high-dimensional sensory inputs. However, within the scope of the literature, there has been no fundamental change or improvement to the existing training framework. Here we propose a novel training framework that is conceptually comprehensible and potentially easy to generalize to all feasible algorithms for reinforcement learning. We employ Monte Carlo sampling to obtain raw data inputs, train on them in batches to obtain Markov decision process sequences, and synchronously update the network parameters instead of using experience replay. This training framework proves…
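The abstract describes the framework only at a high level, so the following is a minimal sketch of the general idea rather than the authors' algorithm: sample complete episodes by Monte Carlo rollout, collect them into a batch, and apply one synchronous parameter update per batch, with no replay buffer. The chain environment, the tabular softmax policy, the REINFORCE-style gradient, and all names and hyperparameters (`N_STATES`, `GAMMA`, `sample_episode`, `batch_update`, `train`) are assumptions made for illustration and do not come from the paper.

```python
import math
import random

# Hypothetical toy environment (not from the paper): a 1-D chain of states
# 0..N_STATES-1. The agent starts at state 0; reaching the right end pays 1.
N_STATES = 5
ACTIONS = (-1, +1)   # move left, move right
GAMMA = 0.9          # assumed discount factor
MAX_STEPS = 20       # assumed episode length cap


def softmax(prefs):
    """Numerically stable softmax over a list of action preferences."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def sample_episode(theta, rng):
    """Monte Carlo rollout: returns a list of (state, action_index, reward)."""
    state, trajectory = 0, []
    for _ in range(MAX_STEPS):
        probs = softmax(theta[state])
        a = rng.choices(range(len(ACTIONS)), weights=probs)[0]
        next_state = min(max(state + ACTIONS[a], 0), N_STATES - 1)
        done = next_state == N_STATES - 1
        trajectory.append((state, a, 1.0 if done else 0.0))
        state = next_state
        if done:
            break
    return trajectory


def discounted_returns(trajectory):
    """Return-to-go G_t for every step of one complete episode."""
    g, out = 0.0, []
    for _, _, reward in reversed(trajectory):
        g = reward + GAMMA * g
        out.append(g)
    return out[::-1]


def batch_update(theta, batch, lr):
    """One synchronous update from a batch of fresh episodes (no replay)."""
    grad = [[0.0] * len(ACTIONS) for _ in range(N_STATES)]
    for trajectory in batch:
        for (s, a, _), g in zip(trajectory, discounted_returns(trajectory)):
            probs = softmax(theta[s])
            for k in range(len(ACTIONS)):
                # d/d theta[s][k] of log pi(a | s) = 1[k == a] - pi(k | s)
                grad[s][k] += g * ((1.0 if k == a else 0.0) - probs[k])
    for s in range(N_STATES):
        for k in range(len(ACTIONS)):
            theta[s][k] += lr * grad[s][k] / len(batch)


def train(iterations=200, batch_size=16, lr=0.5, seed=0):
    """Alternate between sampling a batch of episodes and one batch update."""
    rng = random.Random(seed)
    theta = [[0.0] * len(ACTIONS) for _ in range(N_STATES)]
    for _ in range(iterations):
        batch = [sample_episode(theta, rng) for _ in range(batch_size)]
        batch_update(theta, batch, lr)
    return theta
```

Calling `train()` returns the learned preference table, and `softmax(theta[s])` gives the action probabilities in state `s`. Each batch is discarded after its update, which is the contrast with experience replay that the abstract draws; a neural-network policy would replace the table `theta` in a deep variant.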
Taxonomy
Topics: Reinforcement Learning in Robotics · Evolutionary Algorithms and Applications · Adaptive Dynamic Programming Control
