Biased Estimates of Advantages over Path Ensembles
Lanxin Lei, Zhizhong Li, Dahua Lin

TL;DR
This paper introduces a family of advantage estimation methods based on order statistics, demonstrating that biased estimates can improve exploration and performance in reinforcement learning across diverse environments.
Contribution
It proposes a novel family of advantage estimates using order statistics, systematically analyzes their impacts, and shows their effectiveness in various benchmarks.
Findings
Biased advantage estimates, when chosen appropriately, can yield significant benefits.
Optimistic estimates lead to more efficient exploration of the policy space in sparse-reward environments.
Conservative estimates are preferable when individual actions can have critical impacts.
Abstract
The estimation of advantage is crucial for a number of reinforcement learning algorithms, as it directly influences the choices of future paths. In this work, we propose a family of estimates based on the order statistics over the path ensemble, which allows one to flexibly drive the learning process, towards or against risks. On top of this formulation, we systematically study the impacts of different methods for estimating advantages. Our findings reveal that biased estimates, when chosen appropriately, can result in significant benefits. In particular, for the environments with sparse rewards, optimistic estimates would lead to more efficient exploration of the policy space; while for those where individual actions can have critical impacts, conservative estimates are preferable. On various benchmarks, including MuJoCo continuous control, Terrain locomotion, Atari games, and…
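The idea of taking an order statistic over a path ensemble can be illustrated with a minimal sketch. It assumes the ensemble at a time step is the set of k-step advantage estimates along one sampled trajectory; the summary above does not spell out the exact construction, so that choice, the function names, and the "max"/"min"/"median" modes are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np


def k_step_advantages(rewards, values, gamma=0.99):
    """Hypothetical helper: k-step advantage estimates at t = 0 for k = 1..T.

    rewards: r_0 .. r_{T-1}
    values:  V(s_0) .. V(s_T)  (length T + 1)
    A^(k) = sum_{l<k} gamma^l * r_l + gamma^k * V(s_k) - V(s_0)
    """
    rewards = np.asarray(rewards, dtype=float)
    values = np.asarray(values, dtype=float)
    T = len(rewards)
    if len(values) != T + 1:
        raise ValueError("values must have one more entry than rewards")
    discounts = gamma ** np.arange(T)
    # partial_returns[k-1] = sum_{l<k} gamma^l * r_l
    partial_returns = np.cumsum(discounts * rewards)
    ks = np.arange(1, T + 1)
    return partial_returns + gamma ** ks * values[1:] - values[0]


def order_statistic_advantage(ensemble, mode="max"):
    """Pick one order statistic of the ensemble (assumed modes).

    "max"    -> optimistic estimate (risk-seeking)
    "min"    -> conservative estimate (risk-averse)
    "median" -> a middle order statistic, shown only for comparison
    """
    ensemble = np.asarray(ensemble, dtype=float)
    if mode == "max":
        return float(ensemble.max())
    if mode == "min":
        return float(ensemble.min())
    if mode == "median":
        return float(np.median(ensemble))
    raise ValueError(f"unknown mode: {mode}")


# Toy sparse-reward trajectory: the only reward arrives at the last step.
rewards = [0.0, 0.0, 1.0]
values = [0.5, 0.5, 0.5, 0.0]  # made-up value predictions; terminal value is 0
ensemble = k_step_advantages(rewards, values, gamma=1.0)
optimistic = order_statistic_advantage(ensemble, "max")
conservative = order_statistic_advantage(ensemble, "min")
```

In this toy case only the longest path sees the reward, so the optimistic estimate picks it up while the conservative one ignores it, which is the kind of trade-off the abstract describes.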
Taxonomy
Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Advanced Multi-Objective Optimization Algorithms
