Biased Estimates of Advantages over Path Ensembles

Lanxin Lei; Zhizhong Li; Dahua Lin

arXiv:1909.06851 · cs.LG · September 17, 2019


TL;DR

This paper introduces a family of advantage estimation methods based on order statistics, demonstrating that biased estimates can improve exploration and performance in reinforcement learning across diverse environments.

Contribution

The paper proposes a family of advantage estimates built on order statistics over path ensembles, systematically analyzes how the choice of estimate affects learning, and demonstrates its effectiveness on a range of benchmarks.

Findings

01

Biased advantage estimates can enhance exploration in sparse-reward environments.

02

Optimistic estimates make exploration of the policy space more efficient when rewards are sparse.

03

Conservative estimates are preferable in environments where individual actions can have critical impacts.

Abstract

The estimation of advantages is crucial for a number of reinforcement learning algorithms, as it directly influences the choice of future paths. In this work, we propose a family of estimates based on order statistics over the path ensemble, which allows one to flexibly drive the learning process toward or away from risk. On top of this formulation, we systematically study the impact of different methods for estimating advantages. Our findings reveal that biased estimates, when chosen appropriately, can yield significant benefits. In particular, for environments with sparse rewards, optimistic estimates lead to more efficient exploration of the policy space, while for those where individual actions can have critical impacts, conservative estimates are preferable. On various benchmarks, including MuJoCo continuous control, Terrain locomotion, Atari games, and…
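As a rough illustration of what "order statistics over a path ensemble" can mean, the sketch below assumes that the ensemble is the set of k-step advantage estimates A_t^(k) = sum_{l<k} gamma^l r_{t+l} + gamma^k V(s_{t+k}) - V(s_t), all computed from a single sampled trajectory. The abstract does not spell out the exact construction, so this choice is an assumption, and the helper names `k_step_advantages` and `order_statistic_advantage` and the quantile parameter `q` are hypothetical. Setting q = 1 picks the largest estimate (optimistic), q = 0 picks the smallest (conservative), and q = 0.5 picks the median. For comparison, GAE combines the same k-step estimates with exponentially decaying weights (1 - lambda) * lambda^(k-1) instead of selecting one of them.

```python
import numpy as np


def k_step_advantages(rewards, values, t, gamma=0.99, max_k=None):
    """Ensemble of k-step advantage estimates at time step t (hypothetical helper).

    A_t^(k) = sum_{l=0}^{k-1} gamma^l * r_{t+l} + gamma^k * V(s_{t+k}) - V(s_t)

    `values` must have len(rewards) + 1 entries: the last one is the
    bootstrap value of the state after the final reward (0 if terminal).
    """
    rewards = np.asarray(rewards, dtype=float)
    values = np.asarray(values, dtype=float)
    if len(values) != len(rewards) + 1:
        raise ValueError("values needs one more entry than rewards")
    if not 0 <= t < len(rewards):
        raise ValueError("t must index a reward in the trajectory")

    # Longest lookahead available from t, optionally capped at max_k.
    horizon = len(rewards) - t
    if max_k is not None:
        horizon = min(horizon, max_k)

    estimates = []
    discounted_return = 0.0
    for k in range(1, horizon + 1):
        # Add the k-th discounted reward, r_{t+k-1}.
        discounted_return += gamma ** (k - 1) * rewards[t + k - 1]
        # Bootstrap from V(s_{t+k}) and subtract the baseline V(s_t).
        estimates.append(discounted_return + gamma ** k * values[t + k] - values[t])
    return np.array(estimates)


def order_statistic_advantage(estimates, q=0.5):
    """Select an order statistic of the ensemble via its q-quantile.

    q = 1.0 -> maximum (optimistic), q = 0.0 -> minimum (conservative),
    q = 0.5 -> median. Other q values are linearly interpolated by
    np.quantile's default method.
    """
    return float(np.quantile(np.asarray(estimates, dtype=float), q))


if __name__ == "__main__":
    est = k_step_advantages([1.0, 0.0, 1.0], [0.0, 0.0, 3.0, 0.0], t=0, gamma=1.0)
    print(est)  # [1. 4. 2.]
    for name, q in [("optimistic", 1.0), ("median", 0.5), ("conservative", 0.0)]:
        print(name, order_statistic_advantage(est, q))
```

With rewards [1, 0, 1], bootstrapped values [0, 0, 3, 0] and gamma = 1, the ensemble at t = 0 is [1, 4, 2], so the optimistic, median and conservative choices give 4, 2 and 1 respectively. Using a quantile keeps the whole family behind one knob, which makes it easy to sweep from risk-seeking to risk-averse estimates.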


Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Advanced Multi-Objective Optimization Algorithms