Improving On-policy Learning with Statistical Reward Accumulation

Yubin Deng; Ke Yu; Dahua Lin; Xiaoou Tang; Chen Change Loy

arXiv:1809.02387·cs.LG·September 10, 2018

TL;DR

This paper augments on-policy reinforcement learning with statistics of past rewards and a new exploration mechanism, improving performance on sparse-reward tasks in the Atari and MuJoCo benchmarks.

Contribution

It introduces a novel approach combining reward statistics with multi-critic value functions and a new exploration mechanism called hot-wiring for better on-policy learning.

Findings

01 Improved performance in Atari and MuJoCo benchmarks.

02 Effective handling of sparse reward signals.

03 Enhanced value function approximation with multi-critics.

Abstract

Deep reinforcement learning has achieved significant breakthroughs in recent years. Most deep-RL methods achieve good results by maximizing the reward signal provided by the environment, typically in the form of discounted cumulative returns. Such reward signals represent the immediate feedback of a particular action performed by an agent. However, tasks with sparse reward signals remain challenging for on-policy methods. In this paper, we introduce an effective characterization of past reward statistics (which can be seen as long-term feedback signals) to supplement this immediate reward feedback. In particular, value functions are learned with multi-critics supervision, enabling complex value functions to be more easily approximated in on-policy learning, even when the reward signals are sparse. We also introduce a novel exploration mechanism called "hot-wiring" that…
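
For reference, the discounted cumulative return mentioned in the abstract can be sketched as below. This is the standard textbook definition, not code from the paper; the function name `discounted_returns`, the example episode, and the value of `gamma` are illustrative assumptions.

```python
from typing import List


def discounted_returns(rewards: List[float], gamma: float = 0.99) -> List[float]:
    """Return G_t = r_t + gamma * G_{t+1} for every step of one episode.

    Hypothetical helper for illustration; the paper's own implementation
    is not shown in the source.
    """
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk the episode backwards so each step reuses the return of the next one.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


# Assumed sparse-reward episode: only the final step carries any reward.
sparse_episode = [0.0, 0.0, 0.0, 1.0]
returns = discounted_returns(sparse_episode, gamma=0.5)
print(returns)
```

In an episode like this one, every step before the last sees the reward only after it has been attenuated by `gamma`, which gives a rough sense of why sparse rewards provide a weak learning signal.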

Taxonomy

Topics: Reinforcement Learning in Robotics · Adaptive Dynamic Programming Control · Artificial Intelligence in Games