Long N-step Surrogate Stage Reward to Reduce Variances of Deep Reinforcement Learning in Complex Problems

Junmin Zhong; Ruofan Wu; Jennie Si

arXiv:2210.04820·cs.LG·March 21, 2024

Open Access

TL;DR

This paper introduces a long N-step surrogate stage (LNSS) reward method to reduce variance and improve performance in deep reinforcement learning for complex continuous control tasks, demonstrating systematic benefits over traditional methods.

Contribution

The paper proposes a novel LNSS reward approach that effectively handles complex environment dynamics and systematically improves RL performance in challenging benchmarks.

Findings

01 LNSS reduces variance exponentially compared to single-step methods.

02 LNSS improves total reward and convergence speed in benchmark environments.

03 LNSS yields a lower coefficient of variation (CV) in Q-value estimates.
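
The coefficient of variation named in the findings is the standard deviation divided by the magnitude of the mean, so a lower CV means less spread relative to the average. The sketch below is only an illustration of that definition: the function name and the sample numbers are hypothetical and are not taken from the paper, and it uses the population standard deviation as an assumption.

```python
import statistics


def coefficient_of_variation(values):
    """Return population std / |mean| for a sequence of numbers."""
    mean = statistics.fmean(values)
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return statistics.pstdev(values) / abs(mean)


# Made-up illustrative values (NOT results from the paper):
# two sets of estimates with the same mean but different spread.
noisy_estimates = [80.0, 100.0, 120.0]
steady_estimates = [95.0, 100.0, 105.0]

cv_noisy = coefficient_of_variation(noisy_estimates)
cv_steady = coefficient_of_variation(steady_estimates)
print(f"noisy CV:  {cv_noisy:.4f}")
print(f"steady CV: {cv_steady:.4f}")
```

Because CV is normalized by the mean, it allows a comparison of variability across tasks whose reward scales differ.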

Abstract

High variance in reinforcement learning has been shown to impede successful convergence and hurt task performance. As the reward signal plays an important role in learning behavior, multi-step methods have been considered to mitigate the problem and are believed to be more effective than single-step methods. However, there is a lack of comprehensive and systematic study of this important aspect to demonstrate the effectiveness of multi-step methods in solving highly complex continuous control problems. In this study, we introduce a new long $N$-step surrogate stage (LNSS) reward approach to effectively account for complex environment dynamics, whereas previous methods are usually feasible only for a limited number of steps. The LNSS method is simple, has low computational cost, and is applicable to value-based or policy gradient reinforcement learning. We systematically evaluate LNSS in OpenAI Gym and…
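
The abstract does not define the surrogate stage reward. One plausible reading, assumed here purely for illustration, is a single constant per-step reward whose discounted sum over N steps equals the discounted sum of the N rewards actually observed. The function names, the discount factor `gamma`, and the sample rewards below are all hypothetical, and this sketch should not be taken as the authors' implementation.

```python
def n_step_discounted_sum(rewards, gamma):
    """Discounted sum r_0 + gamma * r_1 + ... over the given reward window."""
    return sum((gamma ** n) * r for n, r in enumerate(rewards))


def surrogate_stage_reward(rewards, gamma):
    """Constant reward whose N-step discounted sum matches that of `rewards`.

    ASSUMPTION: this equal-discounted-sum form is an illustrative reading of
    "surrogate stage reward", not a formula stated in the text above.
    """
    n = len(rewards)
    if n == 0:
        raise ValueError("need at least one reward")
    total = n_step_discounted_sum(rewards, gamma)
    if gamma == 1.0:
        return total / n  # undiscounted case reduces to a plain average
    # Geometric series: sum_{i<n} gamma^i = (1 - gamma^n) / (1 - gamma)
    return total * (1.0 - gamma) / (1.0 - gamma ** n)


# Made-up reward window of length N = 3 (not data from the paper).
window = [1.0, 0.0, 2.0]
gamma = 0.9
r_surrogate = surrogate_stage_reward(window, gamma)

# Replacing every reward in the window by r_surrogate preserves the
# N-step discounted sum.
original_sum = n_step_discounted_sum(window, gamma)
replaced_sum = n_step_discounted_sum([r_surrogate] * len(window), gamma)
print(r_surrogate, original_sum, replaced_sum)
```

Under this assumed form, the surrogate smooths a window of noisy per-step rewards into one value that can stand in for the stage reward in an otherwise unchanged single-step update.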

Taxonomy

Topics: Reinforcement Learning in Robotics