Adaptive Symmetric Reward Noising for Reinforcement Learning

Refael Vivanti; Talya D. Sohlberg-Baris; Shlomo Cohen; Orna Cohen

arXiv:1905.10144 · cs.LG · May 27, 2019 · 1 citation

TL;DR

This paper introduces Adaptive Symmetric Reward Noising (ASRN), a method that adds Gaussian noise to rewards according to each state's estimated reward variance, in order to improve the stability and performance of reinforcement learning.
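The TL;DR describes the noising rule only at a high level. The sketch below is one minimal reading of it, assuming that "symmetric" means padding every state's estimated reward variance up to the largest variance observed in any state with zero-mean Gaussian noise. The class names `RunningStats` and `SymmetricRewardNoiser` and the use of Welford's online variance are illustrative choices, not the authors' code; the paper's exact estimator and schedule may differ.

```python
import math
import random
from collections import defaultdict


class RunningStats:
    """Welford's online mean/variance for the rewards seen in one state."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self):
        # Sample variance; 0.0 until the state has at least two rewards.
        return self.m2 / (self.n - 1) if self.n > 1 else 0.0


class SymmetricRewardNoiser:
    """Hypothetical ASRN-style reward wrapper (illustrative, not the authors' code).

    Pads every state's estimated reward variance up to the largest variance
    seen in any state by adding zero-mean Gaussian noise, so each state's
    mean reward is unchanged.
    """

    def __init__(self, seed=None):
        self.stats = defaultdict(RunningStats)
        self.rng = random.Random(seed)

    def noise_std(self, state):
        max_var = max((s.variance for s in self.stats.values()), default=0.0)
        own_var = self.stats[state].variance
        return math.sqrt(max(max_var - own_var, 0.0))

    def __call__(self, state, reward):
        # Statistics are updated with the raw reward, so the added noise
        # never feeds back into the variance estimates.
        self.stats[state].update(reward)
        return reward + self.rng.gauss(0.0, self.noise_std(state))
```

In an agent loop, `noiser(state, reward)` would be applied to each environment reward before the Q-learning or DQN update. Keying the statistics by state follows the TL;DR; keying by (state, action) is an equally plausible reading of "areas" in the abstract.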

Contribution

The paper proposes ASRN, a new reward-noising technique that mitigates the training brittleness caused by reward-variance differences between environment areas; its effect is demonstrated in bandit and autonomous-driving experiments.

Findings

01. ASRN significantly improves Q-learning performance in variance-difference bandit problems.
02. ASRN enhances DQN training results in autonomous driving simulations.
03. Reward noising can paradoxically improve learning stability and outcomes.
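Finding 01 concerns Q-learning on bandits whose arms differ in reward variance. The harness below is a minimal sketch of that kind of experiment, not the authors' setup: the function name `run_bandit`, the arm means and standard deviations, the step size `alpha`, and the exploration rate `epsilon` are all hypothetical, and the ASRN step assumes that each arm's estimated variance is padded up to the largest one.

```python
import math
import random


def run_bandit(use_asrn, steps=5000, seed=0, alpha=0.1, epsilon=0.1):
    """Epsilon-greedy Q-learning on a hypothetical two-armed variance-difference
    bandit. Returns the fraction of pulls spent on the higher-mean arm."""
    rng = random.Random(seed)
    # Hypothetical arms as (mean, std): arm 0 is "boring" (no variance),
    # arm 1 pays more on average but is much noisier.
    arms = [(1.0, 0.0), (1.5, 3.0)]
    q = [0.0, 0.0]
    # Per-arm running reward statistics (count, mean, M2) for the ASRN padding.
    counts, means, m2s = [0, 0], [0.0, 0.0], [0.0, 0.0]
    better_pulls = 0
    for _ in range(steps):
        if rng.random() < epsilon:
            a = rng.randrange(2)
        else:
            a = max(range(2), key=lambda i: q[i])  # ties go to arm 0
        mean, std = arms[a]
        r = rng.gauss(mean, std)
        if use_asrn:
            # Welford update with the raw reward, then pad this arm's variance
            # up to the largest estimated variance with zero-mean noise.
            counts[a] += 1
            delta = r - means[a]
            means[a] += delta / counts[a]
            m2s[a] += delta * (r - means[a])
            var = [m / (n - 1) if n > 1 else 0.0 for m, n in zip(m2s, counts)]
            r += rng.gauss(0.0, math.sqrt(max(max(var) - var[a], 0.0)))
        # Stateless Q-learning update: a bandit has no next state to bootstrap from.
        q[a] += alpha * (r - q[a])
        if a == 1:
            better_pulls += 1
    return better_pulls / steps
```

Comparing `run_bandit(False, seed=s)` with `run_bandit(True, seed=s)` averaged over many seeds is one way to probe the Boring Areas Trap: without noise, arm 0's Q-value moves steadily toward its fixed reward, so the greedy choice changes only when arm 1's noisy estimate happens to climb above it. No results are claimed here for these parameters.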

Abstract

Recent reinforcement learning algorithms, though achieving impressive results in various fields, suffer from brittle training effects such as regression in results and high sensitivity to initialization and parameters. We claim that some of the brittleness stems from variance differences, i.e. when different environment areas (states and/or actions) have different reward variances. This causes two problems. First, the "Boring Areas Trap" in algorithms such as Q-learning, where moving between areas depends on the current area's variance, and getting out of a boring area is hard due to its low variance. Second, the "Manipulative Consultant" problem, where value-estimation functions used in DQN and Actor-Critic algorithms influence the agent to prefer boring areas regardless of the mean reward return, as they maximize estimation precision rather than rewards. This sheds new light on how…
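The "Manipulative Consultant" argument rests on a standard identity for squared-error estimators. As a worked sketch in our own notation (not necessarily the paper's): let the reward R_s observed in area s have mean μ_s and variance σ_s², let σ_max² be the largest such variance over all areas, and assume the value estimator V is trained on squared error, as DQN and typical critics are.

```latex
\mathbb{E}\big[(R_s - V(s))^2\big] = \sigma_s^2 + \big(\mu_s - V(s)\big)^2 \;\ge\; \sigma_s^2

N_s \sim \mathcal{N}\big(0,\ \sigma_{\max}^2 - \sigma_s^2\big),\ N_s \perp R_s
\;\Rightarrow\;
\mathbb{E}[R_s + N_s] = \mu_s, \qquad
\operatorname{Var}(R_s + N_s) = \sigma_s^2 + \sigma_{\max}^2 - \sigma_s^2 = \sigma_{\max}^2
```

The first line shows that the irreducible error in area s is σ_s², whatever μ_s is, so an estimator judged on precision is best served in low-variance areas. The second shows that independent zero-mean padding noise leaves every mean unchanged while giving all areas the same error floor σ_max², which removes that incentive; this padding rule is an assumed reading of "symmetric" noising.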

Code & Models

Repositories

ManipulativeConsultant/AutonomousDrivingCookbook (TensorFlow, official)

Taxonomy

Topics: Reinforcement Learning in Robotics · Adaptive Dynamic Programming Control · Chaos control and synchronization

Methods: Dense Connections · Convolution · Q-Learning · Deep Q-Network