Adaptive Symmetric Reward Noising for Reinforcement Learning
Refael Vivanti, Talya D. Sohlberg-Baris, Shlomo Cohen, Orna Cohen

TL;DR
This paper introduces Adaptive Symmetric Reward Noising (ASRN), a novel method that adds Gaussian noise to rewards based on state variance to improve reinforcement learning stability and performance.
Contribution
The paper proposes ASRN, a new reward noising technique that mitigates variance-related brittleness in RL algorithms, demonstrated through bandit and autonomous driving experiments.
Findings
ASRN significantly improves Q-learning performance in variance-difference bandit problems.
ASRN enhances DQN training results in autonomous driving simulations.
Reward noising can paradoxically improve learning stability and outcomes.
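To make the bandit setting concrete, here is a minimal sketch of a two-armed bandit whose arms differ in reward variance, learned with a tabular Q-learning update. The arm means and standard deviations, the step count, and the `run_bandit` helper are illustrative assumptions, not the configuration used in the paper.

```python
import random


def run_bandit(steps=5000, alpha=0.1, epsilon=0.1, seed=0):
    """Epsilon-greedy Q-learning on a two-armed Gaussian bandit.

    The arm parameters below are hypothetical: arm 0 is the
    low-variance ("boring") arm, arm 1 the high-variance arm.
    """
    rng = random.Random(seed)
    arms = [(0.0, 0.1), (0.0, 1.0)]  # (mean, std) per arm, assumed values
    q = [0.0, 0.0]                   # one Q-value per arm
    pulls = [0, 0]                   # how often each arm was chosen

    for _ in range(steps):
        # Explore with probability epsilon, otherwise act greedily.
        if rng.random() < epsilon:
            action = rng.randrange(2)
        else:
            action = 0 if q[0] >= q[1] else 1

        mean, std = arms[action]
        reward = rng.gauss(mean, std)

        # Bandit form of the Q-learning update (no next state).
        q[action] += alpha * (reward - q[action])
        pulls[action] += 1

    return q, pulls


if __name__ == "__main__":
    q_values, pull_counts = run_bandit()
    print("Q-values:", q_values)
    print("pulls per arm:", pull_counts)
```

Comparing `pulls` across seeds is one way to look at how the greedy choice interacts with the difference in reward variance between the arms.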
Abstract
Recent reinforcement learning algorithms, though achieving impressive results in various fields, suffer from brittle training effects such as regression in results and high sensitivity to initialization and parameters. We claim that some of this brittleness stems from variance differences, i.e. when different areas of the environment (states and/or actions) have different reward variances. This causes two problems. First, the "Boring Areas Trap" in algorithms such as Q-learning, where moving between areas depends on the variance of the current area, so getting out of a boring area is hard because of its low variance. Second, the "Manipulative Consultant" problem, where the value-estimation functions used in DQN and Actor-Critic algorithms influence the agent to prefer boring areas regardless of the mean reward return, because they maximize estimation precision rather than rewards. This sheds a new light on how…
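One way to picture reward noising that removes variance differences is sketched below: keep a running variance estimate per action and add zero-mean Gaussian noise so that every action's noisy reward has roughly the variance of the noisiest action. This is an illustrative assumption about how such a scheme could be built, not the authors' ASRN algorithm; the `RunningVariance` and `SymmetricRewardNoiser` names are hypothetical.

```python
import math
import random


class RunningVariance:
    """Welford's online estimate of mean and sample variance."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self):
        return self.m2 / (self.n - 1) if self.n > 1 else 0.0


class SymmetricRewardNoiser:
    """Hypothetical sketch: equalize reward variance across actions.

    Zero-mean noise leaves each action's expected reward unchanged,
    while the added variance fills the gap to the noisiest action.
    """

    def __init__(self, n_actions, seed=0):
        self.stats = [RunningVariance() for _ in range(n_actions)]
        self.rng = random.Random(seed)

    def __call__(self, action, reward):
        self.stats[action].update(reward)
        target = max(s.variance for s in self.stats)
        gap = max(0.0, target - self.stats[action].variance)
        return reward + self.rng.gauss(0.0, math.sqrt(gap))
```

A learner would pass each observed `(action, reward)` pair through the noiser and update its value estimate with the returned reward instead of the raw one. The action with the highest estimated variance receives no extra noise, since its gap to the target is zero.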
Taxonomy
Topics: Reinforcement Learning in Robotics · Adaptive Dynamic Programming Control · Chaos control and synchronization
Methods: Dense Connections · Convolution · Q-Learning · Deep Q-Network
