Policy Optimization in a Noisy Neighborhood: On Return Landscapes in Continuous Control
Nate Rahn, Pierluca D'Oro, Harley Wiltzer, Pierre-Luc Bacon, Marc G. Bellemare

TL;DR
This paper investigates the return landscape of continuous control policies in deep reinforcement learning, revealing noisy neighborhoods that cause instability, and proposes a distribution-aware method to improve policy robustness by navigating away from these regions.
Contribution
It introduces a distributional perspective on return landscapes, characterizes failure-prone regions of policy space, and develops a method that finds stable paths to improve policy robustness.
Findings
Return landscapes have noisy neighborhoods affecting stability
Simple paths can improve policy robustness
Distribution-aware navigation reduces policy failure
Abstract
Deep reinforcement learning agents for continuous control are known to exhibit significant instability in their performance over time. In this work, we provide a fresh perspective on these behaviors by studying the return landscape: the mapping between a policy and a return. We find that popular algorithms traverse noisy neighborhoods of this landscape, in which a single update to the policy parameters leads to a wide range of returns. By taking a distributional view of these returns, we map the landscape, characterizing failure-prone regions of policy space and revealing a hidden dimension of policy quality. We show that the landscape exhibits surprising structure by finding simple paths in parameter space which improve the stability of a policy. To conclude, we develop a distribution-aware procedure which finds such paths, navigating away from noisy neighborhoods in order to improve…
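The distributional view in the abstract can be illustrated with a small sketch. It is not the authors' method. Everything here is a hypothetical stand-in: `toy_return` is a made-up two-parameter "return landscape", and Gaussian perturbations of scale `sigma` replace the randomness of a single policy update. The paper instead samples neighbors by applying updates from the training algorithm to a real policy. For each parameter point, the sketch estimates the distribution of returns in its neighborhood and summarizes it by mean, spread, a lower quantile, and a failure rate. It then scans a straight line between a noisy point and a stable one, which is one simple kind of path in parameter space.

```python
import numpy as np


def toy_return(theta: np.ndarray) -> float:
    """Hypothetical stand-in for the return of a policy with parameters theta.

    A smooth bowl (peak at theta = (1, 1)) plus an oscillating term whose
    amplitude grows with |theta[0]|, so points with large |theta[0]| sit in
    a "noisy neighborhood". Purely illustrative, not the paper's setup.
    """
    smooth = -float(np.sum((theta - 1.0) ** 2))
    rugged = 3.0 * abs(theta[0]) * np.sin(40.0 * theta[0])
    return smooth + float(rugged)


def neighborhood_stats(theta, sigma=0.1, n=1000, failure_threshold=-3.0, seed=0):
    """Estimate the distribution of returns in a neighborhood of theta.

    Assumption: Gaussian perturbations of scale sigma stand in for the
    randomness of one policy update.
    """
    rng = np.random.default_rng(seed)
    neighbors = theta + sigma * rng.standard_normal((n, theta.size))
    returns = np.array([toy_return(p) for p in neighbors])
    return {
        "mean": float(returns.mean()),
        "std": float(returns.std()),
        "q10": float(np.quantile(returns, 0.1)),  # lower-tail summary
        "failure_rate": float(np.mean(returns < failure_threshold)),
    }


def scan_linear_path(theta_a, theta_b, num_points=11, **kwargs):
    """Neighborhood statistics at evenly spaced points from theta_a to theta_b."""
    path = []
    for alpha in np.linspace(0.0, 1.0, num_points):
        theta = (1.0 - alpha) * theta_a + alpha * theta_b
        path.append((float(alpha), neighborhood_stats(theta, **kwargs)))
    return path


noisy = np.array([2.0, 1.0])   # large |theta[0]|: rugged neighborhood
stable = np.array([0.0, 1.0])  # small |theta[0]|: smooth neighborhood
stats_noisy = neighborhood_stats(noisy)
stats_stable = neighborhood_stats(stable)

# Hypothetical distribution-aware choice: keep the point on the path
# whose neighborhood has the best lower tail (10% quantile).
path = scan_linear_path(noisy, stable)
best_alpha, best_stats = max(path, key=lambda item: item[1]["q10"])
```

In this toy, the two endpoints are built to have a similar expected return. Their neighborhoods nonetheless differ sharply in spread and failure rate, which is the kind of hidden dimension of policy quality that a mean-only view misses. Ranking by a lower quantile is only one possible distribution-aware criterion. A tail average or an explicit failure rate would serve the same illustrative purpose.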
Taxonomy
Topics: Reinforcement Learning in Robotics · Auction Theory and Applications · Advanced Bandit Algorithms Research
