Policy Optimization in a Noisy Neighborhood: On Return Landscapes in Continuous Control

Nate Rahn; Pierluca D'Oro; Harley Wiltzer; Pierre-Luc Bacon; Marc G. Bellemare

arXiv:2309.14597 · cs.LG · April 12, 2024

TL;DR

This paper investigates the return landscape of continuous control policies in deep reinforcement learning, revealing noisy neighborhoods that cause instability, and proposes a distribution-aware method to improve policy robustness by navigating away from these regions.

Contribution

The paper introduces a distributional perspective on return landscapes, characterizes failure-prone regions of policy space, and develops a procedure that finds simple paths in parameter space along which a policy becomes more stable.

Findings

01 Return landscapes have noisy neighborhoods affecting stability

02 Simple paths can improve policy robustness

03 Distribution-aware navigation reduces policy failure

Abstract

Deep reinforcement learning agents for continuous control are known to exhibit significant instability in their performance over time. In this work, we provide a fresh perspective on these behaviors by studying the return landscape: the mapping between a policy and a return. We find that popular algorithms traverse noisy neighborhoods of this landscape, in which a single update to the policy parameters leads to a wide range of returns. By taking a distributional view of these returns, we map the landscape, characterizing failure-prone regions of policy space and revealing a hidden dimension of policy quality. We show that the landscape exhibits surprising structure by finding simple paths in parameter space which improve the stability of a policy. To conclude, we develop a distribution-aware procedure which finds such paths, navigating away from noisy neighborhoods in order to improve…
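To make the idea of a noisy neighborhood concrete, here is a minimal sketch of a distribution of returns after a single update. It is not the authors' code or environment: the two-parameter "policy", the function `toy_return` with its failure cliff, the Gaussian gradient noise, and the 50% drop threshold used for `tail_fraction` are all assumptions chosen for illustration.

```python
import random
import statistics


def toy_return(theta):
    """Hypothetical return landscape: a smooth bowl with a failure cliff at x > 1."""
    x, y = theta
    if x > 1.0:
        return 0.0  # assumed failure region: the policy collapses past the cliff
    return 100.0 - (x * x + y * y)


def post_update_returns(theta, n_updates=1000, lr=0.1, noise_std=2.0, seed=0):
    """Apply many independent noisy ascent steps to the SAME theta and
    record the return of each updated policy."""
    rng = random.Random(seed)
    returns = []
    for _ in range(n_updates):
        # Gradient of the smooth part is -2 * theta; the Gaussian term is an
        # assumed stand-in for minibatch randomness in a real algorithm.
        updated = [t + lr * (-2.0 * t + rng.gauss(0.0, noise_std)) for t in theta]
        returns.append(toy_return(updated))
    return returns


def summarize(returns, drop=0.5):
    """Mean, spread, and fraction of updates whose return falls below drop * mean."""
    mean = statistics.fmean(returns)
    return {
        "mean": mean,
        "std": statistics.pstdev(returns),
        "tail_fraction": sum(r < drop * mean for r in returns) / len(returns),
    }


safe = summarize(post_update_returns([0.0, 0.5]))    # far from the cliff
risky = summarize(post_update_returns([0.95, 0.5]))  # right next to the cliff
print("safe: ", safe)
print("risky:", risky)
```

In this toy setting the two starting policies have nearly the same return before the update, yet only the one near the cliff shows a heavy left tail afterwards. This mirrors, in a much simplified form, why looking at the whole distribution of returns can reveal differences in policy quality that a single return value hides.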

Code & Models

Repositories

nathanrahn/return-landscapes (JAX, official)

Videos

Policy Optimization in a Noisy Neighborhood: On Return Landscapes in Continuous Control · SlidesLive

Taxonomy

Topics: Reinforcement Learning in Robotics · Auction Theory and Applications · Advanced Bandit Algorithms Research