Is High Variance Unavoidable in RL? A Case Study in Continuous Control

Johan Bjorck; Carla P. Gomes; Kilian Q. Weinberger

arXiv:2110.11222·cs.LG·February 8, 2022

TL;DR

This paper investigates the causes of high variance in reinforcement learning, especially in continuous control tasks, and demonstrates that simple architectural fixes like feature normalization can significantly reduce this variance.

Contribution

The study identifies numerical instability as a key cause of early variance in RL and shows that feature normalization effectively mitigates this issue, improving stability and reproducibility.

Findings

01 Variance mostly arises early in training, driven by poor "outlier" runs; one cause is numerical instability that saturates nonlinearities.

02 Normalizing penultimate features significantly reduces outcome variance.

03 Addressing the instability allows for larger learning rates and more stable training.
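The second finding can be illustrated with a minimal NumPy sketch. It assumes that "normalizing penultimate features" means rescaling each feature vector to unit L2 norm before the final linear layer, and that the output nonlinearity is tanh; the summary above specifies neither, and the names `unit_normalize` and `tanh_head`, the feature scale, and the layer sizes are all hypothetical. It is not the authors' implementation.

```python
import numpy as np

def unit_normalize(features, eps=1e-8):
    """Rescale each row to (approximately) unit L2 norm. Assumed form of the fix."""
    norms = np.linalg.norm(features, axis=-1, keepdims=True)
    return features / (norms + eps)

def tanh_head(features, w, normalize):
    """Final linear layer followed by tanh; also returns the local tanh gradient."""
    if normalize:
        features = unit_normalize(features)
    pre = features @ w            # pre-activations of the output layer
    out = np.tanh(pre)
    grad = 1.0 - out ** 2         # d tanh(x) / dx, near zero when tanh saturates
    return pre, out, grad

rng = np.random.default_rng(0)
# Hypothetical, deliberately large penultimate features to mimic an unstable run.
features = 50.0 * rng.standard_normal((4, 16))
w = rng.standard_normal((16, 2))

pre_raw, _, grad_raw = tanh_head(features, w, normalize=False)
pre_normed, _, grad_normed = tanh_head(features, w, normalize=True)

print("mean tanh gradient, raw features:       ", grad_raw.mean())
print("mean tanh gradient, normalized features:", grad_normed.mean())
```

With unit-norm features, the Cauchy-Schwarz inequality bounds each pre-activation by the norm of the corresponding weight column, so the tanh gradient cannot collapse to zero just because the features grow large.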

Abstract

Reinforcement learning (RL) experiments have notoriously high variance, and minor details can have disproportionately large effects on measured outcomes. This is problematic for creating reproducible research and also serves as an obstacle for real-world applications, where safety and predictability are paramount. In this paper, we investigate causes for this perceived instability. To allow for an in-depth analysis, we focus on a specifically popular setup with high variance -- continuous control from pixels with an actor-critic agent. In this setting, we demonstrate that variance mostly arises early in training as a result of poor "outlier" runs, but that weight initialization and initial exploration are not to blame. We show that one cause for early variance is numerical instability which leads to saturating nonlinearities. We investigate several fixes to this issue and find that one…

Videos

Is High Variance Unavoidable in RL? A Case Study in Continuous Control · SlidesLive

Taxonomy

Topics: Reinforcement Learning in Robotics · Neural Networks and Reservoir Computing · Experimental Behavioral Economics Studies