Generalization in Reinforcement Learning with Selective Noise Injection and Information Bottleneck

Maximilian Igl; Kamil Ciosek; Yingzhen Li; Sebastian Tschiatschek; Cheng Zhang; Sam Devlin; Katja Hofmann

arXiv:1910.12911 · cs.LG · October 30, 2019 · 58 cites

TL;DR

This paper introduces Selective Noise Injection (SNI) and combines it with the Information Bottleneck (IB) to improve how well reinforcement learning policies generalize, especially in low-data regimes, outperforming existing methods on benchmarks such as Coinrun.

Contribution

The paper adapts noise-based regularization techniques from supervised learning, such as Dropout and Batch Normalization, to RL through Selective Noise Injection, and shows that combining it with the Information Bottleneck is effective.

Findings

01. SNI maintains regularization benefits while improving gradient quality.

02. Combining IB with SNI outperforms state-of-the-art on Coinrun.

03. The approach enhances generalization in RL, especially early in training.

Abstract

The ability for policies to generalize to new environments is key to the broad application of RL agents. A promising approach to prevent an agent's policy from overfitting to a limited set of training environments is to apply regularization techniques originally developed for supervised learning. However, there are stark differences between supervised learning and RL. We discuss those differences and propose modifications to existing regularization techniques in order to better adapt them to RL. In particular, we focus on regularization techniques relying on the injection of noise into the learned function, a family that includes some of the most widely used approaches such as Dropout and Batch Normalization. To adapt them to RL, we propose Selective Noise Injection (SNI), which maintains the regularizing effect the injected noise has, while mitigating the adverse effects it has on the…
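
The abstract is cut off before it says how SNI achieves this, so the following is only an illustrative sketch of one plausible reading of "selective": choose actions with a noise-free policy and let the injected noise enter only part of the training loss. The linear-softmax policy, the use of dropout on input features, the mixing weight `lam`, and all function names are assumptions made for illustration, not details taken from the text above or from the authors' repository.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def policy_probs(features, weights, drop_rate=0.0, rng=None):
    """Linear-softmax policy with optional dropout on the input features.

    drop_rate == 0.0 gives the noise-free policy. drop_rate > 0.0 injects
    noise by zeroing each feature with that probability and rescaling the
    survivors (inverted dropout); rng must then be a random.Random instance.
    """
    if drop_rate > 0.0:
        keep = 1.0 - drop_rate
        features = [f / keep if rng.random() < keep else 0.0 for f in features]
    logits = [sum(w * f for w, f in zip(row, features)) for row in weights]
    return softmax(logits)


def mixed_policy_loss(features, weights, action, advantage, lam, drop_rate, rng):
    """Hypothetical surrogate loss: lam * noise-free term + (1 - lam) * noisy term.

    This simplification omits any correction for the mismatch between the
    policy that collected the data and the noisy policy being trained.
    """
    clean = policy_probs(features, weights)
    noisy = policy_probs(features, weights, drop_rate, rng)
    clean_term = -advantage * math.log(clean[action])
    noisy_term = -advantage * math.log(noisy[action])
    return lam * clean_term + (1.0 - lam) * noisy_term


rng = random.Random(0)
weights = [[0.5, -0.2, 0.1], [-0.3, 0.4, 0.2]]  # toy parameters: 2 actions, 3 features
features = [1.0, 2.0, -1.0]                     # toy observation

# Act with the noise-free policy so that data collection is not perturbed.
clean = policy_probs(features, weights)
action = rng.choices(range(len(clean)), weights=clean)[0]

# Noise enters only the (1 - lam) share of the loss.
loss = mixed_policy_loss(features, weights, action, advantage=1.0,
                         lam=0.5, drop_rate=0.2, rng=rng)
```

Setting `lam=1.0` in this sketch recovers a purely noise-free loss and `lam=0.0` a purely noisy one, which is the trade-off between gradient quality and regularization that the abstract alludes to.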

Code & Models

Repositories

microsoft/IBAC-SNI (PyTorch, official)

Taxonomy

Topics: Reinforcement Learning in Robotics · Adaptive Dynamic Programming Control · Data Stream Mining Techniques

Methods: Batch Normalization · Dropout