Regularization Guarantees Generalization in Bayesian Reinforcement Learning through Algorithmic Stability

Aviv Tamar; Daniel Soudry; Ev Zisselman

arXiv:2109.11792·cs.LG·September 27, 2021

TL;DR

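This paper shows that adding regularization makes the optimal policy in Bayesian reinforcement learning stable, and that this stability yields PAC generalization guarantees. Because the RL objective is non-convex, the analysis rests on algorithmic stability and a quadratic growth condition rather than on strong convexity.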

Contribution

It introduces a novel stability analysis for regularized Bayesian RL policies without relying on convexity, leveraging recent convergence results for mirror descent in MDPs.

Findings

01. Regularization induces stability in Bayesian RL policies.

02. Stability leads to improved generalization guarantees.

03. A quadratic growth condition holds for regularized MDPs.
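
For reference, a quadratic growth condition is usually stated as below. This is the standard textbook form, not necessarily the paper's exact statement. The symbols $J_\lambda$ (regularized objective to be maximized), $\pi^\star$ (its maximizer), $\mu$ (growth constant), and $L$ (a Lipschitz constant for each per-instance return) are notation assumed here for illustration.

$$
J_\lambda(\pi^\star) - J_\lambda(\pi) \;\ge\; \frac{\mu}{2}\,\lVert \pi - \pi^\star \rVert^2 \quad \text{for all policies } \pi .
$$

A sketch of the usual argument for why this gives stability without convexity: let $S$ and $S'$ be two training samples of $N$ instances that differ in one instance, with regularized empirical objectives $J_S, J_{S'}$ and maximizers $\pi_S, \pi_{S'}$. Applying the condition to each objective and adding the two inequalities gives

$$
\mu \,\lVert \pi_S - \pi_{S'} \rVert^2 \;\le\; \bigl(J_S - J_{S'}\bigr)(\pi_S) - \bigl(J_S - J_{S'}\bigr)(\pi_{S'}) \;\le\; \frac{2L}{N}\,\lVert \pi_S - \pi_{S'} \rVert ,
$$

since the regularizer cancels in $J_S - J_{S'}$ and only the single differing instance, weighted by $1/N$, remains. Hence $\lVert \pi_S - \pi_{S'} \rVert \le 2L/(\mu N)$ under these assumptions; the paper's actual conditions and constants may differ.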

Abstract

In the Bayesian reinforcement learning (RL) setting, a prior distribution over the unknown problem parameters -- the rewards and transitions -- is assumed, and a policy that optimizes the (posterior) expected return is sought. A common approximation, which has been recently popularized as meta-RL, is to train the agent on a sample of $N$ problem instances from the prior, with the hope that for large enough $N$, good generalization behavior to an unseen test instance will be obtained. In this work, we study generalization in Bayesian RL under the probably approximately correct (PAC) framework, using the method of algorithmic stability. Our main contribution is showing that by adding regularization, the optimal policy becomes stable in an appropriate sense. Most stability results in the literature build on strong convexity of the regularized loss -- an approach that is not suitable for RL…
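
The notion of stability used above can be illustrated on a toy problem. The sketch below is not the paper's setting or algorithm: it assumes a one-step problem (each "instance" is just a vector of arm rewards), a uniform prior on $[0, 1]$, and an entropy regularizer with weight `lam`, chosen only because the regularized optimum then has a closed form. All function names are hypothetical. It trains on $N$ sampled instances, replaces one instance, and measures how far the regularized optimal policy moves.

```python
import math
import random


def sample_instance(rng, num_arms):
    """Draw one toy problem instance: arm rewards from an assumed uniform prior on [0, 1]."""
    return [rng.random() for _ in range(num_arms)]


def regularized_policy(instances, lam):
    """Maximizer of (mean reward over instances) + lam * entropy.

    For this one-step toy problem the maximizer over the simplex is
    softmax(mean_reward / lam).
    """
    num_arms = len(instances[0])
    mean_reward = [
        sum(inst[a] for inst in instances) / len(instances) for a in range(num_arms)
    ]
    top = max(mean_reward)  # subtract the max for numerical stability
    weights = [math.exp((r - top) / lam) for r in mean_reward]
    total = sum(weights)
    return [w / total for w in weights]


def policy_shift(lam, num_train=50, num_arms=5, seed=0):
    """L1 distance between policies trained on two samples differing in one instance."""
    rng = random.Random(seed)
    train = [sample_instance(rng, num_arms) for _ in range(num_train)]
    perturbed = train[:-1] + [sample_instance(rng, num_arms)]
    pi = regularized_policy(train, lam)
    pi_prime = regularized_policy(perturbed, lam)
    return sum(abs(p - q) for p, q in zip(pi, pi_prime))


if __name__ == "__main__":
    n = 50
    for lam in (0.05, 0.5, 5.0):
        # In this toy case the shift is at most 2 / (n * lam).
        print(lam, policy_shift(lam, num_train=n), 2.0 / (n * lam))
```

In this toy case the shift is bounded by $2/(N\lambda)$, because the softmax map changes by at most twice the largest change in its inputs (measured in L1) and replacing one instance moves each mean reward by at most $1/N$. The bound shrinks as either the sample size or the regularization weight grows, which is the qualitative behavior a stability argument relies on.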

Taxonomy

Topics: Advanced Bandit Algorithms Research · Distributed Sensor Networks and Detection Algorithms · Reinforcement Learning in Robotics