Reward-Weighted Regression Converges to a Global Optimum

Miroslav Štrupl; Francesco Faccio; Dylan R. Ashley; Rupesh Kumar Srivastava; Jürgen Schmidhuber

arXiv:2107.09088 · stat.ML · February 24, 2022

TL;DR

This paper proves that Reward-Weighted Regression (RWR) converges to a global optimum in a general setting without function approximation, and demonstrates R-linear convergence of the value function in finite spaces.

Contribution

It provides the first proof of global convergence for RWR and establishes R-linear convergence in finite state-action spaces.

Findings

01. RWR converges to a global optimum without function approximation.

02. In finite spaces, the state-value function converges R-linearly.

03. The convergence proof applies to a broad compact setting.
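For readers unfamiliar with the term, the following is the standard textbook definition of R-linear convergence, written for a sequence of value functions. The symbols $V^{\pi_n}$, $V^*$, $C$, and $\rho$, and the choice of the sup norm, are notation assumed here for illustration; this summary does not state which norm or constants the paper uses.

```latex
% R-linear convergence: the error is bounded by a geometrically decaying sequence.
\[
  \exists\, C > 0,\ \rho \in (0, 1) \quad \text{such that} \quad
  \lVert V^{\pi_n} - V^{*} \rVert_{\infty} \le C \rho^{\,n}
  \quad \text{for all } n \ge 0 .
\]
```

Unlike Q-linear convergence, this does not require the error to shrink by a fixed factor at every single step, only that it stays below a geometric envelope.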

Abstract

Reward-Weighted Regression (RWR) belongs to a family of widely known iterative Reinforcement Learning algorithms based on the Expectation-Maximization framework. In this family, learning at each iteration consists of sampling a batch of trajectories using the current policy and fitting a new policy to maximize a return-weighted log-likelihood of actions. Although RWR is known to yield monotonic improvement of the policy under certain circumstances, whether and under which conditions RWR converges to the optimal policy have remained open questions. In this paper, we provide for the first time a proof that RWR converges to a global optimum when no function approximation is used, in a general compact setting. Furthermore, for the simpler case with finite state and action spaces we prove R-linear convergence of the state-value function to the optimum.
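The idealized setting described above (no function approximation, finite states and actions) can be illustrated with a small numerical sketch. It is not the authors' code. It assumes strictly positive rewards and uses the exact update `pi_new(a|s) = pi(a|s) * Q(s, a) / V(s)`, a common closed form for the return-weighted fit when values are computed exactly instead of from sampled trajectories. The random MDP, the function names `evaluate`, `rwr_step`, and `optimal_value`, and all constants are hypothetical.

```python
import numpy as np


def evaluate(P, R, pi, gamma):
    """Exact V and Q of policy pi on a finite MDP.

    P: (S, A, S) transition probabilities, R: (S, A) rewards, pi: (S, A).
    """
    n_states = P.shape[0]
    P_pi = np.einsum("sa,sat->st", pi, P)   # state-to-state transitions under pi
    r_pi = np.sum(pi * R, axis=1)           # expected one-step reward under pi
    V = np.linalg.solve(np.eye(n_states) - gamma * P_pi, r_pi)
    Q = R + gamma * (P @ V)                 # (S, A, S) @ (S,) -> (S, A)
    return V, Q


def rwr_step(pi, Q, V):
    """Exact reward-weighted update (assumed form): reweight pi by Q and renormalize.

    Since sum_a pi(a|s) Q(s, a) = V(s), dividing by V(s) normalizes each row.
    """
    return pi * Q / V[:, None]


def optimal_value(P, R, gamma, iters=2000):
    """Reference optimum via plain value iteration, used only for comparison."""
    V = np.zeros(P.shape[0])
    for _ in range(iters):
        V = np.max(R + gamma * (P @ V), axis=1)
    return V


# Hypothetical random MDP with strictly positive rewards.
rng = np.random.default_rng(0)
n_states, n_actions, gamma = 4, 3, 0.9
P = rng.random((n_states, n_actions, n_states))
P /= P.sum(axis=2, keepdims=True)
R = rng.random((n_states, n_actions)) + 0.1

pi = np.full((n_states, n_actions), 1.0 / n_actions)  # uniform initial policy
V_star = optimal_value(P, R, gamma)

gaps = []  # sup-norm distance to the optimal value at each iteration
for _ in range(200):
    V, Q = evaluate(P, R, pi, gamma)
    gaps.append(np.max(V_star - V))
    pi = rwr_step(pi, Q, V)

print("initial gap:", gaps[0])
print("final gap:  ", gaps[-1])
```

Plotting `np.log(gaps)` against the iteration index is one way to inspect the decay empirically; a roughly straight downward envelope would be consistent with the R-linear rate claimed for the finite case, though a single random instance proves nothing.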

Code & Models

Repositories

dylanashley/reward-weighted-regression (Official)

Videos

Reward-Weighted Regression Converges to a Global Optimum

Taxonomy

Topics: Reinforcement Learning in Robotics · Receptor Mechanisms and Signaling · Gene Regulatory Network Analysis