Upside-Down Reinforcement Learning Can Diverge in Stochastic Environments With Episodic Resets

Miroslav Štrupl; Francesco Faccio; Dylan R. Ashley; Jürgen Schmidhuber; Rupesh Kumar Srivastava

arXiv:2205.06595 · stat.ML · May 16, 2022 · 1 citation

TL;DR

This paper demonstrates that episodic Upside-Down Reinforcement Learning (eUDRL, which includes GCSL) can fail to converge to the optimal policy in certain stochastic environments with episodic resets, contrary to the expectation raised by the lower-bound result of Ghosh et al. for GCSL.

Contribution

The paper rewrites eUDRL as a recursive policy update and uses this formulation to show formally that eUDRL does not converge to the optimal policy in some stochastic environments.

Findings

01. eUDRL can diverge in stochastic environments

02. A simple environment example demonstrates divergence

03. Recursive policy update helps analyze convergence issues

Abstract

Upside-Down Reinforcement Learning (UDRL) is an approach for solving RL problems that does not require value functions and uses only supervised learning, where the targets for given inputs in a dataset do not change over time. Ghosh et al. proved that Goal-Conditional Supervised Learning (GCSL) -- which can be viewed as a simplified version of UDRL -- optimizes a lower bound on goal-reaching performance. This raises expectations that such algorithms may enjoy guaranteed convergence to the optimal policy in arbitrary environments, similar to certain well-known traditional RL algorithms. Here we show that for a specific episodic UDRL algorithm (eUDRL, including GCSL), this is not the case, and give the causes of this limitation. To do so, we first introduce a helpful rewrite of eUDRL as a recursive policy update. This formulation helps to disprove its convergence to the optimal policy for…
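The idea of a recursive policy update can be made concrete with a small sketch. The setting below is a hypothetical one-step toy problem, not the environment or notation used in the paper: one start state, two actions, two possible outcomes that double as goals, and an idealized infinite-data version of the "act, relabel with the achieved outcome, refit by supervised learning" loop. The names `eudrl_step`, `run`, `STOCHASTIC`, and `DETERMINISTIC`, as well as all probabilities, are assumptions made for illustration.

```python
# Toy illustration (assumed setting, not the paper's counterexample).
# policy[g][a] = pi(a | goal g); trans[a][o] = P(outcome o | action a).

def eudrl_step(policy, trans, goal_prior):
    """One exact relabel-and-refit step in a one-step environment.

    Data are generated by sampling a goal from goal_prior and an action
    from the current policy. Each sample is relabeled with the outcome
    that actually occurred, and the new policy is the conditional
    distribution of the action given that outcome.
    """
    n_goals, n_actions = len(policy), len(trans)
    # Marginal probability of each action under the current policy.
    action_marginal = [
        sum(goal_prior[g] * policy[g][a] for g in range(n_goals))
        for a in range(n_actions)
    ]
    new_policy = []
    for outcome in range(n_goals):
        # Joint probability of (action, outcome) in the generated data.
        joint = [action_marginal[a] * trans[a][outcome] for a in range(n_actions)]
        total = sum(joint)
        new_policy.append([j / total for j in joint])
    return new_policy


def run(trans, steps=200):
    """Iterate the update from a uniform policy with a uniform goal prior."""
    policy = [[0.5, 0.5], [0.5, 0.5]]
    for _ in range(steps):
        policy = eudrl_step(policy, trans, [0.5, 0.5])
    return policy


# Assumed dynamics: action 0 mostly yields outcome 0, action 1 mostly outcome 1.
STOCHASTIC = [[0.9, 0.1], [0.2, 0.8]]
# Deterministic counterpart: each action always yields its own outcome.
DETERMINISTIC = [[1.0, 0.0], [0.0, 1.0]]

if __name__ == "__main__":
    print("stochastic:   ", run(STOCHASTIC))
    print("deterministic:", run(DETERMINISTIC))
```

Under these assumed numbers, the deterministic case reaches the optimal policy after a single update, while the stochastic case settles at a policy that picks the best action for goal 0 with probability of about 0.77 rather than 1. The relabeled data always contain some samples in which the worse action happened to produce the goal, so the supervised fit keeps assigning that action positive probability. This is only meant to convey the flavor of the limitation described in the abstract; the paper's own example and analysis may differ.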

Code & Models

Repositories

struplm/udrl-gcsl-counterexample
none · Official

Taxonomy

Topics: Reinforcement Learning in Robotics · Evolutionary Algorithms and Applications · Advanced Multi-Objective Optimization Algorithms