Upside-Down Reinforcement Learning Can Diverge in Stochastic Environments With Episodic Resets
Miroslav Štrupl, Francesco Faccio, Dylan R. Ashley, Jürgen Schmidhuber, Rupesh Kumar Srivastava

TL;DR
This paper shows that Upside-Down Reinforcement Learning (UDRL), specifically the episodic variant eUDRL (which includes GCSL), can fail to converge to the optimal policy in certain stochastic environments with episodic resets, contrary to the expectation raised by earlier lower-bound results for GCSL.
Contribution
The paper provides a formal analysis showing the divergence of eUDRL in some stochastic environments and introduces a recursive policy update formulation for better understanding.
Findings
eUDRL can diverge in stochastic environments
A simple environment example demonstrates divergence
Recursive policy update helps analyze convergence issues
Abstract
Upside-Down Reinforcement Learning (UDRL) is an approach for solving RL problems that does not require value functions and uses only supervised learning, where the targets for given inputs in a dataset do not change over time. Ghosh et al. proved that Goal-Conditional Supervised Learning (GCSL) -- which can be viewed as a simplified version of UDRL -- optimizes a lower bound on goal-reaching performance. This raises expectations that such algorithms may enjoy guaranteed convergence to the optimal policy in arbitrary environments, similar to certain well-known traditional RL algorithms. Here we show that for a specific episodic UDRL algorithm (eUDRL, including GCSL), this is not the case, and give the causes of this limitation. To do so, we first introduce a helpful rewrite of eUDRL as a recursive policy update. This formulation helps to disprove its convergence to the optimal policy for…
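To make the idea of a recursive policy update concrete, the sketch below iterates a hindsight-relabeling update in a tiny one-step stochastic environment. Everything in it is an assumption made for illustration and is not taken from the paper: the transition table `P`, the uniform `GOAL_PRIOR`, the infinite-data limit, and the specific update rule (the new policy for goal g is the posterior over actions given that g was actually reached). It may not match the paper's exact recursion or its counterexample.

```python
# Hypothetical one-step environment: 2 actions, 2 goals (outcomes).
# P[a][g] = probability that taking action a produces outcome g.
P = [[0.8, 0.2],
     [0.3, 0.7]]
GOAL_PRIOR = [0.5, 0.5]  # assumed distribution over commanded goals


def relabel_update(policy):
    """One update in the infinite-data limit.

    policy[g][a] is the probability of action a when commanded goal g.
    Actions are generated by the old policy, outcomes are relabeled as
    the goal, and the new policy is fit to those (goal, action) pairs.
    """
    n_goals, n_actions = len(policy), len(policy[0])
    # Marginal action distribution when commands are drawn from the prior.
    marginal = [
        sum(GOAL_PRIOR[g] * policy[g][a] for g in range(n_goals))
        for a in range(n_actions)
    ]
    new_policy = []
    for g in range(n_goals):
        # Joint probability of (action a taken, outcome g observed).
        joint = [marginal[a] * P[a][g] for a in range(n_actions)]
        total = sum(joint)
        new_policy.append([x / total for x in joint])
    return new_policy


def run(iterations=200):
    """Iterate the update starting from a uniform policy."""
    policy = [[0.5, 0.5], [0.5, 0.5]]
    for _ in range(iterations):
        policy = relabel_update(policy)
    return policy


def success_probability(policy, goal):
    """Probability of reaching `goal` when it is the command."""
    return sum(policy[goal][a] * P[a][goal] for a in range(len(P)))


if __name__ == "__main__":
    final = run()
    print("policy for goal 0:", final[0])
    print("success for goal 0:", success_probability(final, 0))
    print("best achievable for goal 0:", max(P[a][0] for a in range(len(P))))
```

In this toy setting the optimal policy for goal 0 always picks action 0, but the iterated update keeps the probability of action 0 strictly below 1, so the success probability stays below the best achievable 0.8. With a deterministic `P` (entries 0 or 1) the same update would put all mass on the goal-reaching action after one step, which is why the stochasticity matters here.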
Taxonomy
Topics: Reinforcement Learning in Robotics · Evolutionary Algorithms and Applications · Advanced Multi-Objective Optimization Algorithms
