Soft Forward-Backward Representations for Zero-shot Reinforcement Learning with General Utilities
Marco Bagatella, Thomas Rupf, Georg Martius, Andreas Krause

TL;DR
This paper introduces a soft forward-backward algorithm for zero-shot reinforcement learning that can optimize arbitrary differentiable utilities of the occupancy measure, extending prior forward-backward methods beyond additive rewards while still learning from offline data.
Contribution
The paper develops a maximum entropy variant of the forward-backward algorithm capable of handling general utilities in zero-shot RL, enabling direct optimization without iterative procedures.
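A toy case shows why a soft (stochastic) policy family matters. In standard RL, some deterministic policy is always optimal. Under a general utility this can fail. The sketch below is an illustrative example and is not taken from the paper. It assumes a hypothetical single-state, two-action problem whose utility is the entropy of the occupancy measure. The names `soft_policy` and `occupancy_entropy` are hypothetical.

```python
import numpy as np

def soft_policy(q, alpha):
    # Maximum-entropy (Boltzmann) policy: pi(a) proportional to exp(q(a) / alpha).
    # Subtracting the max keeps the exponentials numerically stable.
    z = (q - q.max()) / alpha
    e = np.exp(z)
    return e / e.sum()

def occupancy_entropy(pi):
    # With a single state, the state-action occupancy equals the policy itself,
    # so the utility reduces to the entropy of pi.
    p = pi[pi > 0]
    return float(-(p * np.log(p)).sum())

# Every deterministic policy puts all mass on one action and gets zero entropy.
deterministic = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
best_deterministic = max(occupancy_entropy(p) for p in deterministic)

# A soft policy over equal action values is uniform and attains the maximum, log 2.
stochastic = soft_policy(np.zeros(2), alpha=1.0)
print(best_deterministic, occupancy_entropy(stochastic))
```

As `alpha` shrinks, `soft_policy` approaches a greedy deterministic policy. A stochastic family therefore contains the deterministic policies as a limiting case, while also covering utilities that only stochastic policies can maximize.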
Findings
Retains zero-shot properties while handling complex utilities.
Effective in high-dimensional experiments with offline data.
Extends the applicability of FB algorithms to broader RL problems.
Abstract
Recent advancements in zero-shot reinforcement learning (RL) have facilitated the extraction of diverse behaviors from unlabeled, offline data sources. In particular, forward-backward algorithms (FB) can retrieve a family of policies that can approximately solve any standard RL problem (with additive rewards, linear in the occupancy measure), given sufficient capacity. While retaining zero-shot properties, we tackle the broader problem class of RL with general utilities, in which the objective is an arbitrary differentiable function of the occupancy measure. This setting is strictly more expressive, capturing tasks such as distribution matching or pure exploration, which cannot in general be reduced to additive rewards. We show that this additional complexity can be captured by a novel, maximum entropy (soft) variant of the forward-backward algorithm, which recovers a family of stochastic…
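The abstract contrasts additive rewards, which are linear in the occupancy measure, with general utilities, which are arbitrary functions of it. A small tabular example makes this concrete. The sketch below is a minimal illustration, not the paper's algorithm. It assumes a hypothetical 3-state MDP under a fixed policy, with transition matrix `P_pi`, initial distribution `mu0`, and discount `gamma`. It computes the normalized discounted state occupancy in closed form, then evaluates three utilities. The first is linear (standard RL). The second is the occupancy entropy (pure exploration). The third is negative KL divergence to a target distribution (distribution matching). All function names are hypothetical.

```python
import numpy as np

def occupancy(P_pi, mu0, gamma):
    """Normalized discounted state occupancy d = (1 - gamma) * mu0^T (I - gamma * P_pi)^{-1}."""
    n = P_pi.shape[0]
    # Equivalent transposed linear system: (I - gamma * P_pi^T) d = (1 - gamma) * mu0
    return np.linalg.solve(np.eye(n) - gamma * P_pi.T, (1 - gamma) * mu0)

def linear_utility(d, r):
    # Standard RL: the objective is linear in the occupancy measure.
    return float(d @ r)

def entropy_utility(d):
    # Pure exploration: entropy of the occupancy measure (nonlinear in d).
    p = d[d > 0]
    return float(-(p * np.log(p)).sum())

def matching_utility(d, target):
    # Distribution matching: negative KL(d || target), assuming target > 0 wherever d > 0.
    m = d > 0
    return float(-(d[m] * np.log(d[m] / target[m])).sum())

# Hypothetical 3-state MDP under a fixed policy (rows: current state, columns: next state).
P_pi = np.array([[0.5, 0.5, 0.0],
                 [0.0, 0.5, 0.5],
                 [0.5, 0.0, 0.5]])
mu0 = np.array([1.0, 0.0, 0.0])
gamma = 0.9
r = np.array([0.0, 0.0, 1.0])

d = occupancy(P_pi, mu0, gamma)
# The discounted return from the value function equals <d, r> / (1 - gamma).
V = np.linalg.solve(np.eye(3) - gamma * P_pi, r)
print(d.sum(), mu0 @ V, linear_utility(d, r) / (1 - gamma))
print(entropy_utility(d), matching_utility(d, np.full(3, 1 / 3)))
```

Only the linear utility can be rewritten as an expected sum of per-state rewards. The entropy and KL utilities depend on the whole distribution `d`, which is why they fall outside the additive-reward setting that standard FB methods address.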
Taxonomy
Topics: Reinforcement Learning in Robotics · Domain Adaptation and Few-Shot Learning · Adaptive Dynamic Programming Control
