Can We Really Learn One Representation to Optimize All Rewards?
Chongyi Zheng, Royina Karegoudra Jayanth, Benjamin Eysenbach

TL;DR
This paper analyzes forward-backward representation learning in reinforcement learning, clarifies its theoretical foundations, and proposes a simplified one-step method that improves zero-shot performance in various control tasks.
Contribution
The paper demystifies forward-backward (FB) representation learning, clarifies its theoretical properties, and introduces a simplified one-step FB method that enhances zero-shot RL performance.
Findings
One-step FB converges to errors 10^5 times smaller.
Improves zero-shot performance by +24% on average.
Demonstrates effectiveness in 10 control domains.
Abstract
As machine learning has moved towards leveraging large models as priors for downstream tasks, the community has debated the right form of prior for solving reinforcement learning (RL) problems. If one were to try to prefetch as much computation as possible, they would attempt to learn a prior over the policies for some yet-to-be-determined reward function. Recent work (forward-backward (FB) representation learning) has tried this, arguing that an unsupervised representation learning procedure can enable optimal control over arbitrary rewards without further fine-tuning. However, FB's training objective and learning behavior remain mysterious. In this paper, we demystify FB by clarifying when such representations can exist, what its objective optimizes, and how it converges in practice. We draw connections with rank matching, fitted Q-evaluation, and contraction mapping. Our analysis…
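The following is a minimal tabular sketch of the idea behind FB representations. It rests on assumptions that do not come from the paper: a small random MDP, a single fixed uniform policy, a uniform state distribution rho, and rewards defined on next states. The sketch computes the policy's successor measure exactly and factorizes it as M(s,a,s') ≈ F(s,a)·B(s') rho(s') with a truncated SVD. It then evaluates an arbitrary reward zero-shot by inferring z_r = Σ_s' rho(s') r(s') B(s') and setting Q_r(s,a) = F(s,a)·z_r. The function names (`successor_measure`, `fb_factorize`, `zero_shot_q`) are hypothetical. The original FB method learns F and B from data and conditions F on a policy embedding z, and this sketch does neither. It is also not the paper's one-step FB objective.

```python
import numpy as np

# Illustrative sketch only: hypothetical helpers, not the paper's algorithm.

def successor_measure(P, pi, gamma):
    """Exact state-action successor measure of a fixed policy in a tabular MDP.

    P: (nS, nA, nS) transition probabilities; pi: (nS, nA) action probabilities.
    Returns M of shape (nS * nA, nS) with
    M[(s, a), s'] = sum_{t>=0} gamma^t Pr(s_{t+1} = s' | s_0 = s, a_0 = a).
    """
    nS, nA, _ = P.shape
    P_sa = P.reshape(nS * nA, nS)              # row s*nA + a -> next-state distribution
    Pi = np.zeros((nS, nS * nA))               # row s' -> distribution over (s', a')
    for s in range(nS):
        Pi[s, s * nA:(s + 1) * nA] = pi[s]
    # Bellman recursion M = P_sa + gamma * P_sa @ Pi @ M, solved in closed form.
    A = np.eye(nS * nA) - gamma * P_sa @ Pi
    return np.linalg.solve(A, P_sa)


def fb_factorize(M, rho, d):
    """Rank-d factorization M[(s,a), s'] ~= F[(s,a)] . B[s'] * rho[s'] via truncated SVD."""
    K = M / rho[None, :]                       # divide out the state distribution
    U, S, Vt = np.linalg.svd(K, full_matrices=False)
    F = U[:, :d] * S[:d]                       # forward embedding, shape (nS*nA, d)
    B = Vt[:d].T                               # backward embedding, shape (nS, d)
    return F, B


def zero_shot_q(F, B, rho, r):
    """Infer z_r = sum_s' rho(s') r(s') B(s'), then Q_r(s, a) = F(s, a) . z_r."""
    z = B.T @ (rho * r)
    return F @ z, z


rng = np.random.default_rng(0)
nS, nA, gamma = 6, 3, 0.9
P = rng.random((nS, nA, nS))
P /= P.sum(axis=-1, keepdims=True)             # normalize into transition probabilities
pi = np.full((nS, nA), 1.0 / nA)               # uniform behavior policy (assumption)
rho = np.full(nS, 1.0 / nS)                    # uniform state distribution (assumption)

M = successor_measure(P, pi, gamma)
F, B = fb_factorize(M, rho, d=nS)              # full rank: exact up to float error

r = rng.random(nS)                             # an arbitrary reward over next states
Q_fb, z = zero_shot_q(F, B, rho, r)            # zero-shot evaluation from (F, B)
Q_true = M @ r                                 # direct policy evaluation for comparison
err = np.max(np.abs(Q_fb - Q_true))
greedy = Q_fb.reshape(nS, nA).argmax(axis=1)   # one greedy improvement step per state
```

With d equal to the number of states, the reconstruction is exact up to floating-point error. With smaller d, Q_r is only approximated, because a rank-d product F B^T diag(rho) cannot represent a successor measure of higher rank. Taking the argmax over actions of Q_r gives one greedy improvement step over the fixed policy. This is a general policy-iteration fact, not a claim about the paper's method.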
Taxonomy
Topics: Reinforcement Learning in Robotics · Domain Adaptation and Few-Shot Learning · Robot Manipulation and Learning
