Can We Really Learn One Representation to Optimize All Rewards?

Chongyi Zheng; Royina Karegoudra Jayanth; Benjamin Eysenbach

arXiv:2602.11399·cs.LG·February 13, 2026

TL;DR

This paper analyzes forward-backward representation learning in reinforcement learning, clarifies its theoretical foundations, and proposes a simplified one-step method that improves zero-shot performance in various control tasks.

Contribution

The paper demystifies FB representation learning, clarifies its theoretical properties, and introduces a simplified one-step FB method that enhances zero-shot RL performance.

Findings

1. One-step FB converges to errors 10^5 times smaller.

2. Improves zero-shot performance by 24% on average.

3. Demonstrates effectiveness in 10 control domains.

Abstract

As machine learning has moved towards leveraging large models as priors for downstream tasks, the community has debated the right form of prior for solving reinforcement learning (RL) problems. If one were to try to prefetch as much computation as possible, they would attempt to learn a prior over the policies for some yet-to-be-determined reward function. Recent work (forward-backward (FB) representation learning) has tried this, arguing that an unsupervised representation learning procedure can enable optimal control over arbitrary rewards without further fine-tuning. However, FB's training objective and learning behavior remain mysterious. In this paper, we demystify FB by clarifying when such representations can exist, what its objective optimizes, and how it converges in practice. We draw connections with rank matching, fitted Q-evaluation, and contraction mapping. Our analysis…
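For readers new to FB, the "optimal control over arbitrary rewards without further fine-tuning" step can be illustrated with a small sketch. It assumes the formulation commonly given in the earlier FB literature, not anything stated on this page: a reward is embedded as z = E[r(s) B(s)], and the zero-shot policy acts greedily on Q(s, a) = F(s, a, z)^T z. The function names, array shapes, and the choice to pass forward features already evaluated at z are hypothetical and for illustration only. This is not the one-step FB method proposed in the paper.

```python
import numpy as np


def reward_embedding(B, rewards):
    """Embed a reward function as z = mean_s [ r(s) * B(s) ].

    B:       (n_samples, d) backward features of sampled states (assumed given).
    rewards: (n_samples,)   reward observed at each sampled state.
    """
    return (rewards[:, None] * B).mean(axis=0)


def greedy_actions(F_z, z):
    """Act greedily on Q(s, a) = F(s, a, z)^T z.

    F_z: (n_states, n_actions, d) forward features, already evaluated at z
         (hypothetical: in practice a trained network would produce these).
    Returns the greedy action per state and the full Q table.
    """
    q = F_z @ z  # shape (n_states, n_actions)
    return q.argmax(axis=1), q


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, n_samples, n_states, n_actions = 4, 32, 3, 2
    # Random stand-ins for trained representations (assumption for the demo).
    B = rng.normal(size=(n_samples, d))
    rewards = rng.normal(size=n_samples)
    z = reward_embedding(B, rewards)
    F_z = rng.normal(size=(n_states, n_actions, d))
    actions, q = greedy_actions(F_z, z)
    print(actions.shape, q.shape)
```

The point of the sketch is that, once F and B are trained, adapting to a new reward needs only an average over labeled states and an argmax, with no gradient updates.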

Taxonomy

Topics: Reinforcement Learning in Robotics · Domain Adaptation and Few-Shot Learning · Robot Manipulation and Learning