Epistemically-guided forward-backward exploration

Núria Armengol Urpí; Marin Vlastelica; Georg Martius; Stelian Coros

arXiv:2507.05477·cs.LG·July 9, 2025

Open Access

TL;DR

This paper proposes an epistemically-guided exploration method for zero-shot reinforcement learning using forward-backward representations, significantly improving sample efficiency by minimizing epistemic uncertainty during exploration.

Contribution

It introduces a novel exploration strategy based on FB representations that reduces epistemic uncertainty, enhancing zero-shot RL performance.

Findings

01. Improved sample complexity over existing exploration methods.

02. FB-based exploration reduces epistemic uncertainty effectively.

03. Empirical results demonstrate significant efficiency gains.

Abstract

Zero-shot reinforcement learning is necessary for extracting optimal policies in the absence of concrete rewards, enabling fast adaptation to future problem settings. Forward-backward (FB) representations have emerged as a promising method for learning optimal policies in the absence of rewards via a factorization of the policy occupancy measure. However, until now, FB and many similar zero-shot reinforcement learning algorithms have been decoupled from the exploration problem, generally relying on other exploration algorithms for data collection. We argue that FB representations should fundamentally be used for exploration in order to learn more efficiently. With this goal in mind, we design exploration policies that arise naturally from the FB representation and minimize its posterior variance, thereby minimizing its epistemic uncertainty. We empirically demonstrate that…
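The abstract does not spell out how the posterior variance is estimated or how the exploration policy is derived from it, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes that epistemic uncertainty is approximated by disagreement across an ensemble of forward maps, and it uses the standard FB identity that the value of action a under task vector z is F(s, a, z)ᵀz. The ensemble layout and the names `disagreement` and `explore_action` are hypothetical.

```python
from statistics import pvariance


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def q_values(member, z):
    """Q-values of one ensemble member at a fixed state.

    `member` is a list with one forward embedding F(s, a, z) per action
    (a hypothetical stand-in for a learned network evaluated at state s).
    In FB representations, Q(s, a) for task vector z is F(s, a, z)^T z.
    """
    return [dot(f_a, z) for f_a in member]


def disagreement(ensemble, z):
    """Per-action variance of Q across ensemble members.

    Assumption: ensemble disagreement serves as a proxy for the posterior
    variance (epistemic uncertainty) of the FB representation.
    """
    per_member = [q_values(member, z) for member in ensemble]
    n_actions = len(per_member[0])
    return [pvariance([q[a] for q in per_member]) for a in range(n_actions)]


def explore_action(ensemble, z):
    """Pick the action whose value estimate the ensemble disagrees on most."""
    var = disagreement(ensemble, z)
    return max(range(len(var)), key=var.__getitem__)


# Two ensemble members, two actions, 2-dimensional embeddings.
# The members agree on action 0 and disagree on action 1.
ensemble = [
    [[1.0, 0.0], [1.0, 0.0]],
    [[1.0, 0.0], [-1.0, 0.0]],
]
z = [1.0, 0.0]
chosen = explore_action(ensemble, z)
```

In this toy setup the members disagree only on the second action, so an uncertainty-seeking rule would collect data there first. A real agent would replace the hand-written embeddings with learned networks and would also have to choose which z to explore with, which this sketch leaves out.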

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Domain Adaptation and Few-Shot Learning