TL;DR
This paper introduces a self-predictive representation learning method using successor representations to improve zero-shot combinatorial generalization in goal-conditioned behavior cloning.
Contribution
It proposes a novel representation learning objective, BYOL-γ, that encourages temporal consistency and approximates successor representations for better generalization.
Findings
BYOL-γ achieves competitive performance on tasks requiring combinatorial generalization.
The method reduces the out-of-distribution gap for novel state-goal pairs.
The paper demonstrates the effectiveness of temporal consistency in learned representations.
Abstract
While goal-conditioned behavior cloning (GCBC) methods can perform well on in-distribution training tasks, they do not necessarily generalize zero-shot to tasks that require conditioning on novel state-goal pairs, i.e., combinatorial generalization. In part, this limitation can be attributed to a lack of temporal consistency in the state representation learned by BC; if temporally correlated states are properly encoded to similar latent representations, then the out-of-distribution gap for novel state-goal pairs would be reduced. We formalize this notion by demonstrating how encouraging long-range temporal consistency via successor representations (SR) can facilitate generalization. We then propose a simple yet effective representation learning objective for GCBC, which theoretically approximates the successor representation in the finite MDP case through…
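To make the successor representation mentioned in the abstract concrete, here is a minimal sketch of the standard closed-form SR for a finite Markov chain, M = (I - γP)^{-1}. It is not the BYOL-γ objective, whose details are cut off above. The ring-shaped random walk, the function names, and γ = 0.9 are all assumptions chosen for illustration. The sketch compares the SR rows of temporally close and temporally distant states.

```python
import numpy as np


def successor_representation(P, gamma):
    """Closed-form SR of a finite Markov chain: M = (I - gamma * P)^{-1}.

    M[s, s'] is the expected discounted number of visits to s' when starting in s.
    """
    n = P.shape[0]
    return np.linalg.inv(np.eye(n) - gamma * P)


def ring_transition_matrix(n):
    """Hypothetical toy chain (an assumption, not from the paper):
    a symmetric random walk on a ring of n states."""
    P = np.zeros((n, n))
    for s in range(n):
        P[s, (s - 1) % n] = 0.5
        P[s, (s + 1) % n] = 0.5
    return P


gamma = 0.9  # illustrative discount factor
P = ring_transition_matrix(8)
M = successor_representation(P, gamma)

# The SR satisfies the Bellman-style recursion M = I + gamma * P @ M.
bellman_residual = np.abs(M - (np.eye(8) + gamma * P @ M)).max()

# Compare SR rows of neighbouring states (0 and 1) with those of
# states on opposite sides of the ring (0 and 4).
d_near = np.linalg.norm(M[0] - M[1])
d_far = np.linalg.norm(M[0] - M[4])

print(f"Bellman residual: {bellman_residual:.2e}")
print(f"SR distance, adjacent states: {d_near:.3f}")
print(f"SR distance, opposite states: {d_far:.3f}")
```

In this toy chain, adjacent states have closer SR rows than opposite states. This is the kind of temporal consistency the abstract argues a state representation should have.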