Value-Consistent Representation Learning for Data-Efficient Reinforcement Learning
Yang Yue, Bingyi Kang, Zhongwen Xu, Gao Huang, Shuicheng Yan

TL;DR
This paper introduces value-consistent representation learning (VCR), a method that aligns the value predictions of imagined future states with those of the corresponding real states in order to improve data efficiency in reinforcement learning.
Contribution
VCR optimizes state representations directly for decision-making by aligning the value predictions of imagined and real states, rather than training a transition model for state prediction with contrastive learning, as earlier visual representation methods do.
Findings
Achieves state-of-the-art results on Atari 100K benchmarks.
Improves sample efficiency in DeepMind Control Suite tasks.
Effective for both discrete and continuous action spaces.
Abstract
Deep reinforcement learning (RL) algorithms suffer severe performance degradation when interaction data is scarce, which limits their real-world application. Recently, visual representation learning has been shown to be effective and promising for boosting sample efficiency in RL. These methods usually rely on contrastive learning and data augmentation to train a transition model for state prediction, which differs from how the model is used in RL, namely to perform value-based planning. Accordingly, the representation learned by these visual methods may be good for recognition but not optimal for estimating state value and solving the decision problem. To address this issue, we propose a novel method, called value-consistent representation learning (VCR), to learn representations that are directly related to decision-making. More specifically, VCR trains a model to predict the future…
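The value-alignment idea described above can be illustrated with a small sketch. This is not the authors' implementation: the linear encoder, transition model, and Q-head, the squared-error distance, and every name below (`encode`, `transition`, `q_values`, `value_consistency_loss`) are hypothetical choices made only for illustration. The sketch rolls a latent state forward through a transition model ("imagined" states), encodes the actually observed next frames ("real" states), and penalizes the gap between the Q-values predicted for each pair.

```python
def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def encode(obs, W_enc):
    """Hypothetical linear encoder: observation -> latent state."""
    return matvec(W_enc, obs)


def transition(z, action, W_trans, num_actions):
    """Hypothetical linear transition model: (latent, one-hot action) -> next latent."""
    one_hot = [1.0 if i == action else 0.0 for i in range(num_actions)]
    return matvec(W_trans, z + one_hot)


def q_values(z, W_q):
    """Hypothetical linear Q-head: latent state -> one value per action."""
    return matvec(W_q, z)


def value_consistency_loss(obs_seq, actions, W_enc, W_trans, W_q, num_actions):
    """Mean squared gap between Q-values of imagined and real latent states.

    Assumption: squared error stands in for whatever distance the paper uses.
    """
    z_imagined = encode(obs_seq[0], W_enc)
    total = 0.0
    for k, action in enumerate(actions):
        # Imagined state: roll the transition model forward one step.
        z_imagined = transition(z_imagined, action, W_trans, num_actions)
        # Real state: encode the observation actually seen at step k + 1.
        z_real = encode(obs_seq[k + 1], W_enc)
        q_imag = q_values(z_imagined, W_q)
        q_real = q_values(z_real, W_q)
        total += sum((a - b) ** 2 for a, b in zip(q_imag, q_real)) / len(q_imag)
    return total / len(actions)


if __name__ == "__main__":
    W_enc = [[1.0, 0.0], [0.0, 1.0]]
    # This toy transition ignores the action and keeps the latent unchanged.
    W_trans = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
    W_q = [[1.0, 2.0], [3.0, 4.0]]
    obs_seq = [[1.0, 2.0], [2.0, 2.0], [1.0, 2.0]]
    print(value_consistency_loss(obs_seq, [0, 1], W_enc, W_trans, W_q, 2))  # prints 2.5
```

In a trainable version, the Q-values of the real states would typically be treated as fixed targets and the loss minimized with respect to the encoder and transition parameters; this sketch only evaluates the loss and performs no gradient step.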
Taxonomy
Topics: Reinforcement Learning in Robotics
Methods: ALIGN · Contrastive Learning
