ViSA: Visited-State Augmentation for Generalized Goal-Space Contrastive Reinforcement Learning
Issa Nakamura, Tomoya Yamanokuchi, Yuki Kadokawa, Jia Qu, Shun Otsub, Ken Miyamoto, Shotaro Miwa, Takamitsu Matsubara

TL;DR
ViSA introduces a novel data augmentation technique for contrastive reinforcement learning that enhances goal-space generalization and improves value estimation for hard-to-visit states in robotic tasks.
Contribution
ViSA proposes a visited-state augmentation method that generates augmented states and enforces a consistent embedding space, improving goal generalization in contrastive RL.
Findings
Enhanced goal-space generalization in robotic tasks.
Improved value estimation for hard-to-visit goals.
Effective augmentation method demonstrated in simulation and real-world experiments.
Abstract
Goal-Conditioned Reinforcement Learning (GCRL) is a framework for learning a policy that can reach arbitrarily given goals. In particular, Contrastive Reinforcement Learning (CRL) provides a framework for policy updates using an approximation of the value function estimated via contrastive learning, achieving higher sample efficiency than conventional methods. However, since CRL treats visited states as pseudo-goals during learning, it can accurately estimate the value function only for a limited set of goals. To address this issue, we propose a novel data augmentation approach for CRL called ViSA (Visited-State Augmentation). ViSA consists of two components: 1) generating augmented state samples, with the aim of augmenting hard-to-visit state samples during on-policy exploration, and 2) learning a consistent embedding space, which uses an augmented state as auxiliary information to…
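To make the "visited state as pseudo-goal" idea concrete, the following is a minimal sketch of a generic contrastive critic objective, not the ViSA method itself and not the authors' code. It assumes an InfoNCE-style loss over inner products of embeddings and a geometric offset for picking a future visited state; both are common choices in contrastive RL but are assumptions here, and every function and parameter name (`info_nce_loss`, `sample_future_index`, `gamma`) is hypothetical.

```python
import math
import random


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def info_nce_loss(sa_embs, goal_embs):
    """InfoNCE-style critic loss (assumed form, for illustration only).

    sa_embs[i]   : embedding of the i-th state-action pair.
    goal_embs[i] : embedding of a state visited later in the same
                   trajectory, used as the pseudo-goal (the positive).
    All other goal_embs[j], j != i, act as negatives for row i.
    """
    losses = []
    for i, phi in enumerate(sa_embs):
        logits = [dot(phi, psi) for psi in goal_embs]
        m = max(logits)  # subtract the max for numerical stability
        log_z = m + math.log(sum(math.exp(l - m) for l in logits))
        losses.append(log_z - logits[i])  # negative log-softmax of the positive
    return sum(losses) / len(losses)


def sample_future_index(t, traj_len, gamma, rng):
    """Pick the index of a future visited state to use as a pseudo-goal.

    The offset is geometric with continuation probability gamma (an
    assumption) and is clipped to the last step of the trajectory, so
    the final step can only select itself.
    """
    k = 1
    while t + k < traj_len - 1 and rng.random() < gamma:
        k += 1
    return min(t + k, traj_len - 1)


if __name__ == "__main__":
    rng = random.Random(0)
    print(sample_future_index(3, 10, 0.9, rng))
    print(info_nce_loss([[5.0, 0.0], [0.0, 5.0]], [[1.0, 0.0], [0.0, 1.0]]))
```

Because the positives in such an objective are drawn only from states the agent actually visited, goals that are rarely visited contribute few or no positive pairs, which is the limitation the abstract attributes to CRL.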
Taxonomy
Topics: Reinforcement Learning in Robotics · Domain Adaptation and Few-Shot Learning · Robot Manipulation and Learning
