Swapped goal-conditioned offline reinforcement learning
Wenyan Yang, Huiling Wang, Dingding Cai, Joni Pajarinen, Joni-Kristian Kämäräinen

TL;DR
This paper introduces a goal-swapping technique and a new offline RL method, DQAPG, to improve generalization and performance in goal-conditioned tasks, especially in complex manipulation scenarios.
Contribution
The paper proposes a novel goal-swapping data augmentation method and the DQAPG algorithm, enhancing offline GCRL performance and robustness against noise and extrapolation errors.
Findings
DQAPG outperforms state-of-the-art methods on benchmark tasks.
Goal-swapping improves test results in goal-conditioned offline RL.
The method achieves success on complex in-hand manipulation tasks.
Abstract
Offline goal-conditioned reinforcement learning (GCRL) can be challenging due to overfitting to the given dataset. To generalize agents' skills outside the given dataset, we propose a goal-swapping procedure that generates additional trajectories. To alleviate the problem of noise and extrapolation errors, we present a general offline reinforcement learning method called deterministic Q-advantage policy gradient (DQAPG). In the experiments, DQAPG outperforms state-of-the-art goal-conditioned offline RL methods in a wide range of benchmark tasks, and goal-swapping further improves the test results. Notably, the proposed method obtains good performance on challenging dexterous in-hand manipulation tasks on which prior methods failed.
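The abstract does not spell out how goal-swapping is implemented. As a rough illustration of the general idea only, the sketch below reassigns goals across a batch of offline transitions and recomputes a sparse reward for the new goals. The function name `swap_goals`, the distance `threshold`, and the 1/0 reward convention are all assumptions made for this example, not details taken from the paper.

```python
import numpy as np


def swap_goals(achieved_next, goals, rng, threshold=0.05):
    """Hypothetical goal-swapping augmentation for a batch of transitions.

    achieved_next: (B, D) goal-space position reached after each transition.
    goals:         (B, D) goals originally attached to each transition.
    rng:           a numpy random Generator, used to shuffle the goals.
    threshold:     assumed success radius for the sparse reward.
    """
    # Reassign goals by shuffling them across the batch, so a transition
    # may be paired with a goal taken from a different trajectory.
    perm = rng.permutation(len(goals))
    swapped = goals[perm]

    # Recompute the sparse reward against the new goals
    # (assumed convention: 1.0 on success, 0.0 otherwise).
    dist = np.linalg.norm(achieved_next - swapped, axis=-1)
    rewards = (dist <= threshold).astype(np.float32)
    return swapped, rewards


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    achieved = rng.normal(size=(4, 3))
    goals = rng.normal(size=(4, 3))
    new_goals, new_rewards = swap_goals(achieved, goals, rng)
    print(new_goals.shape, new_rewards)
```

Because the states and actions are left untouched, an augmentation of this kind only changes which goal each transition is evaluated against; whether the relabeled data is useful depends on how the learning algorithm handles goals that the original trajectory never reached.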
Taxonomy
Topics: Reinforcement Learning in Robotics · Robot Manipulation and Learning · Machine Learning and Data Classification
