Improving Offline Reinforcement Learning with Inaccurate Simulators

Yiwen Hou; Haoyuan Sun; Jinming Ma; Feng Wu

arXiv:2405.04307·cs.RO·May 8, 2024

TL;DR

This paper introduces a method that combines offline datasets with data from inaccurate simulators using a GAN-based approach, improving offline RL performance in robotic tasks.

Contribution

The paper proposes a GAN-based technique for integrating inaccurate simulation data with offline datasets to improve offline reinforcement learning.

Findings

01. Outperforms state-of-the-art methods on D4RL benchmark

02. Effective in real-world manipulation tasks

03. Utilizes GAN to reweight simulated data

Abstract

Offline reinforcement learning (RL) provides a promising approach to avoid costly online interaction with the real environment. However, the performance of offline RL depends heavily on the quality of the dataset, and a low-quality dataset can cause extrapolation error in the learning process. In many robotic applications, an inaccurate simulator is often available. However, data collected from the inaccurate simulator cannot be used directly in offline RL because of the well-known exploration-exploitation dilemma and the dynamics gap between the inaccurate simulation and the real environment. To address these issues, we propose a novel approach that combines the offline dataset with the inaccurate simulation data. Specifically, we pre-train a generative adversarial network (GAN) model to fit the state distribution of the offline dataset. Given this, we collect data from the inaccurate…
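
The abstract above is truncated, so the paper's exact reweighting rule is not shown here. As a rough illustration of the general idea of scoring simulated states by how closely they resemble the offline dataset's state distribution, the sketch below swaps the GAN for a much simpler stand-in: a logistic-regression discriminator trained on its own, with no generator. The function names (`train_discriminator`, `sim_weights`), the toy Gaussian data, and the choice to normalize weights to mean 1 are all assumptions made for illustration, not the authors' method.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_discriminator(real_states, sim_states, steps=500, lr=0.1):
    """Fit a logistic-regression discriminator by gradient descent.

    Label 1 = state from the offline dataset, label 0 = state from the simulator.
    This is a simplified stand-in for a GAN discriminator (assumption).
    """
    x = np.vstack([real_states, sim_states])
    y = np.concatenate([np.ones(len(real_states)), np.zeros(len(sim_states))])
    w = np.zeros(x.shape[1])
    b = 0.0
    for _ in range(steps):
        p = sigmoid(x @ w + b)
        grad = p - y                      # d(BCE)/d(logit) for each sample
        w -= lr * (x.T @ grad) / len(y)   # mean gradient w.r.t. weights
        b -= lr * grad.mean()             # mean gradient w.r.t. bias
    return w, b

def sim_weights(sim_states, w, b):
    """Score simulated states by discriminator output, normalized to mean 1."""
    d = sigmoid(sim_states @ w + b)
    return d / d.mean()

# Toy data (hypothetical): offline states near the origin, simulated states
# split between an in-distribution cluster and an off-distribution cluster.
rng = np.random.default_rng(0)
real = rng.normal(0.0, 1.0, size=(200, 2))
sim_near = rng.normal(0.0, 1.0, size=(100, 2))
sim_far = rng.normal(4.0, 1.0, size=(100, 2))
sim = np.vstack([sim_near, sim_far])

w, b = train_discriminator(real, sim)
weights = sim_weights(sim, w, b)
print("mean weight, near cluster:", weights[:100].mean())
print("mean weight, far cluster: ", weights[100:].mean())
```

In this toy setup, simulated states that look like the offline data receive larger weights than those far from it. Such weights could, for example, scale per-sample losses when simulated transitions are mixed into training, though how the paper actually applies them is not stated in this excerpt.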

Taxonomy

Topics: Reinforcement Learning in Robotics