COSBO: Conservative Offline Simulation-Based Policy Optimization

Eshagh Kargar; Ville Kyrki

arXiv:2409.14412·cs.LG·September 24, 2024


TL;DR

This paper introduces COSBO, a novel offline reinforcement learning method that combines imperfect simulation data with real environment data to improve policy training, outperforming existing approaches especially in complex dynamic scenarios.

Contribution

COSBO combines an imperfect simulation environment with data from the target environment to train an offline RL policy, with the aim of reducing the bias introduced by the sim-to-real gap.

Findings

01. Outperforms CQL, MOPO, and COMBO in diverse scenarios.

02. Demonstrates robustness across various experimental conditions.

03. Effectively leverages simulation data despite the sim-to-real gap.

Abstract

Offline reinforcement learning allows training reinforcement learning models on data from live deployments. However, it is limited to choosing the best combination of behaviors present in the training data. In contrast, simulation environments attempting to replicate the live environment can be used instead of the live data, yet this approach is limited by the simulation-to-reality gap, resulting in a bias. In an attempt to get the best of both worlds, we propose a method that combines an imperfect simulation environment with data from the target environment, to train an offline reinforcement learning policy. Our experiments demonstrate that the proposed method outperforms state-of-the-art approaches CQL, MOPO, and COMBO, especially in scenarios with diverse and challenging dynamics, and demonstrates robust behavior across a variety of experimental conditions. The results highlight that…


Taxonomy

Topics: Simulation Techniques and Applications