Offline Reinforcement Learning with Wasserstein Regularization via Optimal Transport Maps
Motoki Omura, Yusuke Mukuta, Kazuki Ota, Takayuki Osa, Tatsuya Harada

TL;DR
This paper introduces an offline reinforcement learning method that regularizes the policy with the Wasserstein distance, computed through optimal transport maps modeled by input-convex neural networks (ICNNs), to improve the stability and performance of policy learning without adversarial training.
Contribution
It proposes a Wasserstein regularization approach with ICNNs for optimal transport, avoiding adversarial training and enhancing offline RL stability and effectiveness.
Findings
Achieves results comparable to or better than existing methods on the D4RL benchmark.
Utilizes a discriminator-free approach for Wasserstein distance computation.
Demonstrates robustness to out-of-distribution actions.
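The contrast between the Wasserstein distance and density ratio-based measures can be seen in a toy one-dimensional setting. The sketch below is an illustration only and is not taken from the paper: the sample values, the shifts 0.3 and 1.5, the histogram bins, and the function names are all assumptions. It relies on the standard fact that, in one dimension, the optimal coupling between two equal-size samples matches them in sorted order.

```python
import numpy as np


def wasserstein2_1d(xs, ys):
    """Empirical 2-Wasserstein distance between two equal-size 1D samples.

    In one dimension the optimal coupling pairs the sorted samples.
    """
    xs = np.sort(np.asarray(xs, dtype=float))
    ys = np.sort(np.asarray(ys, dtype=float))
    assert xs.shape == ys.shape, "this sketch assumes equal sample sizes"
    return float(np.sqrt(np.mean((xs - ys) ** 2)))


def kl_histogram(xs, ys, bins):
    """KL(P || Q) between histogram estimates of xs (P) and ys (Q).

    Returns inf when P puts mass on a bin where Q has none.
    """
    p, _ = np.histogram(xs, bins=bins)
    q, _ = np.histogram(ys, bins=bins)
    p = p / p.sum()
    q = q / q.sum()
    mask = p > 0
    if np.any(q[mask] == 0):
        return float("inf")
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))


rng = np.random.default_rng(0)
data_actions = rng.uniform(-0.1, 0.1, size=1000)  # hypothetical dataset actions
near_policy = data_actions + 0.3  # hypothetical policy just outside the data support
far_policy = data_actions + 1.5   # hypothetical policy far outside the data support
bins = np.linspace(-2.0, 2.0, 41)

for name, policy in [("near", near_policy), ("far", far_policy)]:
    w2 = wasserstein2_1d(data_actions, policy)
    kl = kl_histogram(policy, data_actions, bins)
    print(f"{name}: W2 = {w2:.3f}, KL = {kl}")
```

In this toy case the histogram KL estimate is infinite for both policies because their supports do not overlap with the data, so it cannot tell the near policy from the far one. The Wasserstein distance stays finite and grows with the size of the shift, which is the sense in which it reflects similarity between actions.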
Abstract
Offline reinforcement learning (RL) aims to learn an optimal policy from a static dataset, making it particularly valuable in scenarios where data collection is costly, such as robotics. A major challenge in offline RL is distributional shift, where the learned policy deviates from the dataset distribution, potentially leading to unreliable out-of-distribution actions. To mitigate this issue, regularization techniques have been employed. While many existing methods utilize density ratio-based measures, such as the f-divergence, for regularization, we propose an approach that utilizes the Wasserstein distance, which is robust to out-of-distribution data and captures the similarity between actions. Our method employs input-convex neural networks (ICNNs) to model optimal transport maps, enabling the computation of the Wasserstein distance in a discriminator-free manner, thereby avoiding…
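The role of the ICNN can be made concrete with a small sketch. For the squared Euclidean cost, Brenier's theorem states that (under regularity conditions on the source distribution) the optimal transport map is the gradient of a convex function, so a network that is convex in its input is a natural way to parameterize such a map. The code below is a minimal NumPy illustration with random, untrained weights. It is not the authors' architecture or training procedure: the two-layer layout, the parameter names, the `alpha` quadratic term, and the helper functions are all assumptions made for the example.

```python
import numpy as np


def softplus(t):
    # Convex, non-decreasing activation, computed stably.
    return np.logaddexp(0.0, t)


def sigmoid(t):
    # Derivative of softplus, written as exp(-softplus(-t)) for stability.
    return np.exp(-np.logaddexp(0.0, -t))


def init_icnn(rng, dim, hidden):
    """Random parameters for a two-layer ICNN (hypothetical layout)."""
    return {
        "A0": rng.normal(size=(hidden, dim)) / np.sqrt(dim),
        "b0": np.zeros(hidden),
        # Weights acting on hidden units must be non-negative to keep convexity.
        "W1": np.abs(rng.normal(size=(hidden, hidden))) / hidden,
        "A1": rng.normal(size=(hidden, dim)) / np.sqrt(dim),
        "b1": np.zeros(hidden),
        "w_out": np.abs(rng.normal(size=hidden)) / hidden,
    }


def icnn_value(params, x, alpha=0.1):
    """Scalar potential f(x), convex in x; alpha adds a strongly convex term."""
    z1 = softplus(params["A0"] @ x + params["b0"])
    z2 = softplus(params["W1"] @ z1 + params["A1"] @ x + params["b1"])
    return float(params["w_out"] @ z2 + 0.5 * alpha * (x @ x))


def icnn_grad(params, x, alpha=0.1):
    """Gradient of f at x, i.e. the candidate transport map T(x)."""
    pre1 = params["A0"] @ x + params["b0"]
    z1 = softplus(pre1)
    pre2 = params["W1"] @ z1 + params["A1"] @ x + params["b1"]
    d_pre2 = params["w_out"] * sigmoid(pre2)           # df / d pre2
    d_pre1 = (params["W1"].T @ d_pre2) * sigmoid(pre1)  # df / d pre1
    return params["A0"].T @ d_pre1 + params["A1"].T @ d_pre2 + alpha * x


rng = np.random.default_rng(0)
params = init_icnn(rng, dim=3, hidden=16)
action = rng.normal(size=3)             # hypothetical source action
mapped = icnn_grad(params, action)      # T(action) = gradient of the potential
transport_cost = float(np.sum((mapped - action) ** 2))
print(mapped, transport_cost)
```

Because the map is read off as a gradient of a single convex potential, a squared-distance transport cost can be evaluated directly from samples and their images, without a separate discriminator network. How the potential is actually trained and combined with the RL objective is specific to the paper and is not reproduced here.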
Taxonomy
Topics: Machine Learning and ELM · Adversarial Robustness in Machine Learning
