Model-based Offline Policy Optimization with Adversarial Network
Junming Yang, Xingguo Chen, Shengyuan Wang, Bolei Zhang

TL;DR
This paper introduces MOAN, a model-based offline RL framework that uses adversarial learning to improve transition model generalization, uncertainty estimation, and exploration, and reports better performance than state-of-the-art offline RL baselines on benchmarks.
Contribution
Proposes MOAN, which employs adversarial learning for better generalization and uncertainty quantification in offline RL transition models, addressing over-conservatism and unreliable estimates.
Findings
Outperforms state-of-the-art offline RL baselines
Generates diverse in-distribution samples
Provides more accurate uncertainty quantification
Abstract
Model-based offline reinforcement learning (RL), which builds a supervised transition model from a logged dataset to avoid costly interactions with the online environment, has been a promising approach for offline policy optimization. Because the discrepancy between the logged data and the online environment may cause a distributional shift problem, many prior works have studied how to build robust transition models conservatively and estimate the model uncertainty accurately. However, over-conservatism can limit the agent's exploration, and the uncertainty estimates may be unreliable. In this work, we propose a novel Model-based Offline policy optimization framework with Adversarial Network (MOAN). The key idea is to use adversarial learning to build a transition model with better generalization, where an adversary is introduced to distinguish between in-distribution and…
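The abstract is cut off before it says exactly what the adversary separates or how its output is used, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a toy 1-D environment, a deliberately biased transition model, a logistic-regression adversary that labels logged transitions 1 and model-generated transitions 0, and a hypothetical reward penalty `lam * (1 - score)`. All function names, the penalty form, and the toy dynamics are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_adversary(real, fake, steps=500, lr=0.5):
    """Fit a logistic-regression adversary (assumption: a linear model is
    enough for this toy problem). Label 1 = logged data, 0 = model output."""
    x = np.vstack([real, fake])
    y = np.concatenate([np.ones(len(real)), np.zeros(len(fake))])
    x = np.hstack([x, np.ones((len(x), 1))])  # append a bias feature
    w = np.zeros(x.shape[1])
    for _ in range(steps):
        p = sigmoid(x @ w)
        w -= lr * x.T @ (p - y) / len(y)  # gradient step on the logistic loss
    return w

def adversary_score(w, transitions):
    """Score in [0, 1]; higher means 'looks like the logged data'."""
    x = np.hstack([transitions, np.ones((len(transitions), 1))])
    return sigmoid(x @ w)

def penalized_reward(reward, score, lam=1.0):
    """Hypothetical penalty: subtract more reward where the adversary
    believes the transition is unlike the logged data."""
    return reward - lam * (1.0 - score)

# Toy setup (assumed): true dynamics s' = s + a + small noise.
rng = np.random.default_rng(0)
n = 512
s = rng.uniform(-1.0, 1.0, size=n)
a = rng.uniform(-1.0, 1.0, size=n)
real_next = s + a + rng.normal(0.0, 0.05, size=n)  # logged transitions
fake_next = s + a + 1.0                            # biased model prediction

real = np.column_stack([s, a, real_next])
fake = np.column_stack([s, a, fake_next])

w = train_adversary(real, fake)
real_scores = adversary_score(w, real)
fake_scores = adversary_score(w, fake)

print("mean score, logged transitions:", real_scores.mean())
print("mean score, model transitions: ", fake_scores.mean())
```

In this sketch the adversary's score plays the role of an uncertainty signal: transitions that the adversary can easily tell apart from the logged data receive a larger penalty. A full adversarial setup would also update the transition model against the adversary, which is omitted here.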
Taxonomy
Topics: Reinforcement Learning in Robotics · Data Stream Mining Techniques · Fuel Cells and Related Materials
