DROMO: Distributionally Robust Offline Model-based Policy Optimization

Ruizhen Liu; Dazhi Zhong; Zhicong Chen

arXiv:2109.07275·cs.LG·September 16, 2021

Open Access

TL;DR

DROMO applies distributionally robust optimization to offline model-based reinforcement learning. It penalizes a broader range of out-of-distribution state-action pairs without relying on uncertainty estimates, and it comes with a theoretical lower bound on policy evaluation.

Contribution

The paper proposes a distributionally robust optimization framework for offline RL that extends conservative regularization without requiring uncertainty quantification. The framework is supported by theoretical analysis and is compatible with policy gradient methods.

Findings

01 Provides a theoretical lower bound on policy evaluation.

02 Can be integrated into existing policy gradient algorithms.

03 Provides analysis for both linear and non-linear models.

Abstract

We consider the problem of offline reinforcement learning with model-based control, whose goal is to learn a dynamics model from the experience replay and obtain a pessimism-oriented agent under the learned model. Current model-based constraints include explicit uncertainty penalties and implicit conservative regularization that pushes the Q-values of out-of-distribution state-action pairs down and those of in-distribution pairs up. While the uncertainty estimation on which the former relies can be loosely calibrated for complex dynamics, the latter performs slightly better. To extend the basic idea of regularization without uncertainty quantification, we propose distributionally robust offline model-based policy optimization (DROMO), which leverages the ideas in distributionally robust optimization to penalize a broader range of out-of-distribution state-action pairs beyond the standard empirical…
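The two existing constraints that the abstract contrasts can be illustrated with a minimal Python sketch. This is not DROMO's own objective, which the truncated abstract does not spell out. The function names, the `penalty_coef` parameter, and the toy numbers are all assumptions made for illustration.

```python
from statistics import mean
from typing import Sequence


def penalized_reward(reward: float, uncertainty: float, penalty_coef: float = 1.0) -> float:
    """Explicit uncertainty penalty (illustrative form): subtract a scaled
    uncertainty estimate from the model-predicted reward.
    `penalty_coef` is a hypothetical name for the trade-off weight."""
    return reward - penalty_coef * uncertainty


def conservative_regularizer(q_ood: Sequence[float], q_data: Sequence[float]) -> float:
    """Implicit conservative regularizer (illustrative form): minimizing this
    term pushes Q-values at out-of-distribution pairs down and Q-values at
    in-distribution (dataset) pairs up."""
    return mean(q_ood) - mean(q_data)


# Toy values, invented for illustration only.
q_on_model_rollouts = [2.0, 4.0]  # Q(s, a) at out-of-distribution pairs
q_on_dataset = [1.0, 2.0]         # Q(s, a) at pairs from the offline data

print(penalized_reward(reward=1.0, uncertainty=0.5, penalty_coef=2.0))  # 0.0
print(conservative_regularizer(q_on_model_rollouts, q_on_dataset))      # 1.5
```

The first function depends on the quality of the uncertainty estimate, which the abstract notes can be loosely calibrated for complex dynamics. The second needs only Q-value samples, which is the style of regularization that DROMO is described as extending.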

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Electric Vehicles and Infrastructure

Methods: Experience Replay