DROMO: Distributionally Robust Offline Model-based Policy Optimization
Ruizhen Liu, Dazhi Zhong, Zhicong Chen

TL;DR
DROMO applies distributionally robust optimization to offline model-based reinforcement learning. It penalizes out-of-distribution state-action pairs without relying solely on uncertainty estimates, and it comes with theoretical guarantees.
Contribution
It proposes a novel distributionally robust optimization framework for offline RL that extends regularization beyond uncertainty quantification, with theoretical analysis and compatibility with policy gradient methods.
Findings
Provides a theoretical lower bound on policy evaluation.
Can be integrated into existing policy gradient algorithms.
Provides analysis for both linear and non-linear models.
Abstract
We consider the problem of offline reinforcement learning with model-based control, whose goal is to learn a dynamics model from the experience replay and obtain a pessimism-oriented agent under the learned model. Current model-based constraints include an explicit uncertainty penalty and an implicit conservative regularization that pushes the Q-values of out-of-distribution state-action pairs down and those of in-distribution pairs up. The uncertainty estimation on which the former relies can be loosely calibrated for complex dynamics, whereas the latter performs slightly better. To extend the basic idea of regularization without uncertainty quantification, we propose distributionally robust offline model-based policy optimization (DROMO), which leverages the ideas in distributionally robust optimization to penalize a broader range of out-of-distribution state-action pairs beyond the standard empirical…
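The conservative regularization described in the abstract can be illustrated with a small NumPy sketch. This is not DROMO's actual objective, which the truncated abstract does not spell out. The function names, the `temperature` parameter, and the use of a log-mean-exp are all assumptions made for illustration. The first function computes the plain gap between the mean Q-value on model-generated pairs and on dataset pairs. The second replaces the mean over model-generated pairs with a log-mean-exp, which is the closed form of a KL-penalized worst-case reweighting of those samples and so weights high-Q model samples more heavily.

```python
import numpy as np

def conservative_penalty(q_model, q_data):
    """Mean Q on model-generated pairs minus mean Q on dataset pairs.

    Minimizing this term pushes Q down on model samples and up on data.
    (Illustrative only; hypothetical helper, not the paper's loss.)
    """
    return float(np.mean(q_model) - np.mean(q_data))

def robust_penalty(q_model, q_data, temperature=1.0):
    """Same gap, but with a log-mean-exp over the model samples.

    T * log E_P[exp(Q / T)] equals sup over reweightings P' of
    E_P'[Q] - T * KL(P' || P), so it upper-bounds the plain mean.
    The temperature T is an assumed hyperparameter.
    """
    q_model = np.asarray(q_model, dtype=float)
    m = q_model.max()  # subtract the max for numerical stability
    soft_max = m + temperature * np.log(
        np.mean(np.exp((q_model - m) / temperature))
    )
    return float(soft_max - np.mean(q_data))
```

As the temperature grows, the log-mean-exp approaches the plain mean, so `robust_penalty` reduces to `conservative_penalty`. Smaller temperatures concentrate the penalty on the model samples with the largest Q-values.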
Taxonomy
Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Electric Vehicles and Infrastructure
Methods: Experience Replay
