COMBO: Conservative Offline Model-Based Policy Optimization
Tianhe Yu, Aviral Kumar, Rafael Rafailov, Aravind Rajeswaran, Sergey Levine, Chelsea Finn

TL;DR
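COMBO is a conservative model-based offline RL algorithm. It regularizes the value function on out-of-support state-action tuples generated by rollouts under a learned dynamics model, so it needs no explicit uncertainty estimation. The authors report a tighter lower bound on policy value and better benchmark performance than prior methods.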
Contribution
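The paper proposes COMBO, a model-based offline RL method that enforces conservatism by regularizing the value function on out-of-support state-action tuples produced by model rollouts. This avoids the explicit uncertainty quantification that earlier model-based methods rely on, which can be difficult and unreliable with complex models such as deep neural networks.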
Findings
COMBO outperforms prior offline RL methods on standard benchmarks.
The paper shows theoretically that COMBO optimizes a lower bound on policy value, and reports that this bound is tighter than those of previous approaches.
The method is effective on image-based offline RL tasks.
Abstract
Model-based algorithms, which learn a dynamics model from logged experience and perform some sort of pessimistic planning under the learned model, have emerged as a promising paradigm for offline reinforcement learning (offline RL). However, practical variants of such model-based algorithms rely on explicit uncertainty quantification for incorporating pessimism. Uncertainty estimation with complex models, such as deep neural networks, can be difficult and unreliable. We overcome this limitation by developing a new model-based offline RL algorithm, COMBO, that regularizes the value function on out-of-support state-action tuples generated via rollouts under the learned model. This results in a conservative estimate of the value function for out-of-support state-action tuples, without requiring explicit uncertainty estimation. We theoretically show that our method optimizes a lower bound…
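The value regularization described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes a penalty in the style of conservative Q-learning, where Q-values on state-action tuples from model rollouts are pushed down, Q-values on dataset tuples are pushed up, and a standard squared Bellman error is added. The function name, the argument names, and the weight `beta` are hypothetical and chosen only for illustration.

```python
import numpy as np


def conservative_critic_loss(q_model, q_data, q_batch, target_batch, beta=1.0):
    """Illustrative conservative critic objective (an assumption, not the paper's code).

    q_model:      Q-values at state-action tuples sampled from learned-model rollouts
    q_data:       Q-values at state-action tuples taken from the offline dataset
    q_batch:      Q-values on the batch used for the Bellman error
    target_batch: Bellman targets for that same batch
    beta:         hypothetical weight on the conservatism penalty
    """
    q_model = np.asarray(q_model, dtype=float)
    q_data = np.asarray(q_data, dtype=float)
    q_batch = np.asarray(q_batch, dtype=float)
    target_batch = np.asarray(target_batch, dtype=float)

    # Penalty: lower Q on model-generated tuples, raise it on dataset tuples.
    penalty = q_model.mean() - q_data.mean()

    # Standard squared Bellman error on the training batch.
    bellman = 0.5 * np.mean((q_batch - target_batch) ** 2)

    return float(beta * penalty + bellman)


# Toy numbers only; real inputs would come from a Q-network.
loss = conservative_critic_loss(
    q_model=[2.0, 4.0],
    q_data=[1.0, 1.0],
    q_batch=[1.0, 3.0],
    target_batch=[1.0, 1.0],
    beta=0.5,
)
```

Minimizing a loss of this shape lowers value estimates where the model generates tuples that the dataset does not support, without computing any uncertainty score. Under this assumed form, a larger `beta` gives a more pessimistic critic.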
Taxonomy
Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Advanced Bandit Algorithms Research
