ARMOR: A Model-based Framework for Improving Arbitrary Baseline Policies with Offline Data
Tengyang Xie, Mohak Bhardwaj, Nan Jiang, Ching-An Cheng

TL;DR
ARMOR is a model-based offline reinforcement learning framework that improves an arbitrary baseline policy while guaranteeing that the learned policy never performs worse than that baseline. It handles uncertainty from limited data through relative pessimism: optimizing worst-case performance relative to the baseline.
Contribution
ARMOR introduces a model-based relative pessimism method for offline RL that ensures safe policy improvement over the baseline regardless of data coverage.
Findings
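The learned policy never degrades the performance of the baseline policy, for any admissible hyperparameter.
It can learn to compete with the best policy within data coverage when the hyperparameter is well tuned and the baseline policy is supported by the data.
This robust policy improvement property makes ARMOR well suited to real-world learning systems, where avoiding performance degradation comes before any benefit from learning.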
Abstract
We propose a new model-based offline RL framework, called Adversarial Models for Offline Reinforcement Learning (ARMOR), which can robustly learn policies to improve upon an arbitrary baseline policy regardless of data coverage. Based on the concept of relative pessimism, ARMOR is designed to optimize for the worst-case relative performance when facing uncertainty. In theory, we prove that the learned policy of ARMOR never degrades the performance of the baseline policy with any admissible hyperparameter, and can learn to compete with the best policy within data coverage when the hyperparameter is well tuned, and the baseline policy is supported by the data. Such a robust policy improvement property makes ARMOR especially suitable for building real-world learning systems, because in practice ensuring no performance degradation is imperative before considering any benefit learning can…
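The relative pessimism idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: it assumes a small, hand-written table of returns for three hypothetical candidate models (standing in for models that remain plausible given the offline data) and three candidate policies, with the baseline included as one of the candidates. The names `RETURNS`, `BASELINE`, `worst_case_relative_gain`, and `relative_pessimism_choice` are invented for illustration.

```python
# Toy illustration of relative pessimism (an assumption-laden sketch,
# not the ARMOR algorithm itself).
# Rows: hypothetical candidate models consistent with the offline data.
# Columns: candidate policies; column 0 is the baseline policy.
# Entry [m][p] is the assumed return of policy p under model m.
RETURNS = [
    [1.0, 1.5, 3.0],     # hypothetical model A
    [1.0, 1.25, 0.25],   # hypothetical model B
    [1.0, 1.125, 2.5],   # hypothetical model C
]
BASELINE = 0


def worst_case_relative_gain(returns, policy, baseline):
    """Smallest (policy return - baseline return) over all candidate models."""
    return min(row[policy] - row[baseline] for row in returns)


def relative_pessimism_choice(returns, baseline):
    """Pick the policy whose worst-case gain over the baseline is largest."""
    num_policies = len(returns[0])
    return max(
        range(num_policies),
        key=lambda p: worst_case_relative_gain(returns, p, baseline),
    )


chosen = relative_pessimism_choice(RETURNS, BASELINE)
gain = worst_case_relative_gain(RETURNS, chosen, BASELINE)
print(f"chosen policy: {chosen}, worst-case gain over baseline: {gain}")
```

In this toy table, policy 2 has the highest return under two of the models but falls below the baseline under model B, so the worst-case relative criterion passes over it in favor of policy 1. Because the baseline is itself a candidate and its relative gain is zero under every model, the selected policy's worst-case gain can never be negative, which mirrors the no-degradation property described in the abstract under the assumption that the true model is among the candidates.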
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Reinforcement Learning in Robotics · Machine Learning and Data Classification
