Domain Generalization for Robust Model-Based Offline Reinforcement Learning

Alan Clark; Shoaib Ahmed Siddiqui; Robert Kirk; Usman Anwar; Stephen Chung; David Krueger

arXiv:2211.14827·cs.LG·November 29, 2022

TL;DR

This paper introduces DIMORL, a domain generalization approach for multi-demonstrator offline RL, improving model robustness and policy performance by applying risk extrapolation to dynamics and reward models.

Contribution

The paper formulates multi-demonstrator offline RL as a domain generalization problem and applies Risk Extrapolation (REx) to make the learned models more robust across data from different demonstrators.

Findings

1. Models trained with REx outperform baseline pooling methods.

2. REx-based models lead to more stable policy learning.

3. Potential for increased exploration in offline RL.

Abstract

Existing offline reinforcement learning (RL) algorithms typically assume that training data is either: 1) generated by a known policy, or 2) of entirely unknown origin. We consider multi-demonstrator offline RL, a middle ground where we know which demonstrators generated each dataset, but make no assumptions about the underlying policies of the demonstrators. This is the most natural setting when collecting data from multiple human operators, yet remains unexplored. Since different demonstrators induce different data distributions, we show that this can be naturally framed as a domain generalization problem, with each demonstrator corresponding to a different domain. Specifically, we propose Domain-Invariant Model-based Offline RL (DIMORL), where we apply Risk Extrapolation (REx) (Krueger et al., 2020) to the process of learning dynamics and rewards models. Our results show that models…
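To make the domain-generalization framing concrete, the sketch below computes a variance-penalized REx objective (V-REx, as defined by Krueger et al.) over per-demonstrator risks for a learned model. It is a minimal illustration, not the paper's implementation: the excerpt above does not say which REx variant or hyperparameters DIMORL uses, and the helper names (`per_demonstrator_risks`, `vrex_objective`), the mean-squared-error risk, the toy predictor, and the value of `beta` are all assumptions introduced here.

```python
import numpy as np


def per_demonstrator_risks(predict, datasets):
    """Return the mean squared error of `predict` on each demonstrator's data.

    `datasets` is a list of (inputs, targets) pairs, one pair per demonstrator.
    Each demonstrator is treated as a separate domain.
    """
    risks = []
    for inputs, targets in datasets:
        errors = predict(inputs) - targets
        risks.append(np.mean(errors ** 2))
    return np.array(risks)


def vrex_objective(risks, beta):
    """V-REx: sum of per-domain risks plus beta times their variance.

    With beta = 0 this reduces to the summed per-domain risk, so only the
    variance term pushes the model toward equal risk across demonstrators.
    """
    return np.sum(risks) + beta * np.var(risks)


# Hypothetical toy model: predicts the next value as twice the input.
def toy_predict(inputs):
    return 2.0 * inputs


# Two hypothetical demonstrators. The model fits the first one exactly and
# is off by one on every sample from the second.
demo_a = (np.array([1.0, 2.0]), np.array([2.0, 4.0]))
demo_b = (np.array([1.0, 2.0]), np.array([3.0, 5.0]))

risks = per_demonstrator_risks(toy_predict, [demo_a, demo_b])
summed_risk = vrex_objective(risks, beta=0.0)   # no variance penalty
rex_loss = vrex_objective(risks, beta=10.0)     # variance penalty active
```

In this toy case the two demonstrators have risks 0 and 1, so the variance term is nonzero and the penalized objective is larger than the plain summed risk. Minimizing it would favor models whose error is balanced across demonstrators rather than low on some and high on others.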

Taxonomy

Topics: Reinforcement Learning in Robotics · Domain Adaptation and Few-Shot Learning