Contextual Conservative Q-Learning for Offline Reinforcement Learning

Ke Jiang; Jiayu Yao; Xiaoyang Tan

arXiv:2301.01298 · cs.LG · January 18, 2023

TL;DR

This paper introduces Contextual Conservative Q-Learning (C-CQL), a method that enhances offline reinforcement learning by leveraging inverse dynamics models to improve policy robustness and reliability against out-of-distribution states.

Contribution

C-CQL is a novel approach that incorporates contextual information captured by an inverse dynamics model to reduce extrapolation error and improve policy stability in offline RL.

Findings

01

C-CQL achieves state-of-the-art results on offline MuJoCo benchmarks.

02

C-CQL outperforms existing methods in noisy MuJoCo settings.

03

Theoretical analysis shows that C-CQL generalizes both Conservative Q-Learning (CQL) and State Deviation Correction (SDC).
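C-CQL is described as building on Conservative Q-Learning (CQL). As background, here is a minimal sketch of the standard CQL(H)-style conservative penalty for a small discrete action space, in plain Python for illustration. It shows only the generic CQL term (a log-sum-exp over all actions minus the Q-value of the dataset action, added to the usual Bellman error with weight `alpha`). It does not include C-CQL's contextual term or SDC, and the function names and example numbers are hypothetical.

```python
import math
from typing import List, Sequence


def logsumexp(values: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(v))) over a list of Q-values."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def cql_penalty(q_values: List[List[float]], data_actions: List[int]) -> float:
    """CQL(H)-style conservative term for a discrete action space.

    q_values[i][a] is Q(s_i, a) and data_actions[i] is the action stored in
    the dataset at s_i. Minimizing this term pushes down a soft maximum over
    all actions and pushes up the dataset action; each term is >= 0.
    """
    terms = [logsumexp(q) - q[a] for q, a in zip(q_values, data_actions)]
    return sum(terms) / len(terms)


def cql_loss(
    q_values: List[List[float]],
    data_actions: List[int],
    td_targets: List[float],
    alpha: float = 1.0,
) -> float:
    """alpha * conservative penalty + 0.5 * mean squared Bellman (TD) error."""
    td = sum((q[a] - y) ** 2 for q, a, y in zip(q_values, data_actions, td_targets))
    return alpha * cql_penalty(q_values, data_actions) + 0.5 * td / len(td_targets)


if __name__ == "__main__":
    q = [[1.0, 1.0], [0.0, 5.0]]  # Q-values for two states, two actions each
    actions = [0, 0]              # actions recorded in the (hypothetical) dataset
    targets = [1.0, 0.0]          # hypothetical TD targets
    print("penalty:", cql_penalty(q, actions))
    print("loss:   ", cql_loss(q, actions, targets, alpha=0.5))
```

In practice the Q-values come from a neural network trained by gradient descent; a larger `alpha` makes the learned Q-function more pessimistic about actions that are absent from the dataset.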

Abstract

Offline reinforcement learning learns an effective policy from offline datasets without online interaction, and it attracts persistent research attention due to its potential for practical applications. However, extrapolation error caused by distribution shift still leads to overestimation of actions that transition to out-of-distribution (OOD) states, which degrades the reliability and robustness of the offline policy. In this paper, we propose Contextual Conservative Q-Learning (C-CQL) to learn a robustly reliable policy through the contextual information captured via an inverse dynamics model. With the supervision of the inverse dynamics model, C-CQL tends to learn a policy that generates stable transitions at perturbed states, since perturbed states are a common kind of OOD state. In this manner, we make the learnt policy more likely to generate transitions that…
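The abstract says an inverse dynamics model supervises the policy at perturbed states. The sketch below is one illustrative reading of that idea on a hypothetical 1-D toy problem, not the authors' objective. It fits a linear inverse dynamics model a ≈ w·(s' - s) by least squares, perturbs each dataset state with Gaussian noise, asks the inverse model which action would still reach the recorded next state from the perturbed state, and scores a policy by its squared distance to that action. The dynamics, the linear model form, the noise scale, the behavior policy and all function names are assumptions made for illustration.

```python
import random
from typing import Callable, List, Tuple

# (state, action, next_state) for a hypothetical 1-D environment
Transition = Tuple[float, float, float]


def toy_dynamics(s: float, a: float) -> float:
    """Hypothetical deterministic dynamics: the action shifts the state by a/2."""
    return s + 0.5 * a


def fit_inverse_dynamics(data: List[Transition]) -> float:
    """Fit a hypothetical linear inverse model a ~= w * (s' - s) by least squares."""
    num = sum((s2 - s) * a for s, a, s2 in data)
    den = sum((s2 - s) ** 2 for s, _, s2 in data)
    return num / den


def perturbed_targets(
    w: float, data: List[Transition], noise_scale: float, rng: random.Random
) -> List[Tuple[float, float]]:
    """Perturb each dataset state, then ask the inverse model which action
    would still reach the recorded next state from the perturbed state."""
    targets = []
    for s, _, s2 in data:
        s_pert = s + rng.gauss(0.0, noise_scale)
        targets.append((s_pert, w * (s2 - s_pert)))
    return targets


def supervision_loss(
    policy: Callable[[float], float], targets: List[Tuple[float, float]]
) -> float:
    """Mean squared gap between the policy's action and the inverse-model action."""
    return sum((policy(s) - a) ** 2 for s, a in targets) / len(targets)


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical behavior policy a = -s, which steers the state toward 0.
    states = [rng.uniform(-1.0, 1.0) for _ in range(200)]
    data = [(s, -s, toy_dynamics(s, -s)) for s in states]
    w = fit_inverse_dynamics(data)
    targets = perturbed_targets(w, data, noise_scale=0.1, rng=rng)
    print(f"w = {w:.3f}")
    print("copy-behavior loss:", supervision_loss(lambda s: -s, targets))
    print("zero-action loss:  ", supervision_loss(lambda s: 0.0, targets))
```

In this toy run, a policy that merely copies the behavior action still incurs a small loss (on the order of the noise variance), because the inverse model asks for an extra correction that steers the perturbed state back toward the dataset's next state; a policy that ignores the state does much worse.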

Taxonomy

Topics: Reinforcement Learning in Robotics